Comment Generic Programming (Score 2) 42

I think generic programming is destined to be a second-class citizen. IMO the human side of the problem is not the biggest one.

The biggest one is that there is no compiler which can untangle such code and generate something efficient out of it.

This is basically the attic where all the "everything is an object" languages have gotten stuck. On one side, development is made easier in many places because everything is an object. On the other side, performance and memory consumption degrade, to the point where developers end up counting and optimizing the object usage of every code line and function. Because there are no compilers capable of deducing from the human-readable code what the hell the developer actually wanted to accomplish.

That brings me to the other big issue. Most concepts and paradigms, including generic programming, which occupy the minds of researchers do NOT help solve the ultimate problem of computer programming: efficiently communicating with the CPU, efficiently telling it what needs to be done.

If the developer is a writer, the CPU is a reader, and assembler is the spoken language, then even most simple programs, at 10-50K instructions, are close to novel size. Think of it: the usual "Hello World" program, to a CPU, is close in size to a novel! And if it's in an interpreted language, the CPU might end up reading a whole frigging novel, just to deduce that all the developer wanted was to print "Hello World" on screen.

Comment Re:utf-32/ucs-4 (Score 1) 165

Everybody has already settled on the little-endian presentation.

What makes you think this? There are plenty of old Motorola-architecture systems still in legacy use, preserved for stable scientific or business computing environments.

Man, I come from the BE world. You do not need to tell me that there is still an abundance of BE hardware.

And there is a significant amount of new, bi-endian hardware being produced now,

Most modern CPUs I have had to deal with, except Intel's, are bi-endian. BUT. Most (by model number) are used in BE mode. (But since ARM has also settled on LE, it is now effectively an LE world.)

Yet.

1st. The endianness of the CPU is not related to the endianness of a data exchange format.

2nd. The endianness of the data exchange format does not relate to the internal presentation of the data in the application's memory. (A small sketch below.)
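
A minimal illustration of both points: if the exchange format says "little-endian 32-bit units" and the decoder assembles each unit byte by byte, the host CPU's endianness never enters the picture. The function name is mine:

    #include <stdint.h>

    /* Decode a little-endian 32-bit unit from a byte buffer.
     * Shifting by byte position, instead of casting the pointer,
     * gives the same result on BE and LE hosts alike. */
    static uint32_t get_le32(const unsigned char *p)
    {
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }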

I'm afraid I have quite a lot of experience with Unicode compatibility and cross compatibility. Frankly, for a multi-platform tool like Nethack, I'd stay with the 8-bit, one byte, extremely stable 'POSIX' standard.

You folks lump it all together. There are two sides to it: internal presentation and external conversion.

For internal presentation, one goes with whatever makes your life as a developer easier. UCS-4 is definitely an option. UTF-8 (aka "I do not care, just passing data through") is also OK. Most applications fall into the latter category. But if one ever starts pondering the use of wide chars, when one needs to actually peek at the data, then there is simply no point in using UTF-16 (see the sketch below). And UTF-8 has disadvantages as soon as one needs to work with individual characters.
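
Why no point? Because UTF-16 is just as variable-width as UTF-8 once you leave the BMP. A minimal decode sketch (names are mine, low-surrogate validation omitted):

    #include <stddef.h>
    #include <stdint.h>

    /* Decode one code point from UTF-16. Anything outside the BMP
     * is stored as a surrogate pair, so "wide" UTF-16 still cannot
     * be indexed one unit per character. */
    static uint32_t utf16_decode(const uint16_t *s, size_t *units)
    {
        if (s[0] >= 0xD800 && s[0] <= 0xDBFF) {   /* high surrogate */
            *units = 2;
            return 0x10000u + (((uint32_t)(s[0] - 0xD800) << 10)
                               | (uint32_t)(s[1] - 0xDC00));
        }
        *units = 1;
        return s[0];
    }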

For external conversions, all that matters is that the internal format can be easily converted into the widely used encodings. The application doesn't have any direct control over it - it is user controlled. The user might pick UTF-8. Or JIS. Or win-1257. And the application has to make sure that when it spews the data to the outside, it comes out in the encoding requested by the user.

The notion that utf-8 is used by everybody is extremely naive. And IMO it is rooted in the same arrogance which held the *nix world back for decades in the dark ages of 7-bit ASCII.

Comment Re:utf-32/ucs-4 (Score 1) 165

Characters in Thai are rendered in display order, and not logical order. [...]

Ha! Not relevant to me, actually. But very informative. Thanks.

Overall, most customers are aware of the problems (and in my experience, better than me). The simple handling I had in my software worked and was sufficient.

The Thai language specifically is a cool example. Why not relevant? My company refused to do Thai localization. (And thanks to you, now I fully know why.) To do the localization, we were told we would have to buy a special Thai language library. The library costs huge money. When we told the customer that they would have to pay for it, they refused and canceled the project, because for them it was too expensive.

Comment Re:utf-32/ucs-4 (Score 0) 165

So basically what you (and others) are saying is that since there are some edge cases foreseen in the standard, nobody should try to make life easier even by a bit?

Combining characters (and the rest of the crap) pretty much never occur in real life. Only in some sadistic test cases for the Unicode libraries, probably.

The main purpose of Unicode, why both users and developers want it, is to represent as many characters as possible with the least hassle possible. And that's pretty much what everybody's shooting for.

Comment Re:utf-32/ucs-4 (Score 1) 165

It's obvious you have little real experience with unicode, because saying 'just convert to utf-32' just papers over the problems without solving them.

Indeed, I've only scratched the surface. And that alone gave me headaches for months.

UTF-32 units are code points, not characters, and there are many multi-code-point (variable length) characters in utf-32.

For example?

Comment Re:utf-32/ucs-4 (Score 1) 165

It is the same problem as with the fancy acute/agrave/etc special symbols.

And the special white-space/no-space characters. And the special writing direction change characters.

They are generally removed during normalization/conversion into canonical presentation.

The thing is, after the normalization, which is needed for any Unicode text anyway, UCS-4 becomes a plain array of characters. But UTF-8 - still not. (Example below.)
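
Concretely (assuming the character has a precomposed form, as most Latin ones do):

    #include <stdint.h>

    /* One user-visible character, two valid Unicode spellings.
     * NFC normalization collapses the decomposed form into the
     * precomposed one; after that, UCS-4 indexing is per character. */
    uint32_t e_acute_nfd[] = { 0x0065, 0x0301 }; /* 'e' + COMBINING ACUTE ACCENT */
    uint32_t e_acute_nfc[] = { 0x00E9 };         /* LATIN SMALL LETTER E WITH ACUTE */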

Comment Re:utf-32/ucs-4 (Score 2) 165

i don't see a real argument here. "considering the length". how long is it?

Check the game history. Literally decades between major releases.

"some of the silliness". what silliness is this exactly? external storage of utf-32 requires that one deal with an endian character set. every time any text is touched, you'll get to endian convert.

Everybody has already settled on the little-endian presentation.

isn't that awesome? utf-8 does not have this issue. and one can almost always treat utf8 as a byte stream. except in the rare case where one needs to know where character boundaries are. for example, to map the character to a font. the fast path is the common path (ascii), and just requires a single test ((c&0x80) == 0).

With UCS-4 you do not even need any tests.

Extracting a character - trivial.

Length of string - trivial.

Normalization - much simpler than with utf-8. (Sketch below.)
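
To make "trivial" concrete - a sketch, with NUL-terminated buffers and code points standing in for characters (i.e. normalized text, as above):

    #include <stddef.h>
    #include <stdint.h>

    /* Code-point count of a UCS-4 string: just the array length. */
    static size_t ucs4_len(const uint32_t *s)
    {
        size_t n = 0;
        while (s[n]) n++;
        return n;
    }

    /* Code-point count of a UTF-8 string: scan every byte, count
     * everything that is not a continuation byte (10xxxxxx). */
    static size_t utf8_len(const unsigned char *s)
    {
        size_t n = 0;
        for (; *s; s++)
            if ((*s & 0xC0) != 0x80) n++;
        return n;
    }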

The sad reality is that the libraries I have seen actually implement their utf-8 handling by using utf-32 internally. You can't avoid it: Unicode is specified in code points, which, as you point out, are already as good as 32 bits long.

sure the gnu c library has had bad wchar_t conversion routines in the past, but it's a free country. you can implement your own.

Frankly, I haven't even used the C library for the purpose. We already had a library developed in-house, because portable support for utf-8 is patchy at best.

The sanest portable approach is to link with iconv and convert everything from some internal presentation to the external one. Because you can never know what encoding the user needs. Unless you really need to save the RAM (one has a shitload of string data), utf-8 simply sucks as an internal presentation.
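
A minimal sketch of that approach, using the standard iconv(3) interface (encoding names as GNU iconv spells them; error handling trimmed to the essentials):

    #include <iconv.h>
    #include <stddef.h>

    /* Convert an internal UTF-32LE buffer into whatever encoding the
     * user asked for ("UTF-8", "SHIFT-JIS", "CP1257", ...).
     * Returns bytes written, or (size_t)-1 on failure. */
    static size_t to_user_encoding(const char *user_enc,
                                   char *in, size_t in_left,
                                   char *out, size_t out_bytes)
    {
        iconv_t cd = iconv_open(user_enc, "UTF-32LE");
        if (cd == (iconv_t)-1)
            return (size_t)-1;

        size_t out_left = out_bytes;
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        iconv_close(cd);
        return rc == (size_t)-1 ? (size_t)-1 : out_bytes - out_left;
    }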

P.S. I have had very little experience with Unicode. But several months of dealing with it have simply convinced me that if one has to deal with l10n/i18n, then utf-16/utf-32 are very good choices. Ditto if one has to deal with Unicode itself. If the application really doesn't care what it prints or reads - then pass-through binary (utf-8) works too. But as soon as one has to take the length of a utf-8 string (the real length), it is time to start switching from utf-8 to utf-32.

Comment Re:Why not free education for life? (Score 1) 703

Gee, money does grow on trees.

Since some money is actually made of actual paper, this is a factually correct statement!

Taxpayers can afford to pay for lazy asses to never enter the workforce by continuing education for a lifetime.

Two years is a lifetime?

Let me guess: you are from a redneck state with a life expectancy under 35?

And while we are at it, why not free cable?

One can succeed in life without cable. But not without an education.

Or perhaps free condoms?

Actually, some schools and medical institutions already give away free condoms.

Comment This is nuts (Score 1) 325

And it's only 3.6GHz!

I LOLed.

It would have helped if you had specified the workloads you are after.

Otherwise, the laptops with the best thermal control I have seen were HP and Apple. But those, again, are laptops, not slim portable servers. Laptops will always suck at this, because they are, duh, laptops: they are designed to be portable, not to be capable of dissipating >300W of waste heat. They are also designed not to burn holes in your pants, if you perchance decide to put the laptop on your, well, lap.

I have personally used a plain PC tower as a compile farm for software development in the past. The laptop was old (very old) and compiling anything large-ish on it was a huge chore (and a waste of time). I configured distcc to simply run all compilation on the PC instead. Work and compile in the quiet of the bedroom, while the noisy and hot PC, compiling stuff full time on its four cores, stays in the guest room.

Comment Re:Micromanagement reigns... (Score 1) 420

Bummer, eh?

For as long as one is OK giving up some of the credit to the ass kissers, it can be made to work.

The real problem is extroverts who are not yet promoted, or worse: those who believe that they are great coders. (As you can see, they are very humble: they are not "extraordinary" - just "great".)

For as long as your lower ranks are made up of reliable people, the introverts can steer the company any way they want, because they are the people who do the actual work.

And extroverts have rather short attention span.

When I say, "A is not possible, but B is", they basically have no chance of opposing me. If they insist on going with A, I can still do B and later say that it is what they asked for. Chances are they are already on to the next big thing and do not care about the "past".

Comment Can it be enabled? (Score 2) 73

Docker's report that a downloaded image is 'verified' is based solely on the presence of a signed manifest, and Docker never verifies the image checksum from the manifest.

Can it be enabled? If yes, then I do not see a problem.

Otherwise, the signing crap is just that: crap.

It takes a needlessly long time to verify the signatures. (Because they are not slow! - they are so secure, so very much OMG secure.)

It is a huge risk to reconfigure a production system to use unsigned data if an emergency arises. (Think recovery from a local backup.)

Developers forget to renew their certificates and suddenly, in the middle of production, the whole system goes down, because OMG the certificate has expired and the data may not be secure!!!

And then, in the end, the signing keys get leaked or stolen...
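
For the record, the checksum verification Docker reportedly skips is not hard. A minimal sketch, assuming OpenSSL's SHA256() and a manifest_digest already parsed out of the signature-checked manifest (the function name and the parsing step are my assumptions, not Docker's actual code):

    #include <openssl/sha.h>
    #include <string.h>

    /* Even after the manifest's signature checks out, the digest it
     * records still has to be compared against the bytes actually
     * downloaded - the step the report says Docker never performs. */
    static int layer_matches_manifest(const unsigned char *layer,
                                      size_t layer_len,
                                      const unsigned char manifest_digest[32])
    {
        unsigned char actual[SHA256_DIGEST_LENGTH];
        SHA256(layer, layer_len, actual);
        return memcmp(actual, manifest_digest, sizeof actual) == 0;
    }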
