Catch up on stories from the past week (and beyond) at the Slashdot story archive

 




Comment Re: Lol (Score 1) 248

What I recommend is that anything that takes text input assume that the input could be any possible arrangement of the data units (i.e. any stream of bytes for UTF-8, and any stream of 16-bit words for UTF-16).

Don't "sanitize", because that is simply a step that produces a new string and feeds it to the next step. You have not fixed anything, because an error in the "sanitizing" will still crash (as quite a few posters here have tried to point out). The work must be done at the point where the data is translated into something other than a string. In this case it is the glyph layout in their renderer. That code should assume the input is ANY possible arrangement. Ideally it should draw something visible showing that there was an error, and place it between glyphs so that it is clear where in the string the error occurred.

Relying on previous steps only producing valid data is not only unsafe (as this bug shows) but also wasteful, because of the extra scan over the data. And it is either lossy (because errors are translated to a valid sequence, so two different inputs map to the same result) or a denial of service (because an exception is thrown and all further processing is lost). Unfortunately, handling that is completely obvious for most data somehow confuses programmers when they are presented with Unicode.
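To make the lossiness point concrete, here is a minimal C++ sketch of the kind of "sanitizer" being criticized. It assumes the common policy of replacing each invalid byte with U+FFFD; `decode_one` and `sanitize_fffd` are hypothetical names, not from any library. Two different broken inputs, and even an input that already contained a genuine U+FFFD, come out identical, so the original bytes cannot be recovered.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1-4) of the valid UTF-8 sequence at s[i], with
// its code point stored in cp; 0 if the bytes at i are not valid UTF-8.
static int decode_one(const std::string& s, std::size_t i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    int len;
    char32_t smallest;  // smallest code point allowed for this length (rejects overlong forms)
    if (b0 < 0x80) { cp = b0; return 1; }
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; smallest = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; smallest = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; smallest = 0x10000; }
    else return 0;                             // stray continuation byte or bad lead byte
    if (i + len > s.size()) return 0;          // sequence truncated by end of string
    for (int k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;      // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Hypothetical "sanitizer": copies valid sequences and replaces every invalid
// byte with U+FFFD (EF BF BD). The original error bytes are thrown away.
std::string sanitize_fffd(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp;
        const int len = decode_one(s, i, cp);
        if (len == 0) {
            out += "\xEF\xBF\xBD";
            i += 1;
        } else {
            out.append(s, i, static_cast<std::size_t>(len));
            i += len;
        }
    }
    return out;
}
```

For example, `"a\x80"`, `"a\xFF"` and `"a\xEF\xBF\xBD"` all sanitize to the same three bytes, which is exactly the "two different inputs map to the same result" problem.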

Comment Re: Lol (Score 1) 248

From that description it does sound like the string is still valid. However, if the display is crashing on a certain sequence containing an ellipsis, it is not clear to me why you can't construct that string directly, rather than relying on the insertion of the ellipsis.

It does sound like they maybe rely on "sanitizing", but with a far more complex scheme than I was aware of. This is still wrong, maybe far worse: they are detecting and rejecting patterns containing an ellipsis and some other character, and that is INSANE!!! Any such work should be delayed until the VERY LAST moment. In this case their glyph layout should simply not crash on any possible arrangement of bytes or words in the incoming string. This is very much the same stupidity that I was ranting about for UTF-8. Nobody used to crash because you put misspelled words in your text and tried to print it. Apply the same logic to UTF-8 and Unicode. It is not hard and it seems really obvious, but for some reason Unicode turns some otherwise really smart programmers into total idiots.

Comment Re:Android IS a huge financial success. . . (Score 1) 344

From their perspective it'd be much worse than higher search rev shares. If Android did not exist, Google Maps would have been wiped out overnight on mobile when Apple decided to go it alone (against the wishes of their own userbase, no less). Android was never about making direct profit, it was always about ensuring Google was able to deliver their services directly to users. They were quite open about this from the start. And judged by this standard it has been an incredible, epic success.

iOS is on the way down anyway. Outside of English speaking countries and Japan it's in the minority everywhere. In some countries, especially European countries like Germany and Spain, the iPhone has been crushed.

Comment Re:I'd prefer they stay armed, TYVM (Score 2) 69

I think you might take a look at Afghanistan and what it helped do to the soviets

The Soviet involvement in Afghanistan was more along the lines of:

Boris: "We need something to distract the people at home! They are getting restless!"

Piotr: "How about a war? That's always worked in the past!"

Boris: "Yes, but against who? We have to pick something close enough to be threatening, but far enough away that they won't come here!"

Piotr: "How about Afghanistan?"

Boris: "Perfect! Whoever heard of a Moslem holding a grudge?"

Comment Re:I am amazed (Score 1) 248

Actually I think "Unicode strings" should be avoided completely.

They do not help at all with text manipulation, because Unicode code points are *not* "characters" or any other unit that users think about. This is due to combining characters and invisible characters such as bidi indicators. There are even pairs of "regional indicator" code points that combine into a single country flag! It is a huge mess.
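A small C++ sketch of why counting code points does not count what users see. `count_code_points` is a hypothetical helper that assumes well-formed UTF-8 and counts every byte that is not a continuation byte. A precomposed é (U+00E9) is one code point, the same letter written as "e" plus U+0301 COMBINING ACUTE ACCENT is two, and the German flag, written as the code points U+1F1E9 U+1F1EA, is also two.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in well-formed UTF-8 by counting
// every byte that is not a continuation byte (10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char b : utf8) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "\xC3\xA9"                         -> U+00E9 (precomposed e-acute):   1 code point
// "e\xCC\x81"                        -> e + U+0301 (combining acute):   2 code points
// "\xF0\x9F\x87\xA9\xF0\x9F\x87\xAA" -> U+1F1E9 U+1F1EA (German flag):  2 code points
```

All three strings are a single "character" to a reader, yet the counts are 1, 2 and 2.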

Far more important, they all lack the ability to store the errors in a UTF-8 string losslessly. This means you cannot trust arbitrary 8-bit data to survive translation to "Unicode" and back. This has been the source of endless bugs and is the reason people can't use Python 3.0.

Comment Re:I am amazed (Score 1) 248

My recommendation is special iterators on std::string. Something like this:


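    // utf16_iterator, error_byte_found() and utf16_found() are illustrative names
    for (utf16_iterator i = string.begin(); i != string.end(); ++i) {
        int x = *i;                     // next UTF-16 unit, or negative for a UTF-8 error byte
        if (x < 0) error_byte_found();  // invalid byte: report it and keep going
        else utf16_found(x);            // valid: x is a UTF-16 code unit
    }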

There would also be iterators for UTF-32 (probably what you were thinking of as "Unicode", though a lot of Microsoft programmers think "Unicode" means UTF-16), and iterators for the other normalization forms. In all cases, for UTF-8 error bytes these would return negative numbers or some other value that cannot be confused with a code point.

This would be very fast, because you can find the next code unit (or code point, or whatever) in constant time. Any API that lets you index a unit arbitrarily with an integer cannot be constant time on a UTF-8 string; it will be linear in that integer. Iterators avoid this.

Comment Re:Lol (Score 1) 248

No, you don't. You are demonstrating the typical moronic attempts to deal with UTF-8.

Here is how you do it:

Go X bytes into the string. If that byte is a continuation byte (one of the form 10xxxxxx), back up. Back up a maximum of 3 times. This will find a truncation point that will not introduce more errors into the string than are already there.
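The truncation rule as a minimal C++ sketch (`utf8_truncation_point` is a hypothetical name). It returns a byte offset no greater than `max_bytes`; keeping `s.substr(0, offset)` never cuts a valid sequence in half.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a byte offset <= max_bytes at which s can be cut
// without splitting a UTF-8 sequence, backing up over at most 3 continuation bytes.
std::size_t utf8_truncation_point(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s.size();  // nothing to cut
    std::size_t i = max_bytes;                   // first byte that would be dropped
    for (int backups = 0; backups < 3 && i > 0; ++backups) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) break;           // not a continuation byte: cut here
        --i;
    }
    return i;
}
```

With `s = "a\xE2\x82\xAC" "b"` ("a€b", 5 bytes), a limit of 3 backs up to offset 1 and keeps just "a", while a limit of 4 keeps "a€". No decoding and no character counting is involved.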

BUT BUT BUT I'm sure you are sputtering about how this won't give you exactly X "characters". NOBODY F**KING CARES!!!! If you want the string to "fit", you should be *measuring* it, not repeating the "N characters fit" idea that has not been true on computers since the 1950s. I bet you think a combining letter and accent should count as 2, huh?

And your display function should not crash because it was given a string with an error in it! Even if you stupidly inserted the ellipsis, all it should do is draw a few error indicators before the ellipsis.

Comment Re: Lol (Score 2) 248

No, the problem is code that pretends that illegal UTF-8 sequences magically don't exist!

For some reason UTF-8 turns otherwise intelligent programmers into complete morons. Here is another example from Apple. Let me state some rules about how to deal with UTF-8:

1. Stop thinking about "characters"!!!! This is a byte stream. The ONLY reason to think about a "character" is because you are DRAWING it on a display designed for a human to read, and humans do think about "characters". All other software either does not care, or is concerned with far more complex patterns (such as regexps, and editors that deal with words and sentences); that second group is not helped at all by an intermediate translation.

2. It is TRIVIAL to detect that the byte sequence you are looking at is not a valid UTF-8 encoding of a character. In this case draw a replacement for exactly ONE byte and then try the next byte to see if it starts a valid sequence. Do not skip more. There must be one error per byte so that the maximum number of good characters is preserved, so that a sequence with errors can be parsed bidirectionally without looking more than a few bytes ahead, and so that it is possible to search for error patterns. It also means there are only 128 different errors (the bytes 0x80-0xFF), not millions.

3. NEVER "translate to Unicode" (i.e. UTF-16), because this will be a lossy conversion of these invalid sequences and thus you have not preserved the original data. I'm sorry, but Microsoft really screwed us here. The best recommendation is to write a wrapper around the filesystem calls that translates from UTF-8 to UTF-16 at the last moment, using U+DCxx as the translation for each error byte (this is lossy, but filenames already are, due to case insensitivity, Apple's normalization, and even on Unix, where "./foo" and "foo" are the same file).
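Rules 2 and 3 can be combined in one C++ sketch: decode UTF-8 into UTF-16, and whenever the bytes at the current position are not a valid sequence, emit exactly one unit in U+DC80..U+DCFF for that single byte and resume at the very next byte. This is similar in spirit to Python's "surrogateescape" error handler (PEP 383); `decode_one` and `utf8_to_utf16_escaped` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length (1-4) of the valid UTF-8 sequence at s[i], with
// its code point in cp; 0 if the bytes at i are not valid UTF-8.
static int decode_one(const std::string& s, std::size_t i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    int len;
    char32_t smallest;  // rejects overlong encodings
    if (b0 < 0x80) { cp = b0; return 1; }
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; smallest = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; smallest = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; smallest = 0x10000; }
    else return 0;
    if (i + len > s.size()) return 0;
    for (int k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    // Encoded surrogates (ED A0 80 etc.) are invalid, so they become error bytes too.
    if (cp < smallest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Hypothetical converter: valid sequences become normal UTF-16; each invalid
// byte b (always 0x80..0xFF) becomes the single unit 0xDC00 + b.
std::u16string utf8_to_utf16_escaped(const std::string& s) {
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp;
        const int len = decode_one(s, i, cp);
        if (len == 0) {
            const unsigned char b = static_cast<unsigned char>(s[i]);
            out.push_back(static_cast<char16_t>(0xDC00 + b));  // one error unit per byte
            i += 1;                                             // resync at the next byte
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

A truncated euro sign, `"\xE2\x82"`, comes out as two units, U+DCE2 U+DC82, so both original bytes can still be seen and recovered.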

This is blatantly obvious if you substitute "words" for "characters" and imagine how you would write a program to deal with text strings. Words are also composed of multiple bytes in a row. For some reason nobody seems to crash on misspelled words, and they manage to concatenate and split strings and make whole file systems and diff programs and all kinds of other fancy text manipulation without having to translate the text so that each word is a fixed-sized integer. Amazing!

Comment Re:Russian rocket motors (Score 1) 62

Russia would like for us to continue gifting them with cash for 40-year-old missile motors; it's our own government that doesn't want them any longer. For good reason. That did not cause SpaceX to enter the competitive process; they want the U.S. military as a customer. But it probably did make it go faster.

Also, ULA is flying 1960s technology, stuff that Mercury astronauts used, and only recently came up with concept drawings for something new due to competitive pressure from SpaceX. So, I am sure that folks within the Air Force wished for a better vendor but had no choice.

Comment Re:What a guy (Score 3, Interesting) 389

These career govt employees feed info to the pres, make recommendations, and fight for their interests. Even if a new pres wants to turn on a dime, Washington DC is a large ship that turns slowly.

Bingo. The old UK comedy "Yes Prime Minister" was a rather cutting illustration of this phenomenon at work.

What happens to someone when they become the prez? Enormous numbers of apparently experienced people begin telling you all kinds of secret things. They stress the importance of secrecy. They tell you about this plot or that plot. They say it's vital they get new powers and they not-so-subtly imply that if you don't help them Women And Children will DIE! And although it's left unstated you know perfectly well that if you don't give them what they want, you will see leaks in the press from anonymous officials that paint you as a prevaricator, as weak, as unconcerned for the lives of Patriotic Heroes And Their Women And Children.

The problem any US President has, and I daresay many other countries' presidents, is that they are immediately submerged into a fantasy world woven from the agendas of the people around them mixed with their own pre-existing views, and those people are themselves also in a slightly less extreme form of a personal fantasy world and so on all the way down. A toxic brew of patriotism, belief in American exceptionalism, militarism and most of all pervasive classification means that it's impossible for a prez to penetrate the fog of misinformation that surrounds them. They can be manipulated into believing nearly anything because it would take an incredibly strong willed personality to say directly to the senior bureaucrats feeding them classified intelligence, "I think you are bullshitting me and I am going to personally audit your shit and prosecute you if you're lying to me".

Obama is very much NOT a strong willed personality. He sees himself primarily as a reasonable man who finds compromise between different factions. This makes him easily manipulated: all it takes is for people who agree to present him two apparently opposed positions - one extreme and one very extreme - and Obama will reliably pick something that is quite extreme. And the officials around him know that.

In hindsight it should have been obvious. Obama has no real track record of achievement in politics. He supported no particularly controversial positions and showed no particularly clear thinking. Compared to Bush he seemed like a genius, of course, but Bush was a fucking man child, so that wasn't hard.

For that reason, Rand Paul fans might be disappointed if he won. I don't expect he would be able to accomplish as much change as people would like.

Almost certainly not. But it looks like Rand Paul is made of stronger stuff than Obama. Paul consistently argues for positions that piss off most of his party. He seems able to come to conclusions about things himself regardless of what other people believe. He seems to have fairly strong principles. He doesn't come across as the sort of wishy-washy people person that Obama is. If there's any US politician that actually might tell the people in his secret briefings "stop bullshitting me or I fire you", it's probably Rand Paul.

Comment Re:None. Go meta. (Score 3, Insightful) 336

That sort of logic holds true when moving between languages that are very similar. The transitions between Python and Ruby, or Java and C#, spring to mind.

However if I need a C++ programmer and need one pronto, I'm not gonna hire a guy who has only JavaScript on his CV no matter what. Learning C++ is not merely learning a different way to create an array or slightly different syntax. To be effective in C++ you need to know how to do manual memory management and do it reliably, which takes not only domain knowledge but more importantly: practice and experience. You need to understand what inlining is. You very likely need to understand multi-threading and do it reliably, which takes practice and experience a pure JS guy is unlikely to have. You need to be comfortable with native toolchains and build systems: when the rtld craps its pants and prints a screenful of mangled symbols you need to be able to understand that you have an ABI mismatch, what that means and how to deal with it. Unfortunately that is mostly a matter of practice and experience. You might need to understand direct manipulation of binary data. There's just a ton of stuff beyond the minor details of the language.

Could the pure JS guy learn all this stuff? Of course! Will they do it quickly? No.
