
Comment Re:I am amazed (Score 1) 248

Actually I think "Unicode strings" should be avoided completely.

They do not help at all in doing text manipulation, because Unicode code points are *not* "characters" or any other unit that users think about. This is due to combining characters and invisible characters such as bidi indicators. There are even regional indicator code points that pair up, two at a time, and turn into a country flag! It is a huge mess.
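To make the gap between code points and what a user sees concrete, here is a minimal C++ sketch. The helper name `count_code_points` is hypothetical, and it assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to be valid
// UTF-8 by counting every byte that is not a continuation byte (10xxxxxx).
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: drawn as one letter.
const std::string accented_e = "e\xCC\x81";
// U+1F1FA U+1F1F8, two regional indicator symbols: drawn as one flag.
const std::string us_flag = "\xF0\x9F\x87\xBA\xF0\x9F\x87\xB8";
```

Both strings hold two code points (in 3 and 8 bytes respectively), yet each is something a user would call a single "character", so neither the byte count nor the code point count matches what is on screen.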

Far more important is they all lack the ability to store errors that are in a UTF-8 string in a lossless way. This means you cannot trust arbitrary 8-bit data to survive translation to "Unicode" and back. This has been the source of endless bugs and is the reason people can't use Python 3.0.

Comment Re:I am amazed (Score 1) 248

My recommendation is special iterators on std::string. Something like this:


    // utf16_iterator is a proposed type, not an existing one
    for (utf16_iterator i = string.begin(); i != string.end(); ++i) {
          int x = *i;
          if (x < 0) error_byte_found();   // negative value: a UTF-8 error byte
          else utf16_found(x);
    }

There would also be iterators for UTF-32 (probably what you were thinking of as "Unicode", but a lot of Microsoft programmers think "Unicode" means UTF-16), and iterators for other normalization forms. In all cases these would return negative numbers, or some value that cannot be confused with a code point, for UTF-8 error bytes.

This would be very fast because you can find the next Unicode code unit or whatever in constant time. Any api where you can arbitrarily index a unit using an integer is not going to be constant, it will be linear with that integer. Iterators avoid this.
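As a rough illustration of the decoding step such an iterator could perform, here is a C++ sketch. The names `decode_utf8_at` and `decode_all` are hypothetical, and returning minus the byte value for an error byte is just one way to satisfy "a value that cannot be confused with a code point".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes the unit starting at s[i] (requires
// i < s.size()). Returns the code point and sets len to the bytes consumed,
// or returns minus the byte value with len == 1 when s[i] is an error byte.
int decode_utf8_at(const std::string& s, std::size_t i, std::size_t& len) {
    const int b0 = static_cast<unsigned char>(s[i]);
    len = 1;
    if (b0 < 0x80) return b0;                    // ASCII
    std::size_t extra = 0;                       // continuation bytes expected
    int cp = 0, min_cp = 0;                      // min_cp rejects overlong forms
    if (b0 >= 0xC2 && b0 <= 0xDF)      { extra = 1; cp = b0 & 0x1F; min_cp = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { extra = 2; cp = b0 & 0x0F; min_cp = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { extra = 3; cp = b0 & 0x07; min_cp = 0x10000; }
    else return -b0;                             // stray continuation or bad lead
    if (s.size() - i <= extra) return -b0;       // cut off by the end of the string
    for (std::size_t k = 1; k <= extra; ++k) {
        const int b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return -b0;      // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, out-of-range values and surrogates.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -b0;
    len = extra + 1;
    return cp;
}

// Walks the whole string; each step looks at no more than 4 bytes.
std::vector<int> decode_all(const std::string& s) {
    std::vector<int> out;
    std::size_t len = 0;
    for (std::size_t i = 0; i < s.size(); i += len) {
        out.push_back(decode_utf8_at(s, i, len));
    }
    return out;
}
```

Because every step inspects a bounded number of bytes, advancing is constant time, whereas reaching the Nth unit by index means repeating that step N times.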

Comment Re:Lol (Score 1) 248

No you don't. You are demonstrating the typical moronic attempts to deal with UTF-8.

Here is how you do it:

Go X bytes into the string. If that byte is a continuation byte, back up. Back up a maximum of 3 times. This will find a truncation point that will not introduce more errors into the string than are already there.
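The procedure above can be sketched in a few lines of C++. The function name `utf8_truncation_point` is hypothetical; it returns the number of bytes to keep and does no validation beyond looking for continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many bytes of s to keep so that the cut
// is at or before max_bytes and does not split a multi-byte sequence.
std::size_t utf8_truncation_point(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s.size();  // nothing to cut
    std::size_t pos = max_bytes;
    // Back up while the byte at the cut is a continuation byte (10xxxxxx),
    // at most 3 times, since a valid sequence has at most 3 of them.
    for (int steps = 0; steps < 3 && pos > 0; ++steps) {
        if ((static_cast<unsigned char>(s[pos]) & 0xC0) != 0x80) break;
        --pos;
    }
    return pos;
}
```

A caller would then use something like `s.substr(0, utf8_truncation_point(s, x))`. The result may be up to 3 bytes shorter than requested, which is the trade-off being argued for here.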

BUT BUT BUT I'm sure you are sputtering about how this won't give you exactly X "characters". NOBODY F**KING CARES!!!! If you want the string to "fit" you should be *measuring* it, not saying stuff that has not been true on computers since the 1950's about "N characters fit". I bet you think a combining letter and accent should count as 2, huh?

And your display function should not crash because it was given a string with an error in it! Even if you stupidly inserted the ellipsis all it should do is draw a few error indicators before the ellipsis.

Comment Re: Lol (Score 2) 248

No, the problem is code that pretends that illegal UTF-8 sequences magically don't exist!

For some reason UTF-8 turns otherwise intelligent programmers into complete morons. Here is another example from Apple. Let me state some rules about how to deal with UTF-8:

1. Stop thinking about "characters"!!!! This is a byte stream. The ONLY reason to think about a "character" is because you are DRAWING it on a display designed for a human to read, and humans do think about "characters". All other software either does not care, or is concerned with far more complex patterns (such as regexp and editors that deal with words and sentences). The latter are not helped at all by an intermediate translation.

2. It is TRIVIAL to detect that the byte sequence you are looking at is not a valid UTF-8 character. In this case draw a replacement for exactly ONE byte and then try the next byte to see if it is a valid sequence. Do not skip more. There must be one error per byte so that the maximum number of good characters is preserved and so that a sequence with errors can be parsed bidirectionally without looking more than a few bytes ahead, and so that it is possible to search for error patterns. It also means there are only 128 different errors, not millions.

3. NEVER "translate to Unicode" (i.e. UTF-16) because this will be a lossy conversion of these invalid sequences and thus you have not preserved the original data. I'm sorry but Microsoft really screwed us here. Best recommendation is to write a wrapper around the filesystem calls and translate from UTF-8 to UTF-16 at the last moment, using U+DCxx as a translation for the error bytes (this is lossy but filenames already are, due to case independence, Apple's normalization, and even on Unix where "./foo" and "foo" are the same file).
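Rules 2 and 3 can be sketched together in C++: treat each undecodable byte as exactly one error, and when a UTF-16 API forces a conversion, map error byte 0xNN to U+DCNN. The names `decode_utf8_at` and `utf8_to_utf16_keep_errors` are hypothetical, and this is only an illustration of the idea, not a drop-in filesystem wrapper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the unit starting at s[i] (requires
// i < s.size()). Returns the code point and sets len to the bytes consumed,
// or returns minus the byte value with len == 1 when s[i] is an error byte.
int decode_utf8_at(const std::string& s, std::size_t i, std::size_t& len) {
    const int b0 = static_cast<unsigned char>(s[i]);
    len = 1;
    if (b0 < 0x80) return b0;
    std::size_t extra = 0;
    int cp = 0, min_cp = 0;
    if (b0 >= 0xC2 && b0 <= 0xDF)      { extra = 1; cp = b0 & 0x1F; min_cp = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { extra = 2; cp = b0 & 0x0F; min_cp = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { extra = 3; cp = b0 & 0x07; min_cp = 0x10000; }
    else return -b0;
    if (s.size() - i <= extra) return -b0;
    for (std::size_t k = 1; k <= extra; ++k) {
        const int b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return -b0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -b0;
    len = extra + 1;
    return cp;
}

// Converts UTF-8 to UTF-16, emitting one U+DCxx unit per error byte
// instead of dropping or merging the invalid bytes.
std::u16string utf8_to_utf16_keep_errors(const std::string& s) {
    std::u16string out;
    std::size_t len = 0;
    for (std::size_t i = 0; i < s.size(); i += len) {
        const int cp = decode_utf8_at(s, i, len);
        if (cp < 0) {
            // cp is minus the byte value, so this yields U+DC80..U+DCFF.
            out.push_back(static_cast<char16_t>(0xDC00 - cp));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const int v = cp - 0x10000;          // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

Since only bytes 0x80 through 0xFF can ever be errors, the error units fall in U+DC80..U+DCFF, which matches the "only 128 different errors" point in rule 2.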

This is blatantly obvious if you substitute "words" for "characters" and imagine how you would write a program to deal with text strings. Words are also composed of multiple bytes in a row. For some reason nobody seems to crash on misspelled words, and they manage to concatenate and split strings and make whole file systems and diff programs and all kinds of other fancy text manipulation without having to translate the text so that each word is a fixed-sized integer. Amazing!

Comment Re: Just another arrogant CEO (Score 1) 49

There's a surprise - you don't consider anything worthwhile unless it's a closed source piece of shit. Some standard.

No, I don't consider it worthy of note unless a lot of people give a shit.

Oh - right, you have a Windows phone (because it's closed source which is better) and it doesn't have pulseaudio - so other than Skype, nothing of importance uses pulseaudio.

I have a Titan running SOKP.

Comment Re:What else is new... (Score 3, Insightful) 110

So what else is new? Most "Global Business Leaders" don't know much about anything else,

So, you call it an ivory tower when it's intellectual, what do you call it when it's just a tower made of stacked-up money? The reason why "global business leaders" don't know about technology is that they are completely divorced from the daily life that normal humans live. They don't have to know shit, so they don't know shit. Then they want to tell us all about how to be successful. We're always having to endure quotes from Bill Gates or Warren Buffett, who both were born with silver spoons in their mouths, about how we can supposedly be successful, but they actually have no idea how to become successful, because they were born into positions of privilege. We should not give one tenth of one very small shit about what they think about becoming successful, because they never did.

Comment This is exactly what belongs in cars (Score 2) 76

The ONLY thing that should be built-in is a speaker system, tuned to the acoustics of the car, with an amplifier and aux jack

What you don't seem to realize is that this is the equivalent of that for connecting your smartphone to your car. If only you knew anything about the subject at hand, like about cars or automotive infotainment, you would know that these are technologies for interfacing to your phone while in your car — and that these are both major standards. That means that virtually all smartphone owners will be able to use the interface in the car, designed for automotive use, instead of the interface on their phone which isn't.

Comment Re:There is no need for the Patriot Act (Score 2) 389

The Patriot act should go away and the US powers that be should focus its efforts on neutralizing the Sunni-Wahabi threat by whatever means necessary.

Hahahaha

Unfortunately we are taking the wrong side here in helping the Saudis eradicate a Shia minority in Yemen.

What we did in Iraq was separate peacefully coexisting communities of the people you're talking about. We deliberately set social progress back a hundred years there. What you think we need to do is literally the opposite of our government's intent.

Comment Re:What happens when you have insular advisors (Score 1) 389

Note to Obama: You are being lied to.

Obama has been consistent, at least since taking office. Before taking office, he spoke out against mass data collection. But ever since taking office, he has consistently stated that we require this data collection to be safe. We have met the enemy, and he is us. And that very much includes Obama.

Comment Re:Caught Up (Score 1) 105

Everything about the web is like that. We are in the process of doing "on the web" everything we have already been doing locally for decades,

And we're doing it in a way that brings us right back to the era of mainframes. Although far more advanced, the model is highly similar to that of the IBM mainframe systems whose semi-smart terminals understood form fields and submission.

Comment Re:How do the "poorest residents" own homes (Score 1) 272

So, how do the "poorest residents" own a home?

By spending half or more of their income on their mortgage, and by having a co-signer and a down payment. It's not rocket surgery, it's just dealing with the evil, evil banks. The banks are sitting on something like three houses for every homeless man, woman, and child in America, refusing to drop their prices to market level.
