
Comment Re:I am amazed (Score 1) 248

Actually I think "Unicode strings" should be avoided completely.

They do not help at all in doing text manipulation, because Unicode code points are *not* "characters" or any other unit that users think about. This is due to combining characters and invisible characters such as bidi indicators. There are even regional indicator code points that pair up two at a time and turn into a country flag! It is a huge mess.

Far more important is they all lack the ability to store errors that are in a UTF-8 string in a lossless way. This means you cannot trust arbitrary 8-bit data to survive translation to "Unicode" and back. This has been the source of endless bugs and is the reason people can't use Python 3.0.

Comment Re:I am amazed (Score 1) 248

My recommendation is special iterators on std::string. Something like this:


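    // utf16_iterator is hypothetical: it walks the UTF-8 bytes of a std::string
    // and yields UTF-16 code units, or a negative value for each error byte.
    for (utf16_iterator i = string.begin(); i != string.end(); ++i) {
        int x = *i;
        if (x < 0) error_byte_found();
        else utf16_found(x);
    }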

There would also be iterators for UTF-32 (probably what you were thinking of as "Unicode", but a lot of Microsoft programmers think "Unicode" means UTF-16), and iterators for other normalization forms. In all cases these would return negative numbers, or some other value that cannot be confused with a code point, for UTF-8 error bytes.

This would be very fast because you can find the next Unicode code unit or whatever in constant time. Any API that lets you index an arbitrary unit by integer is not going to be constant time; it will be linear in that integer. Iterators avoid this.
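To make the iterator idea a bit more concrete, here is a rough sketch of the decoding step such an iterator could sit on top of. The name next_code_point and its signature are made up for illustration, not an existing API. It assumes pos < s.size(), returns the code point on success, and returns the negated byte value for any byte that does not start a valid sequence, advancing by exactly one byte in that case. Treating overlong forms and encoded surrogates as errors is my assumption here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode one unit starting at s[pos] (requires pos < s.size()).
// On success returns the code point (>= 0) and moves pos past the whole sequence.
// On failure returns -(byte value) and moves pos forward by exactly ONE byte,
// so every bad byte is reported as its own error.
int next_code_point(const std::string& s, std::size_t& pos) {
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    if (b0 < 0x80) { ++pos; return b0; }            // plain ASCII

    std::size_t len;   // total bytes in the sequence
    int cp;            // code point being assembled
    int lowest;        // smallest value this length may encode (rejects overlong forms)
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0)      { len = 3; cp = b0 & 0x0F; lowest = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; lowest = 0x10000; }
    else { ++pos; return -static_cast<int>(b0); }   // stray continuation or illegal lead byte

    if (pos + len > s.size()) { ++pos; return -static_cast<int>(b0); }  // sequence cut short

    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[pos + k]);
        if ((b & 0xC0) != 0x80) { ++pos; return -static_cast<int>(b0); }  // not a continuation
        cp = (cp << 6) | (b & 0x3F);
    }

    // Overlong encodings, encoded surrogates, and values past U+10FFFF are errors too.
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        ++pos;
        return -static_cast<int>(b0);
    }
    pos += len;
    return cp;
}
```

Each call looks at no more than four bytes, so stepping forward is constant time no matter how far into the string you are.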

Comment Re:Lol (Score 1) 248

No you don't. You are demonstrating the typical moronic attempts to deal with UTF-8.

Here is how you do it:

Go X bytes into the string. If that byte is a continuation byte, back up. Back up a maximum of 3 times. This will find a truncation point that will not introduce more errors into the string than are already there.
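The steps above, written out as a sketch. The function names are hypothetical, and this follows the description literally: look at the byte at offset X, and back up at most three times while it is a continuation byte.

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool is_continuation(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: returns a byte offset <= max_bytes at which s can be
// cut without splitting a multi-byte sequence.
std::size_t utf8_truncation_point(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) return s.size();   // already short enough
    std::size_t cut = max_bytes;                  // s[cut] is the first byte dropped
    // Back up while the first dropped byte is a continuation byte, 3 times at most.
    for (int i = 0; i < 3 && cut > 0 && is_continuation(s[cut]); ++i) {
        --cut;
    }
    return cut;
}
```

Usage would be something like s.substr(0, utf8_truncation_point(s, X)). Note the result is a byte count, not a promise about how many "characters" you get.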

BUT BUT BUT I'm sure you are sputtering about how this won't give you exactly X "characters". NOBODY F**KING CARES!!!! If you want the string to "fit" you should be *measuring* it, not saying stuff about "N characters fit" that has not been true on computers since the 1950s. I bet you think a combining letter and accent should count as 2, huh?

And your display function should not crash because it was given a string with an error in it! Even if you stupidly inserted the ellipsis all it should do is draw a few error indicators before the ellipsis.

Comment Re: Lol (Score 2) 248

No, the problem is code that pretends that illegal UTF-8 sequences magically don't exist!

For some reason UTF-8 turns otherwise intelligent programmers into complete morons. Here is another example from Apple. Let me state some rules about how to deal with UTF-8:

1. Stop thinking about "characters"!!!! This is a byte stream. The ONLY reason to think about a "character" is because you are DRAWING it on a display designed for a human to read, and humans do think about "characters". All other software either does not care, or is concerned with far more complex patterns (such as regexp and editors that deal with words and sentences). The latter are not helped at all by an intermediate translation.

2. It is TRIVIAL to detect that the byte sequence you are looking at is not a valid UTF-8 character. In this case draw a replacement for exactly ONE byte and then try the next byte to see if it is a valid sequence. Do not skip more. There must be one error per byte so that the maximum number of good characters is preserved and so that a sequence with errors can be parsed bidirectionally without looking more than a few bytes ahead, and so that it is possible to search for error patterns. It also means there are only 128 different errors, not millions.

3. NEVER "translate to Unicode" (i.e. UTF-16) because this will be a lossy conversion of these invalid sequences and thus you have not preserved the original data. I'm sorry but Microsoft really screwed us here. Best recommendation is to write a wrapper around the filesystem calls and translate from UTF-8 to UTF-16 at the last moment, using U+DCxx as a translation for the error bytes (this is lossy but filenames already are, due to case insensitivity, Apple's normalization, and even on Unix where "./foo" and "foo" are the same file).
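The U+DCxx translation in rule 3 could look roughly like this. The function names are hypothetical, and I am assuming "xx" means the raw byte value, so the 128 possible error bytes 0x80 through 0xFF land on U+DC80 through U+DCFF.

```cpp
// Hypothetical helpers for carrying UTF-8 error bytes through UTF-16.

// Map an error byte (0x80..0xFF) to the code unit U+DCxx, where xx is the byte.
constexpr char16_t error_byte_to_utf16(unsigned char b) {
    return static_cast<char16_t>(0xDC00 | b);
}

// True if a UTF-16 code unit is one of the 128 values used for error bytes.
constexpr bool is_escaped_error_byte(char16_t u) {
    return u >= 0xDC80 && u <= 0xDCFF;
}

// Recover the original byte from a U+DCxx code unit.
constexpr unsigned char utf16_to_error_byte(char16_t u) {
    return static_cast<unsigned char>(u & 0xFF);
}
```

These code units are unpaired low surrogates, so they cannot collide with anything a valid UTF-8 sequence decodes to.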

This is blatantly obvious if you substitute "words" for "characters" and imagine how you would write a program to deal with text strings. Words are also composed of multiple bytes in a row. For some reason nobody seems to crash on misspelled words, and they manage to concatenate and split strings and make whole file systems and diff programs and all kinds of other fancy text manipulation without having to translate the text so that each word is a fixed-sized integer. Amazing!

Comment Re:Rich Family Dies, World At Peril!!! (Score 1) 184

nor could the department legally require officers to have sex as part of their job.

I'm sure they could find volunteers.

The second scenario is plausible except that you assume that the LEOs have as much or more "firepower" than the gangs.

True but I'm assuming the entire gang isn't going to show up for every incident (unless it's quite a small gang to begin with). I guess if the police started doing this regularly, they might.. on the other hand, they might also just cut their losses and stop hassling people who don't pay because it's not worth getting into a surprise gun fight with professionals over $50 (or whatever).

On the other hand, for a big drug deal where a lot of armed criminals really might show up... well we keep hearing about how police departments are getting all this surplus military equipment for next to nothing, so I'm not sure looking at the pure dollar figures tells you that much. For instance according to http://www.newsweek.com/how-am...

Police in Watertown, Connecticut, (population 22,514) recently acquired a mine-resistant, ambush-protected (MRAP) vehicle (sticker price: $733,000), designed to protect soldiers from roadside bombs, for $2,800.

I guess the real reason isn't that cops are idiots or greedy, it's that they have to weigh the benefit against the risk to their lives. Confronting a john who tries to hire an undercover cop as a prostitute poses little risk if the officer is armed. Little compared to waiting for the enforcer anyway. Busting a guy trying to buy weed is little risk compared to setting up a sting with a big gang and showing up in your MRAP and getting into a fire fight. Is it worth risking your life to put a negligible dent in the drug trade or the sex trade? Maybe not.

Comment Re:Oh wow (Score 1) 234

Generational poverty laughs at your willful ignorance of the world.

I'm not talking about the world, just America.

As do all the poor straight-A students

When we speak of helping poor students, we generally mean helping them achieve academic success. Not helping them get rich. So "all the poor straight-A students" are actually the success story here, not an object of pity.

It's like you're having the wrong conversation and getting really indignant about it.

Comment Re:Oh wow (Score 3, Insightful) 234

troubled kids from the hood, kids with learning disabilities, or poor kids whose single parents are working 2-3 jobs

Not everything has to be about the "troubled kids" you know. We spend more than enough money trying to help the troubled kids. I think society gets more bang for the buck from helping a bright kid achieve more than a troubled kid fail slightly less.

Comment Re:did they damage the car? (Score 1) 461

I assume you replied to the wrong person. Surely you meant to say that to the person who said "The terrorists are the Federal Government of the United States; their enemy is We the People" (GGP) rather than GP.

Amiga3d's example of Islamic terrorism is perfect, but the example of the federal government being a terrorist because they have occasionally violated the Constitution is ridiculous.

Comment Re:Machine learning? (Score 1) 184

what is the value, exactly, of saying that because his skin is brown, that we have to ascribe some sort of negative modifier on how we perceive his intelligence

None, because as an individual we can make an individualized determination for him.

intelligence is an INDIVIDUAL value. it does no good to class all people according to an arbitrary signifier.

It does if that arbitrary signifier has correlations with important outcomes.

The "good" is that it allows us to spend resources more efficiently when we want to influence those outcomes, or to conserve resources and accept certain things instead of trying to fight them.

As an example, there is a great deal of money and time being spent to address racial achievement gaps in education. The assumption is the aggregate statistics for each race should be about the same. What if that assumption is wrong? Then we're wasting money that could be put to much better use.

if you were interviewing a bunch of people for computer programmer, and disregarded the ones with brown skin because they were "less intelligent," you might have hired a dumb white person and disregarded the black genius

You're right, and that's a great example of dumb racist thinking. That doesn't mean much because non-racists or anti-racists are dumb too.

therefore, according to racist "thinking," we should assume all white people are rapists

See, this means you don't even understand what you're criticizing. What's this "all" business?

Here's a better example. Men are more likely to commit rape than women. Women are more likely to be raped than men. Pop quiz: did I just say that all men are rapists, or that all women are raped? Answer: nope.

by believing in racism, and all of the logical fallacies that come with it, you have objectively proven to me that you are a stupid person. i don't respect you

To me the biggest problem with you is that you conflate "acknowledging racial differences" with "believing in racism." Acknowledging that men commit more violent rapes than women clearly doesn't make me sexist. Does acknowledging that black men have a higher rate of committing murder than white men make me racist?

I don't consider myself racist, or at least not the dumb kind of racist you were talking about above who wouldn't hire a smart black guy because of a firm belief that all blacks are dumb.

But whatever. I don't respect you either because you've shown you can't have a serious discussion. You're hiding behind calling me "low iq" even though I'm quite smart, as are most people on Slashdot, and that should be evident to you from reading my posts.
