
Comment Re:It's been bisected and confirmed (Score 1) 266

Sorry, a mouse button is a key for anybody in the real world. It even has its own keysym values in X.

Also, as pointed out, OS X and Windows, and earlier versions of X, worked this way. If this is really somebody saying "mouse buttons are not keys and I will obey the text literally" then that is really sad.

From the patch description this sounds like an accidental change, not deliberate. But beyond that it is hard to figure out what needs to be fixed. It sounds like there is a null pointer dereference, but only when the X server is shutting down. That's pretty minor and in fact something I know commercial software would ignore.

Comment Re:It's been bisected and confirmed (Score 2) 266

I'll bet this is going to be patched in the git repository within a half hour.

I'm not sure posting Slashdot stories is the best way to get a bug fixed. But if it is the only one that works, might as well do it.

I still feel the original poster should have put *something* on that bug report in all the time since January 16th!

Comment Re:It's been bisected and confirmed (Score 5, Informative) 266

Goddamn that was painful, but I found the actual patch:

http://cgit.freedesktop.org/xo...

I would say it is rather shocking that this Peter Hutterer actually did about 90% of the work, then posted something that gives no clue as to how to find the answer.

And that the original poster (who I assume made this Slashdot story) did not post any followup for 3 months, probably leading Peter to forget all about fixing this.

Comment It's been bisected and confirmed (Score 4, Insightful) 266

Somebody has already narrowed the problem down to a specific patch:

Comment 7 Peter Hutterer 2014-01-16 05:43:43 UTC

bisected to this commit:

commit 11319a922575f1da1d3c5774728c0dee12bab069
Author: Peter Hutterer
Date: Thu Oct 11 16:03:33 2012 +1000

        xkb: ProcesssPointerEvent must work on the VCP if it gets the VCP

It would help if that number was a link to the git log.

Comment Re: wooo look at that strawman BURNNNNN (Score 1) 301

What?

In your scenario, originally you had $100, and the other person had a stock share that he could trade for $100. Therefore there was $200 in total value. After the stock dropped to $10 value, one person has $100 and the other has a stock worth $10, therefore the total is $110. $90 of value was lost! Inflation/deflation of dollars does not matter, the result is that you now have $90 less of value, whatever $90 is now worth.

Also your claim that "the balance of transactions is $110" is pretty bogus. If the two decided to trade the stock back and forth 5,000 times for $100 each time, then by your calculation "the balance of transactions is $500,000". That number is obviously meaningless. The actual amount of money moved around is $100.

Comment Re:Interoperating with invalid data (Score 1) 196

Oh no! What if your split() worked in Unicode code points, and split a combining pair? What would you do, surely your computer will instantly self-destruct in a devastating explosion! What if your split() split an English word in two? What if your split() cut a UTF-16 surrogate pair in half (which EVERY single alternative to UTF-8 does!!!!!!) Yike! Disaster! Um, well, maybe not...

Stop making up non-existent problems.

1. Splitting is done after pattern searching. It is TRIVIAL to make your pattern search (which is likely doing something like "find the next space") only find complete UTF-8 sequences. In fact it will help get you to write stuff that matches more complex structures such as combining pairs.

2. If you are splitting at totally arbitrary points, it is because you are copying the data to a fixed-sized buffer. Virtually every use of this later pastes the contents of the buffers together (think of buffered file I/O) and thus it is harmless.

3. This splitting is 100% detectable because *both* ends will be invalid UTF-8.

4. For some reason nobody seems to worry about this for UTF-16. Hmmmm, I wonder why?
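
To make points 2 and 3 concrete, here is a minimal Python sketch. The helper name is_valid_utf8 and the sample text are made up for illustration; the only library call relied on is the strict bytes.decode("utf-8").

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "naïve café".encode("utf-8")

# Cut in the middle of the two-byte sequence for "ï" (bytes 0xC3 0xAF),
# the way a fixed-size buffer boundary might.
cut = text.index(b"\xc3") + 1
head, tail = text[:cut], text[cut:]

# head ends with a lead byte that has no continuation byte, and tail
# starts with a stray continuation byte, so each half is invalid alone.
head_ok = is_valid_utf8(head)
tail_ok = is_valid_utf8(tail)

# Pasting the buffers back together restores the original bytes exactly.
rejoined = head + tail
```

Neither half validates on its own, so the cut is detectable from either side, and concatenating the halves gives back the original valid string.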

Comment Re:Interoperating with invalid data (Score 1) 196

Maybe you should design your own platform where strings will be represented internally as UTF-8. It would be an interesting exercise.

FLTK and Nuke, and the project I am doing at R&H all use UTF-8 with tolerance for encoding errors for all internal storage. It is really easy, far easier than dealing with two types of text.

About 90% of the work is to get around default converters in Python and Qt that screw up the UTF-8.

Comment Re:Interoperating with invalid data (Score 1) 196

Stupid software that thinks it has to convert to UTF-16 is about 95% of the problem.

UTF-16 cannot losslessly store invalid UTF-8. It also cannot losslessly store an odd subset of arrangements of Unicode code points (it can't store a high surrogate followed by a low surrogate as two separate code points, because this pattern is reserved to mean a non-BMP code point). It also forces a weird cutoff at 0x10FFFF which a lot of programmers get wrong (either using 0x1FFFF or 0x1FFFFF). UTF-16 is also variable sized and has invalid sequences, thus it has NO advantages over UTF-8, so the entire scheme is a waste of time.

Unfortunately a bunch of people are so enamored with all the work they did to convert everything to 16-bit that they are refusing to admit they made a mistake. One way is to declare invalid UTF-8 as throwing errors and thus make it virtually impossible to manipulate text in UTF-8 form. Note that they don't throw exceptions on invalid UTF-16, care to explain that??? HMM????

UTF-8 can store all possible UTF-16 strings losslessly (including lone surrogates which are considered "invalid" in UTF-16), as well as storing invalid UTF-8. It can encode a continuous range of code points from 0-0x10FFFF, or 0x1FFFFF with a trivial change (it can do up to 0x7FFFFFFF if you use the original UTF-8 design).
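
A rough Python sketch of the surrogate arithmetic and of a lone surrogate surviving a trip through UTF-8 bytes. The function to_surrogates is a made-up name for illustration. The "surrogatepass" error handler is a real Python codec option, but using it here is my assumption about how to demonstrate the point, not something the standard UTF-8 definition allows.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a non-BMP code point into its two UTF-16 code units."""
    v = cp - 0x10000                      # 20 bits remain
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


# U+1F600 becomes the pair D83D DE00, matching Python's own encoder.
pair = to_surrogates(0x1F600)
utf16 = "\U0001F600".encode("utf-16-be")

# The largest pair, DBFF DFFF, is exactly 0x10FFFF: that is where the
# odd upper limit comes from.
top = to_surrogates(0x10FFFF)

# A lone surrogate is not valid UTF-16 text, but its bit pattern fits
# the ordinary three-byte UTF-8 layout and round-trips unchanged.
lone = "\ud83d"
as_bytes = lone.encode("utf-8", "surrogatepass")
back = as_bytes.decode("utf-8", "surrogatepass")
```

Python's strict "utf-8" codec refuses the lone surrogate, so this only works with the permissive handler. That refusal is a policy choice in the codec, not a limit of the byte layout.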

PEP 393 does NOT solve the problem. The "ascii" is limited to only 7-bit characters and thus cannot store UTF-8 (valid or not).

There is a "utf-8" entry in the PEP 393 strings but it appears the current design requires it to be translated to UTF-16 and back to UTF-8 to store there, thus disallowing invalid strings. My proposal is that converting bytes to a string copies the data unchanged to this UTF-8 storage, and checking for encoding errors be deferred until there actually is a reason to look at Unicode code points, which is VERY VERY RARE, despite the impression of amateur programmers. I also propose some small changes to how the parser interprets "\xNN" and "\uNNNN" in string constants so that it is possible to swap between bytes and "unicode" strings without having to change the contents of the constant.

Comment Re:Interoperating with invalid data (Score 1) 196

Aha! Somebody who really does not have a clue.

No, substr() does not require decoding, because offsets can be in code units.

No, replace() does not require decoding, because pattern matching does not require decoding, since UTF-8 is self-synchronizing.

No, split() does not require decoding, because offsets can be in code units.

No, join() does not require decoding (and in fact I cannot think of any reason you would think it does, at least the above have beginning-programmer mistakes/assumptions).
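
For anybody who wants to see it, here is a small Python sketch doing all four operations on raw UTF-8 bytes with no decode step. The sample data is invented for illustration; the methods are the ordinary bytes methods.

```python
data = "größe=10;naïve=yes".encode("utf-8")

# split(): the delimiter is ASCII, and an ASCII byte never appears
# inside a multi-byte UTF-8 sequence, so no character is cut.
fields = data.split(b";")

# substr(): offsets are code-unit (byte) offsets obtained from a search.
start = data.find(b"=") + 1
end = data.find(b";")
value = data[start:end]

# replace(): the pattern is the encoded form of "ï". Because UTF-8 is
# self-synchronizing, it cannot match in the middle of another character.
replaced = data.replace("ï".encode("utf-8"), b"i")

# join(): plain concatenation of byte strings.
joined = b";".join(fields)

# The same calls work when the data contains invalid UTF-8 (0xFF here).
dirty_fields = b"bad\xff;ok".split(b";")
```

The only place a decode appears is at the very end, if and when the result has to be shown to a person.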

Comment Re:Interoperating with invalid data (Score 1) 196

Well the first thing you need to do to clean up the invalid UTF-8, for instance in filenames, is to detect it.

If reading the filename causes it to immediately throw an exception and dispose of the filename, I think we have a problem. Right now you cannot do this in Python unless you declare it "bytes" and give up on actually looking at the Unicode in the vast majority of filenames that *are* correct.

It is also necessary to pass the incorrect filename to the rename() function, along with the correction. That is impossible with Python 3.0's library, and is probably the more serious problem.

Both of these problems are trivial to fix if it would just consider arbitrary byte sequences valid values for strings, and defer complaining about incorrect encoding until the string actually needs to be *decoded*, which actually is only really needed to display it, and sometimes for parsing in the rare cases that non-ASCII has syntactic value and is not just treated as letters.
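
A minimal sketch of what I mean by deferring the check, in Python. LazyText is a hypothetical class invented for this example; it is not an existing or proposed Python type, and a real design would need much more than this.

```python
class LazyText:
    """Hypothetical string type: holds raw bytes, decodes only on demand."""

    def __init__(self, raw: bytes):
        # Any byte sequence is accepted; nothing is validated here.
        self.raw = raw

    def __add__(self, other: "LazyText") -> "LazyText":
        # Concatenation never needs to look at code points.
        return LazyText(self.raw + other.raw)

    def code_points(self) -> str:
        # Strict decode: an encoding error surfaces here, at the moment
        # somebody actually asks for Unicode code points.
        return self.raw.decode("utf-8")

    def display(self) -> str:
        # For display, show U+FFFD in place of undecodable bytes.
        return self.raw.decode("utf-8", errors="replace")


# A filename with a stray 0xFF byte can still be stored and combined.
name = LazyText(b"report-\xff.txt")
path = LazyText(b"/tmp/") + name
shown = name.display()
```

The original bytes stay available in .raw untouched, which is what you need in order to hand the incorrect name to something that renames it.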

Comment Re:Fuck that guy. (Score 1) 397

I really doubt a majority of people think affirmative action helps Asians. It helps underrepresented minorities, and in most jobs and schools Asians are not underrepresented. It seems incredibly unlikely that 95% of people (whether they approve or disapprove of affirmative action) think it helps Asians.

I suspect you actually made a typo of some sort but am curious what exactly you were trying to say there.

Comment Re: and... (Score 1) 196

I'm arguing against a design that is the equivalent of saying "you can't run cp on this file because it contains invalid XML".

There is nothing wrong with the xml interpreter throwing an error AT THE MOMENT YOU TRY TO READ DATA FROM THE STRING.

There is a serious problem that just saying "this buffer is XML" causes an immediate crash if you put non-xml into it.

Comment Re: and... (Score 1) 196

God damn you people are stupid.

I am trying to PREVENT denial of service bugs. If a program throws an unexpected exception on a byte sequence that it is doing nothing with except reading into a buffer, then it is a denial of service. If you really think that invalid UTF-8 can lead to an exploit you seem to completely misunderstand how things work. All decoders throw errors when they decode UTF-8, including for overlong sequences and all other such bugs. So any code looking at the unicode code points will still get errors. And if you think there is some exploit that relies on the byte pattern that somehow only works for invalid UTF-8 then you have quite a fantastic imagination but no knowledge of reality.
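
As a quick check of the overlong case, using Python's built-in strict decoder as one example of a decoder (the variable names are made up for illustration):

```python
# 0xC0 0xAF is the classic overlong two-byte encoding of "/".
overlong_slash = b"\xc0\xaf"

# Holding it in a buffer and searching it as bytes is harmless: the real
# "/" byte (0x2F) simply is not in there.
contains_slash = b"/" in overlong_slash

# The error shows up only when something asks for code points.
try:
    overlong_slash.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

So the buffer can carry the bytes around without crashing, and the decoder still refuses to turn them into a "/".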
