
Comment Re:Lol (Score 1) 248

There's no law that says they can't pad the variable length input to fixed length

I'm not sure you quite understand the problem: it's not the input length, it's the encoding of each of the characters. So are you suggesting turning all single-byte encoded characters into a multi-byte encoding of some arbitrary maximum length? If you can already identify the problem at this level, then you would just do that in the parser that is truncating the string.

...and then make sure you're handling combining character sequences and bidirectional text correctly.

Comment Re: Lol (Score 1) 248

It is not hard and it seems really obvious, but for some reason Unicode turns some otherwise really smart programmers into total idiots.

There's a lot to know, and people might not be aware of all of it and all the issues involved.

...and "really smart" might actually be a handicap if it means "I'm smart, I know how to do this, it's easy!" and so not bothering to Read The Fine Manual, whereas somebody less smart might find Unicode scary and actually bother to RTFM.

Comment Re: Lol (Score 1) 248

From that description it does sound like the string is still valid. However if the display is crashing on a certain sequence containing an ellipsis, I am not clear why you can't construct that string directly, rather than rely on the insertion of the ellipsis.

Yup.

It does sound like they maybe rely on "sanitizing" but of a far more complex scheme than I was aware of.

Not to me, unless by "sanitizing" you mean "shortening so it'll fit in Notification Center".

This is still wrong, maybe far worse, as they are detecting and rejecting patterns containing ellipsis and some other character

I've seen nothing to indicate that they're doing anything specific with ellipses, other than "sticking them in at the point of truncation to let the user know that the full message isn't being displayed".

About all I'd assume is that certain sequences of characters are not being handled correctly by some part of Core Text; perhaps it's assuming, explicitly or implicitly, that those sequences "can't happen" and, instead of drawing them, crashing, perhaps in an assert.

In this case their glyph layout should simply not crash on any possible arrangement of bytes or words in the incoming string.

Correct.

It is not hard and it seems really obvious, but for some reason Unicode turns some otherwise really smart programmers into total idiots.

There's a lot to know, and people might not be aware of all of it and all the issues involved.

Comment Re: Lol (Score 1) 248

So you are saying "fix the library". I am saying "sanitize input for library".

Both work, but I would argue that sanitizing for the library is usually a lot fewer problems.

"Programming for international environments is hard, let's go shopping!"

I would argue that you have perhaps not considered all the possible problems and have thus perhaps miscounted the problems with "work around a broken library by transforming perfectly legitimate Unicode character sequences into sequences that might not represent what the person sending the message intended", that being the correct description of the second approach to this problem in the list above.

Yeah, correctly truncating a message that could be an arbitrary sequence of text in multiple languages with combining character sequences and bidirectional text isn't easy, but, well, if you want to be thought of as a company that makes stuff that "just works", you'd better figure out how to make that complicated process "just work".

Maybe iOS 8.3.1 needs to have a quick fix of some sort, but iOS 10, if not iOS 9, should fix the truncation code.

Comment Re: Lol (Score 1) 248

In this case, the illegal UTF-8 sequence is the string after you have blown part of its funny foreign squiggle.

Where has it been proven that the bug is the trashing of a UTF-8 sequence?

First of all, Apple tends to use UTF-16 in the higher-level frameworks, e.g. that's how CFString/NSString work internally.

Second of all, processing entire characters rather than bytes is something I suspect Apple got right fairly early in the process. I suspect the problem is either that 1) when truncating the message for display, they're not processing entire graphemes, they're processing entire characters or 2) they're not taking bidirectionality into account or 3) they're not handling a combination of both issues.
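To make the bytes / code units / characters / graphemes distinction concrete, here's a rough Python sketch. The function name `describe` is made up for illustration, and the grapheme count is only an approximation (it counts code points whose canonical combining class is zero); real grapheme cluster segmentation follows UAX #29 and handles many more cases. None of this is Apple's code.

```python
import unicodedata


def describe(text: str) -> dict:
    """Count the same string at several levels of abstraction."""
    return {
        # Octets in the UTF-8 encoding.
        "utf8_bytes": len(text.encode("utf-8")),
        # 16-bit code units in UTF-16 (each unit is 2 bytes, no BOM with -le).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Unicode code points ("characters" in the comment's sense).
        "code_points": len(text),
        # ASSUMPTION: approximate graphemes by counting non-combining code
        # points. This is not full UAX #29 segmentation.
        "approx_graphemes": sum(
            1 for ch in text if unicodedata.combining(ch) == 0
        ),
    }


# "e" followed by COMBINING ACUTE ACCENT: two code points, one visible glyph.
print(describe("e\u0301"))
```

Truncating at a "character" boundary in that example could still separate the accent from its base letter, which is the first of the three possibilities above.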

He's saying that thing you call with your newly minted mangled string shouldn't fail.

Which is one way to solve it.

There are multiple things here that should be fixed. That's one of them - the renderer shouldn't crash if handed a bad string, it should fail more softly, e.g. put in a REPLACEMENT CHARACTER for all bad sequences and, if possible, log the error in a way that indicates that routine XXX has handed a bad character sequence to it.
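As a sketch of what "fail more softly" could look like at the decoding level, here is a small Python example. The function name `decode_softly`, the logger name, and the `caller` parameter are all hypothetical; this only illustrates substituting REPLACEMENT CHARACTER and logging who handed over the bad sequence, and says nothing about how Core Text actually works.

```python
import logging

logger = logging.getLogger("render")


def decode_softly(raw: bytes, caller: str = "unknown") -> str:
    """Decode UTF-8 without raising; report bad input instead of crashing."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Log which routine handed us the bad sequence, and where it starts,
        # so the underlying bug doesn't go unnoticed.
        logger.warning(
            "%s passed invalid UTF-8 at byte offset %d", caller, exc.start
        )
        # Substitute U+FFFD REPLACEMENT CHARACTER for the bad sequences.
        return raw.decode("utf-8", errors="replace")


# A three-byte sequence with its last byte chopped off.
text = decode_softly(b"abc\xe2\x80", caller="truncate_for_notification")
print("\ufffd" in text)
```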

I would argue, if the thing you call mangles strings, sanitize its inputs so it doesn't get a string with a bad character (a unicode character of whatever format it uses internally, post-mangle).

And I would argue (all the way to the heat death of the universe) that, if you know that the thing you call mangles strings, and if it's produced by somebody else working on the same OS, you get it fixed so that it doesn't do that; you don't mangle user input (which includes text messages from other users) in released software, unless you don't have time to fix the underlying problem for the release.

Comment Re:Lol (Score 1) 248

It's a bad character if the library you call will fuck it up. That's what makes it bad.

If it's a valid character in the character set being used, and a valid representation of the character in the encoding being used for that character set, then it is by definition not a bad character; if the library you call fucks it up, the library is bad.

The fuckup isn't the lack of "sanitization" of perfectly clean strings, the fuckup is the library's inability to handle those strings.

Once you overwrite part of some multibyte character IT IS A BAD CHARACTER!!!

Then the fuckup is the overwriting of part of that character - or the overwriting of a combining character following a base character, or not handling bi-directionality correctly when figuring out where to and how to truncate the string. No, the rendering code shouldn't crash when handed the fucked-up string, but it should report the underlying bug somehow (in a way that gets back to the developer), so that bug doesn't go completely unnoticed.

Comment Re:Android to iDevice (Score 1) 344

...a $350 Android phone is a high-end device--or, at best, at the upper end of mid-range. Roughly 60% of Android phones retail for $200 or less. (http://www.idc.com/getdoc.jsp?containerId=prUS25037214). The $350 price point lands right near the top quintile of all Android phones. By contrast, there does not exist a low-end iPhone for sale at retail. That's a conscious decision on Apple's part, and matches their overall M.O.

Your phone is not one of the low-end phones that give such a bad user experience. Your phone is quite nice--and quite expensive--compared to the fleet of Android devices as a whole.

Comment Re:Android to iDevice (Score 1) 344

...well, that's sort of one of the features of Android. It's open, and it's run-on-what-have-you, so it should hardly be surprising that a significant chunk of the install base is running on cheap, low-end devices. It's a big part of the reason Android has such a large market share compared to iOS.

If Google can't pull low-end Android users onto high-end devices instead of iDevices, well, that's partly a failure of marketing, and partly the natural challenge of living in such a diverse world of devices. If a significant chunk of your market share consists of budget devices with bad user experiences that are targeted to non-technical users, you can hardly be surprised when those users clump the OS in with the phone itself.

Comment Re:Lol (Score 1) 248

(And you don't want to split it after N characters, if the goal is to limit the display length of the string you're displaying, as not all characters are the same width - and, of course, a base character followed by several combining characters might just have the width of the base character.)

...and, of course, when you're figuring out where to truncate, remember that some characters go right-to-left, not left-to-right - the string has both Roman-alphabet (left-to-right) and Arabic-alphabet (right-to-left) characters.
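For what it's worth, detecting that a string mixes directions is the easy part. A minimal Python sketch, with a made-up function name `directions`, using the bidirectional class that the standard `unicodedata` module reports for each code point. It only classifies strong characters; working out the actual visual order (and hence where a visual truncation point falls) is the job of the Unicode Bidirectional Algorithm, which this does not implement.

```python
import unicodedata


def directions(text: str) -> set:
    """Return which strong directions ("ltr", "rtl") occur in the text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            found.add("ltr")
        elif bidi_class in ("R", "AL"):
            # "R" covers e.g. Hebrew letters, "AL" covers Arabic letters.
            found.add("rtl")
    return found


# Latin letters plus ARABIC LETTER SEEN (U+0633).
print(directions("Power \u0633"))
```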

Comment Re:Lol (Score 1) 248

No you don't. You are demonstrating the typical moronic attempts to deal with UTF-8.

Here is how you do it:

Go X bytes into the string. If that byte is a continuation byte, back up. Back up a maximum of 3 times. This will find a truncation point that will not introduce more errors into the string than are already there.

As long as you're not splitting a sequence of multiple characters, some of which are combining characters (and any of which might be encoded in multiple bytes with UTF-8). Don't split a base character from a combining character following it. Splitting a sequence like that can introduce more rendering errors into the string than are already there.

(I suspect that's what the problem is in this bug, given that there are several combining characters in the string as shown in various places.)

(And you don't want to split it after N characters, if the goal is to limit the display length of the string you're displaying, as not all characters are the same width - and, of course, a base character followed by several combining characters might just have the width of the base character.)
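The two levels discussed above can be sketched in Python. `truncate_utf8` follows the quoted recipe (back up over continuation bytes, at most 3 times); `truncate_keep_marks` is a hypothetical extra step that refuses to cut between a base character and the combining marks after it. Both names are invented for illustration, and the second function is only an approximation: it looks at the canonical combining class, which does not cover everything a real grapheme cluster break (UAX #29) would, and it does nothing about display width or bidirectional text.

```python
import unicodedata


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut at most `limit` bytes without splitting a UTF-8 sequence."""
    if limit >= len(data):
        return data
    cut = limit
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut lands inside a character,
    # so back up. A sequence has at most 3 continuation bytes.
    for _ in range(3):
        if cut > 0 and (data[cut] & 0xC0) == 0x80:
            cut -= 1
        else:
            break
    return data[:cut]


def truncate_keep_marks(text: str, max_code_points: int) -> str:
    """Cut at most `max_code_points`, keeping marks with their base."""
    if max_code_points >= len(text):
        return text
    cut = max_code_points
    # ASSUMPTION: a nonzero canonical combining class marks a character
    # that must stay attached to what precedes it. Back up until the
    # first dropped character is not such a mark.
    while cut > 0 and unicodedata.combining(text[cut]) != 0:
        cut -= 1
    return text[:cut]


print(truncate_utf8("a\u00e9".encode("utf-8"), 2))    # cut falls inside U+00E9
print(repr(truncate_keep_marks("ae\u0301x", 2)))      # cut falls before U+0301
```

In the second call, a plain cut after two code points would keep the "e" and drop its accent; backing up drops the whole cluster instead.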

Comment Re:Lol (Score 1) 248

Just because it's unlikely with a real text string doesn't mean that any of the text is invalid for a message. The text string should still not need to be changed. The bug only affects notifications, and it's clear that the text can be displayed just fine in conversation view.

This is almost certainly due to splitting multibyte characters on sub-character boundaries.

Or mishandling combining characters; the screenshot geminidomino provided shows several combining characters, as indicated by the dotted-line circles in some of the glyphs (and I suspect some of the marks above the Arabic characters come from combining characters as well).

Comment Re: Lol (Score 1) 248

No, the problem is code that pretends that illegal UTF-8 sequences magically don't exist!

Where's the illegal UTF-8 sequence in the message? Is the actual octet sequence in the message different from what's in this Slashdot posting (once converted to a sequence of octets), which contains no invalid UTF-8 sequences (yes, I went through them all by hand)?
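Checking by hand works, but the same check is a few lines of Python using the standard strict UTF-8 decoder. The function name `first_invalid_utf8_offset` is made up, and the sample octets below are illustrative only; they are not the octets of the message in question.

```python
from typing import Optional


def first_invalid_utf8_offset(octets: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid sequence, or None."""
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Valid Arabic text encoded as UTF-8: no invalid sequence.
print(first_invalid_utf8_offset("\u0633\u0644\u0627\u0645".encode("utf-8")))
# 0xFF can never appear in UTF-8.
print(first_invalid_utf8_offset(b"ab\xff"))
```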
