What does "character" mean?
Something represented by one Unicode code point? (making your statement a tautology)
Grapheme cluster? (what most users would consider a character)
A position in the character grid of a console?
Which brings us to the real question: to what extent do you want to support Unicode? Do you care about
* Grapheme clusters that take multiple code points to represent? (letters with multiple diacritics, unusual letter/diacritic combinations, etc.)
* Right-to-left languages? (Hebrew, Arabic, etc.)
* Languages where characters merge together such that computer output looks more like handwriting than type? (see above)
* Languages where "fixed-width" fonts use two different widths, giving "single-width" and "double-width" characters? (Chinese, Japanese, Korean)
* Characters outside the basic multilingual plane? (rare Chinese characters, dead languages, made-up languages, rare mathematical symbols)
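To make the first and last points concrete, here is a small sketch (the strings are my own illustrative examples) showing how "how many characters" depends on what you count — code points, grapheme clusters, or code units in a particular encoding:

```python
# "é" as a single precomposed code point vs. "e" + combining acute accent:
precomposed = "\u00e9"   # é as one code point
combining = "e\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT

print(len(precomposed))  # 1 code point
print(len(combining))    # 2 code points, but one grapheme cluster on screen

# A character outside the Basic Multilingual Plane (GOTHIC LETTER AHSA):
gothic = "\U00010330"
print(len(gothic))                           # 1 code point
print(len(gothic.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (surrogate pair)
print(len(gothic.encode("utf-8")))           # 4 UTF-8 code units (bytes)
```

Both `precomposed` and `combining` render identically to the user, yet their code point counts differ; and the single non-BMP code point needs two UTF-16 units or four UTF-8 bytes.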
Once you have worked through that design decision it will help you make others. What you find is that "length in Unicode code points" and "Unicode code point n" really aren't much more useful than "length in UTF-k code units" and "UTF-k code unit n". Either is fine for sanity-checking string length or for iterating through a string looking for a delimiter. Neither is much use for anything beyond that unless you are doing a very limited implementation.
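As a sketch of the delimiter point: scanning for an ASCII delimiter works directly on UTF-8 code units, because the bytes of a multi-byte UTF-8 sequence are all ≥ 0x80 and so can never collide with an ASCII byte. (The sample string here is my own example.)

```python
# Byte-level split on an ASCII delimiter is safe in UTF-8:
text = "naïve,日本語,🎉"
raw = text.encode("utf-8")

fields = raw.split(b",")  # operates on code units (bytes), not characters
print([f.decode("utf-8") for f in fields])  # ['naïve', '日本語', '🎉']
```

The same property holds for UTF-16 with a BMP delimiter: surrogate code units occupy a reserved range, so a 16-bit scan for e.g. `,` cannot match inside a surrogate pair.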
UTF-32 seems enticing initially but turns out to be fairly pointless: by the time you get to caring about non-BMP characters you are probably also going to be caring about combining characters etc., and it will massively increase the size of the vast majority of text.
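The size blow-up is easy to demonstrate; since UTF-32 spends four bytes on every code point regardless of content, mostly-ASCII text roughly quadruples relative to UTF-8 (the sample text is my own):

```python
# UTF-32 uses 4 bytes per code point no matter what the text contains:
sample = "Hello, world! " * 100   # illustrative mostly-ASCII text
print(len(sample.encode("utf-8")))     # 1400 bytes
print(len(sample.encode("utf-32-le"))) # 5600 bytes, 4x larger
```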
UTF-8 vs UTF-16 is something of a toss-up. UTF-16 lets you get away with treating each unit of the string as one "character" for much longer, which may be considered either a blessing (because you don't care about the cases where it doesn't work) or a curse (because you realise your assumptions were wrong much later, after basing much more code on them). UTF-8 is smaller for text with lots of Latin characters; UTF-16 is smaller for text with lots of CJK characters. UTF-8 is the usual choice on *nix systems and in internet protocols; UTF-16 is the encoding chosen by Windows and Java.
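The size trade-off can be checked directly; here is a sketch with sample strings of my own choosing (an English pangram and a Japanese sentence):

```python
# Latin-heavy text: 1 byte/char in UTF-8 vs 2 bytes/char in UTF-16.
# CJK-heavy text: 3 bytes/char in UTF-8 vs 2 bytes/char in UTF-16.
latin = "The quick brown fox jumps over the lazy dog."
cjk = "吾輩は猫である。名前はまだ無い。"

for name, s in [("latin", latin), ("cjk", cjk)]:
    print(name,
          "utf-8:", len(s.encode("utf-8")),
          "utf-16:", len(s.encode("utf-16-le")))
```

For the Latin sample UTF-8 comes out at half the UTF-16 size; for the CJK sample the ratio reverses, with UTF-8 about 50% larger.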