Comment Re:Slashdot affected as well (Score 1) 290

/. does support Unicode (UTF-8 sucks, btw - it's a compatibility hack).

I was guessing your house wine was UTF-32 even before the last paragraph. Unfortunately it lacks compatibility with the size of existing Google datacenters, though it's nothing that couldn't be solved with more circuitry and a beefier power feed.

You absolutely can parse UTF-8 backwards: "continuation bytes all have '10' in the high-order position". How much easier does it have to get? Please inform me how your pushmepullyou parsing system is defined such that all code points are palindromes with no loss of space efficiency.
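For the skeptical, here is a minimal sketch of the backwards walk in Python, assuming the input is already valid UTF-8. The function names are mine, purely for illustration. The only trick is the one quoted above: skip bytes of the form 10xxxxxx until you land on a lead byte.

```python
def prev_boundary(buf: bytes, i: int) -> int:
    """Return the index of the lead byte of the code point ending just before i."""
    i -= 1
    # Continuation bytes look like 0b10xxxxxx; step over them to reach a lead byte.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def iter_backwards(buf: bytes):
    """Yield the code points of valid UTF-8 from last to first."""
    end = len(buf)
    while end > 0:
        start = prev_boundary(buf, end)
        yield buf[start:end].decode("utf-8")
        end = start


# Mixed 1-, 2-, 3- and 4-byte sequences.
text = "na\u00efve \u65e5\u672c\u8a9e \U0001F642"
reversed_chars = list(iter_backwards(text.encode("utf-8")))
```

No state carried from the front of the string, no lookup table, and at most three bytes skipped per code point.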

Ken Thompson of the Plan 9 operating system group at Bell Labs then made a small but crucial modification to the encoding, making it very slightly less bit-efficient than the previous proposal but allowing it to be self-synchronizing, meaning that it was no longer necessary to read from the beginning of the string to find code point boundaries. Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike. The following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout, and then communicated their success back to X/Open.

Good grief, if Thompson and Pike are the scourge of right thinking, our species is doooooomed. However you describe it, the present state of Slashdot's Unicode handling is a disgrace to God, geek, and man.

Comment Re:Dictation versus typing (Score 1) 287

Then fixing those mistakes is even slower than fixing a typographical one.

If you're writing for the New Yorker, fixing mistakes takes weeks. But then they get into not just whether the noun itself should be in the possessive form, but whether your sentence should require a noun in the possessive form in the first place.

The kind of mistakes you're talking about are specifically keyboard level mistakes. Spelling, orthography, and missing or duplicated words.

It's a tremendous cognitive burden to type at 120 words per minute of something you're composing on the fly while also getting all the minor details of spelling, punctuation and orthography correct (not to mention getting your homonyms correct, which I can usually do at speed if there are two isolated main forms to resolve, but not for palette/pallet/palate or muddles like Seine/sine/sign/sing/singe/singer/signer/seignior/Seigneur/senior/Seniour/senorita where finger habits start seeing double).

If you're not trying to go the orthographic last mile (while neglecting the stylistic last mile) dictation is hugely faster than typing to capture the gist. With dictation, you also get a useful side channel on your emotional inflection and the pacing of your word flow. It's not the same cleaned up and transcribed.

With dictation, one is free to swoop around and really think and make connections and shift and shape and reorganize. If you sit down at a keyboard in that state, you might as well open Mind Manager and type with your mouse.

Back to the actual subject, this is a typical worthless (and breathless) press release. He's sounding the "invest now, or forever be left behind" klaxon. They might be close, or not so close, or we might never see this.

Sure, Seagate could have told people back in 1980 that they were targeting 1 TB/platter with their fancy magnetic recording technology. But really, with where they were at the time, there was no connection to where we're at now. It wasn't a better investment in 1980 because we hit 1 TB/platter now. So these "could do" numbers are often exceptionally worthless, even when true.

Comment Re:Hurry it up (Score 2) 103

A million college students are waiting anxiously for this tool now that some professors have started checking their essays electronically for plagiarism.

This assumes that they're as stupid as we all suspect, because the next thing the administration begins to do is check whether the student's written oeuvre is self-consistent without bunkering down under a blander identity than a Milli Vanilli cover of Valium Spice.

I'm so busted.

Comment Re:Just doesn't work... (Score 1) 245

One extra detail: the alphabet of 50 characters was the effective entropy over a much larger space of symbols. I described the tree in entropy space, because that is what mattered to its performance profile. The naive view is that the symbol set contained 8000 symbols and that four character strings could be selected from a set of 8000^4 members.

I ignored this detail because conventional reverse engineering would very quickly determine that we only go to the hash table for a much smaller nucleus of the problem space. That filter was a couple of pages of code. Nothing major, but not trivial to guess without some appropriate expertise.

Comment Re:Just doesn't work... (Score 2) 245

Great, so all you have to do is replace that conditional so it always evaluates to true, no? When you actually do this, the program happily writes an answer to the screen every time. The only problem is, if you provided an invalid security key at the beginning, the answer it writes is complete nonsense. You see, it's secretly already tested the security key, and if it was wrong, the answer ends up being wrong too.

I implemented exactly this circa 1990 to protect a small database of disambiguation rules structured as a hash table. A random value obtained from the security dongle was intermixed into the hash function and hash check condition. This was not done once for each possible lookup as defined in a conventional database. It was done once for each feasible answer for each possible lookup. The code had a statistical model of feasible answers. For some queries the number of feasible answers was excessive (too many dongle interactions) so we created a heuristic that was correct 99% of the time and set aside the 1% for a second pass with an additional data structure. If the dongle wasn't present the set of feasible answers was incorrectly narrowed with the expected statistical distribution. The members of that distribution, however, were entirely wrong.
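A toy sketch of the general idea in Python, not the original code: the dongle is modelled as a keyed hash (`hashlib.blake2b` with a `key`), and every name, table size and bit width below is an assumption chosen for illustration. The point it shows is that a probe with the wrong secret does not fail loudly. It just answers "yes" and "no" at roughly the accidental-match rate, which has nothing to do with the real contents.

```python
import hashlib

CHECK_BITS = 8          # assumed width of the stored check value
TABLE_SIZE = 1 << 12    # assumed number of slots


def dongle(secret: bytes, challenge: bytes) -> int:
    """Stand-in for the hardware: a keyed hash only the right secret reproduces."""
    digest = hashlib.blake2b(challenge, key=secret, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def slot_and_check(secret: bytes, path: str):
    """The dongle output decides both where to look and what to expect there."""
    h = dongle(secret, path.encode("utf-8"))
    return h % TABLE_SIZE, (h // TABLE_SIZE) % (1 << CHECK_BITS)


def build(secret: bytes, paths):
    """Record a check value for every valid path (toy layout: a set per slot)."""
    table = [set() for _ in range(TABLE_SIZE)]
    for path in paths:
        slot, check = slot_and_check(secret, path)
        table[slot].add(check)
    return table


def probe(table, secret: bytes, path: str) -> bool:
    """True if the table claims `path` is valid under this secret."""
    slot, check = slot_and_check(secret, path)
    return check in table[slot]


valid_paths = ["ka", "kan", "kanji", "ki", "kyo"]
table = build(b"real-dongle", valid_paths)
with_dongle = [probe(table, b"real-dongle", p) for p in valid_paths]
without_dongle = [probe(table, b"no-dongle", p) for p in valid_paths]
```

Patch out the conditional and you still need the right `secret` to know which cell to read and what should be in it.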

We built up more complex queries from smaller queries. We were actually building a tree where every path in the tree was a valid answer and the majority of leaf nodes were at depths 2-4. That we hit a leaf node was a bit of metadata from the hash table lookup, which would be wrong if the dongle wasn't installed.

How about a quick forward description. Start with an alphabet of 50 symbols and construct the tree of all strings of length one to six. Every node in the tree has a flag about whether that node terminates a valid string and some additional bits about the correct orthography of the string as expressed in the user input, when typed. Your database is a subtree of this tree with about 100,000 strings (problems were so much smaller 25 years ago) along with a couple of bits of metadata per leaf. It's pretty sparse compared to the 15 billion possible leaf nodes.
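The arithmetic behind "15 billion", worked out in a few lines of Python using only the figures quoted above (50 symbols, lengths one to six, about 100,000 stored strings):

```python
ALPHABET = 50
MAX_LEN = 6
STORED = 100_000

# Number of distinct strings (tree nodes) at each depth 1..6.
nodes_per_depth = [ALPHABET ** d for d in range(1, MAX_LEN + 1)]

deepest = nodes_per_depth[-1]   # nodes at depth 6: 15,625,000,000
total = sum(nodes_per_depth)    # all nodes, depths 1..6: 15,943,877,550
fill = STORED / deepest         # fraction of the deepest level actually used
```

So the stored subtree covers well under one part in a hundred thousand of the space it lives in.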

The database subtree is actually constructed by elimination. One dongle assisted hash probe tells you whether a descending edge from your current vertex leads to a non-empty subtree (further solutions with your current path in the tree as a proper prefix). In addition, the user input defines another subtree of everything that could possibly matter to the conversion being performed. What you are computing is the intersection of these two subtrees: the tree corresponding to the task at hand and the tree corresponding to all solutions possible. Because the hash table was decomposed on the principles of minimum description length, when the dongle was absent (or corrupted) you still get an answer with much of the expected statistical distribution.

Except for one thing. The hash check was imperfect and you would get some false positives. We set up the rate of false positives so that the set of false positives grew exponentially as you descended to deeper levels. We knew from the statistical structure of the user input that few elements of this phantom solution set would interact negatively in practice even though the phantom set vastly out-numbered the legitimate set. Further, if one tried to enumerate the tree exhaustively using an incorrect dongle hash function, the tree you would reconstruct had no depth limit. It grew exponentially in size forever. We knew there was a depth limit when correctly probed, but this was nowhere expressed in our program code. In fact, this could be used to reverse engineer the correct hash function: only the correct hash function enumerates to a finite set of 100,000 subtrees. Just iterate over the set of all possible hash functions, in some well-structured enumeration order, until you discover this condition. Bingo, you're done.
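A back-of-envelope model of that growth, as a sketch under assumptions that are mine rather than from the original system: every node offers 50 descending edges, each empty edge passes the check independently with probability `fp_rate`, and `fp_rate` is taken as one in 2^bits for a given number of check bits. Under that model a blind enumeration sees about 50 * fp_rate phantom children per phantom node, so everything hinges on whether that product is above or below one.

```python
ALPHABET = 50  # symbols per level, from the description above


def expected_phantoms(fp_rate: float, depth: int) -> float:
    """Expected phantom nodes exactly `depth` levels below a single node."""
    return (ALPHABET * fp_rate) ** depth


def total_phantoms(fp_rate: float, max_depth: int) -> float:
    """Expected phantom nodes summed over depths 1..max_depth."""
    return sum(expected_phantoms(fp_rate, d) for d in range(1, max_depth + 1))


for check_bits in (4, 5, 6, 8):
    fp = 2.0 ** -check_bits          # assumed: one accidental match per 2**bits probes
    growth = ALPHABET * fp           # expected phantom children per phantom node
    verdict = "grows without bound" if growth > 1 else "converges"
    print(f"{check_bits} check bits: growth factor {growth:.3f}, {verdict}")
```

With a growth factor above one the enumerated tree never bottoms out. Below one, the expected total settles toward growth / (1 - growth).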

Not all of the phantom space was harmless, so we ran a test on that and identified all the members of the phantom space likely to interfere in practice and coded an additional data structure about 25% the size of the main data structure which encoded the set of harmful phantoms on the principles just expressed. I think we tuned this second hash structure to have a lower rate of phantom production, otherwise we would have needed a third structure to restore the solutions incorrectly eliminated.

So the desired answer set was the (user problem tree) intersected with (database tree - bogus database answer tree + [non existent] bogus bogus answer tree ...).

I won't get into it, but you can construct hash tables encoding these subtrees at pretty close to the Shannon entropy by balancing the number of hash check bits against the sparseness of the subtree encoded.

We didn't use an ordinary hash table. We used a globally optimized hash table computed using a bipartite graph matching algorithm where every hash query had a set of three locations to examine and if any location returned a hit, you added that node to your subgraph descent set (if more than one of these locations was positive, you had at least one hash accident but that didn't tell you anything you could use). With three locations per probe reconciled with the bipartite graph algorithm, the hash table would achieve a bit over 90% occupancy rate and constant-time probe rates (we always tested all three cells, because each hit added metadata concerning the path, you had to accept all answers).
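For flavour, a small Python sketch of three-location placement by bipartite matching. It is similar in spirit to what is nowadays called cuckoo hashing with three choices, and it is not a reconstruction of the original solver. The hash function, the 8-bit check value and the augmenting-path search are all illustrative assumptions, and the demo runs at a light load rather than trying to reproduce the 90% occupancy figure.

```python
import hashlib


def candidates(key: str, size: int, salt: bytes = b"demo"):
    """Three candidate slots plus an 8-bit check value (hypothetical layout)."""
    d = hashlib.blake2b(key.encode("utf-8"), key=salt, digest_size=16).digest()
    slots = [int.from_bytes(d[4 * i:4 * i + 4], "big") % size for i in range(3)]
    return slots, d[12]


def place(keys, size: int):
    """Match every key to one of its three slots via augmenting paths."""
    owner = [None] * size  # slot -> key currently matched to it

    def augment(key, seen):
        for s in candidates(key, size)[0]:
            if s in seen:
                continue
            seen.add(s)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if owner[s] is None or augment(owner[s], seen):
                owner[s] = key
                return True
        return False

    for key in keys:
        if not augment(key, set()):
            raise ValueError(f"no placement found for {key!r}")
    # Only the check value is stored; the key itself never appears in the table.
    return [None if k is None else candidates(k, size)[1] for k in owner]


def probe(table, key: str):
    """Examine all three cells and return the slots whose check value matches."""
    slots, check = candidates(key, len(table))
    return [s for s in slots if table[s] == check]


keys = [f"path-{n}" for n in range(64)]
table = place(keys, 128)
hits = [probe(table, k) for k in keys]
```

Every probe costs exactly three cell reads whether or not the key is present, and more than one matching cell means at least one of them is an accident.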

The hash table placement algorithm (bipartite graph solver) was not included in the distributed software. Nor was the statistical model used to construct the tiny phantom correction table. Without duplicating this work, any attempt to replace the supplied hash function (in hardware) with a different software hash function would require data structures about 50 times as large as we had employed, according to one estimate I made.

The only viable and practical attack significantly less difficult than reproducing much of our original work was to crack the dongle hashing algorithm and encode it in software, eliminating the hardware security lock. It was hard to suck the encoded information out of this structure, because it contained a lot of noise.

If you had a huge corpus covering the space of typical user input, you could discover which parts of this data structure were used in practice as a statistical construct. But anyone who had that wouldn't be ripping off a low quality reproduction, they would compete straight up. It's about manipulating incentives.

The problem with this technique is that it was pretty much a one-off. You had to tune the hash rejection rate just right so that the phantom elimination tables converged to finite size. You needed to constrain worst-case performance on any possible input string (we did this by throwing away regions before lookup where the sparsity fell below a certain threshold). And you needed an application space that tolerated imperfect answers. In our case, a wrong disambiguation of user input to Asian characters. There was also a conventional B-tree database for user-generated expressions which could be used to supplement or override any rough edges that poked through from what I described above. Our application had all of these things.

What I learned, though, is that one can go pretty far in this direction under the right conditions. This system was extremely resilient to conventional reverse engineering. We were fortunate that memory-resident hash tables sustain such high access rates, because the amount of memory we touched compared to a conventional database was a hundred to a thousand times more. One sentence of input with the most productive symbols would probably hit our entire data structure multiple times over. Even on a 486, we could manage 100,000 to one million hash probes per second, depending on how aggressively we mixed in randomness from the hardware dongle. Even then, we drove that parallel port dongle to ten times its specified rate. It might have been producing bogus values some small fraction of the time. If so, it was never noticed amid all the other noise inherent to the problem space.

With this result I smell a rat because there's no discussion of computational burden up front. I'd be shocked if the obfuscated software ran at 1% of the rate of a conventionally encoded algorithm. But still, a critical nucleus at the center of your system that resists reverse engineering is a potent building block to discourage competition.

Competition is of course just another word for innovation. A large field of innovation is stifling competition. Innovation is passive-aggressive like that, which is why Microsoft loves this word. Personally, I wouldn't date that chick. The problem with defining worthy innovation (worthy of nasty protections such as the patent system) is that it's deeply frame dependent. What looks like the distillate of hard mental labour in one frame of reference is an automatic result in some higher frame of abstraction we haven't managed to reach yet. So patents are often awarded to the idiot who arrives there by the most awkward possible method in the least appropriate frame of reference.

If someone wants to make a living coming up with a partial result by feats of intellectual acrobatics a decade before the same result is convenient to achieve as an automatic result within a higher frame of reference, I don't have a huge problem with granting limited patent protections, but only so long as you're truly ahead of the curve. If any frame of reference comes along where your special result becomes a general result, game over for your expensive patent--in an ideal world that will never exist.

LZW is a good example where the early implementations were delicately tuned through a mixture of inspiration and empiricism to achieve viable performance levels. But the entire space of time/space efficient LZW implementations can be fairly thoroughly explored in a decade or so within the right algebraic apparatus, making every efficient implementation a direct result.

A person smart enough to do the algebra first would have no claim to patentability at all. One can not patent beautiful objects such as algebraic expansions of pi. It's just too universal deep down.

The byproduct of being able to obfuscate your algorithm is that your Chinese competitor can obfuscate ripping it off. So in a sense, it's highly desirable that there's a horrendous performance penalty with each additional nesting.

Comment keyspace negawatts (Score 2) 207

This particular scenario is rubbish.

It's weird that PHK framed it this way, but he's on the right track, regardless. Compromised entropy is one of the largest persistent attack surfaces in the state surveillance war. It's darn hard to notice when your client-side random key is leaking key space from prior exchanges, unless we're all running perfectly vetted software every day of the week and twice on Sunday and nothing bad ever happens to the golden master distribution chain. Developers never lose their private keys ...

From the dark side, at Borg scale, it's a slow war of attrition. The more they know about you, the better their guesses become. Suppose they gain possession of a dozen of your passwords from the least upstanding corporations you deal with. Your passwords have zero cross-entropy, right? Every password entirely unconditioned on any other password you've ever used?

And if it turns out you're a member of the 0.01% who uses distinct, randomly generated sixteen-character password strings for every site, so much the better to target you with other methods.

This isn't a battle over the yield strength of the titanium crypto primitives. It's a battle over the total burden. Every person who re-uses the same password a dozen times is that much less computation. Password cracking is like Type II b muscle fiber. It's the muscle fiber of last resort, the one your body activates to lift an overturned car off your child after a crash. Traffic analysis is Type I muscle fiber, the fiber you can use all day long, day after day.

That big hassle with the self-signed certs (which are needed for authentication) significantly thinned the default use of strong encryption for simple privacy. These did not need to be tied together as they were. Because the use of encryption stands out and the connectivity graph is below the percolation threshold, it becomes hard to set up covert onion routers.

The focus on encryption strength is mostly a red herring to distract us from the real agenda, which is keeping the general run of affairs extremely sloppy. The whole surveillance apparatus depends on the bulk manufacture of negawatts (shedding keyspace) in dribs and drabs by various murky political means. It's not a hard war, it's a soft war.

Comment Re:Sony, for example (Score 1) 234

Companies have NOTHING to fear from consumer retaliation.

You're nuts, man. Sony took it in the nads for their blunder with the PlayStation 3. You know, that small setback where they allowed a Mongol empire with deep pockets and not much traction to sweep behind their Maginot Line and plant the Xbox flag atop Mount Suribachi. (By the way, would you be interested in picking up some Lehman Brothers shares I have lying around? Can't lose investment. Too big to fail and all that.)

Yes, the sumo wrestlers pick themselves up again all too quickly after their flagrant misdeeds. It's hard to knock them completely out of the ring. Whatever fear they experience momentarily is replaced by arrogance just as soon as their testicles re-inflate. (Hint: They're not pinching their nose and convulsing their chests because of some smell they've left behind.)

The girls, they kiss frogs. That's how it works. The triumph of hope over experience is what our species is all about, so much the better if there's an IPO with some DRM.

Comment regulation gulag (Score 1) 234

The argument over 'more regulation' vs 'less regulation' is about the stupidest argument out there.

It's not an argument. "Regulation" is a code word for power. Either the government holds this power, or private interests hold this power. There's no middle ground, due to the convexity of the slippery slope. It's either a firewall configured with a default "block all" or a default "pass all". Those are your two choices, 100% mutually exclusive.

Besides, inhabiting the middle ground involves the tedious art of knowing the difference, which is not what people with power enjoy doing.

Comment Re:Serious Doubts on Canonical's Ability (Score 1) 251

Canonical's stuff makes GNOME3 look usable. That takes some doing.

I'm sure any distro has rough edges. My experience on Ubuntu was just fine. But then they decided that neither preserving their users' work-flow equity nor advance notice of aggressive disruption were valid terms in the quality equation, so I bailed out of their ecosystem with extreme prejudice. Some of us older types actually derive value from persisting with entrenched methods.

Sometime nearly a decade ago I came across a Motorola web site for some hot embedded processor where you had to sign a form declaring an intent to purchase no fewer than 10,000 parts (if selected) in order to receive the specification sheet.

Even if just a drop in the cell phone ocean, there's no reason the chip vendors can't cut a competitive price on volumes of 40,000 where larger commitments already exist on other contracts. The main reason they don't do this is to keep those large commitments happy that they are getting a favourable price. It has nothing to do with scale.

Samsung in particular would like to see some differentiation in the phone market where they are less under Google's thumb. I can see Samsung going "oh hell, sure, if you're only going to do a pilot run on a concept phone, we'll give you our best volume price on the components and watch with interest from the sidelines". At the same time, there are any number of premium Android phone design teams who have fallen on hard times who wouldn't turn down a third-party hardware design contract while they try to pick themselves up off the canvas.

Ubuntu is more than capable of getting the Linux component to work at least decently by the standards of people who view change as entertainment.

I don't see this project as being that risky if Ubuntu has already lined up the right concessions on the componentry and hardware design fronts. I just think it's a silly amount to pay for an Asus Transformer that dual boots. But hey, whatever floats your boat. What I do know about this kind of thing is that many people suck at NPV specs deflation. The kickstarter fora always fill up with people on delivery day who skipped the algebra class on slope and intercept.

Comment self edit: s/could/couldn't (Score 0) 353

... left IBM because he couldn't get anything done ...

Lameness severity is typically evaluated on a scale of 1 to 5, with higher numbers indicating a more significant degree of impairment. A 1 rating suggests a horse with a minor gait deficit, a 5 is "broken-legged" lame, indicating that the horse will not put weight on the affected leg. Initial assessment may include a visual check for outward injuries such as cuts or swelling, observation of a horse as it travels at different gaits, particularly the walk and trot. Flexion tests may also be performed, and hooves will be checked for signs of injury.

Comment he who has less gold breaks the rules (Score 2) 353

Jan Wong

In 2006, Wong attracted attention by imitating the work of Barbara Ehrenreich and going undercover as a cleaning lady in wealthy Toronto homes. While employed by the Globe and Mail as a reporter Jan Wong impersonated a maid and then wrote about her experiences in a five-part series on low-income living.

There were many social issues discussed in this series of articles, the majority of which I didn't agree with as framed. One issue she pointed out was that these barely-literate low-income scullery-scrubs few of whom had driver's licences were expected to haul vacuum cleaners through the Toronto metro system between jobs that were not as proximal as a modern UPS delivery route.

Brown Down: UPS Drivers Vs. The UPS Algorithm

No, the scheduling algorithm employed by the scullery-scrub dispatch office involved chewing up small bits of paper and spitting them at a map, because they were getting away with NOT PAYING for the delivery of vacuum cleaners by their downtrodden and raw-fingered cleaning staff. Many of these barely-solvent workers were putting in eight hours on job sites, plus another four hours (unpaid) moving between job sites, toting equipment that wasn't even their own for less than the cost of delivering the equipment by any other business method.

Jan Wong could have gone to war over a clear violation of labour fairness, but she instead decided to do a lot of public hand-wringing over systemic issues unlikely to ever change.

It's Apple's job to politely inform their store managers that this violates accepted labour practice and to put an end to it as thoroughly as they do with unwelcome rumours about unfinished products.

I once spoke to an ex IBM employee in the early 1980s who said he left IBM because he couldn't get anything done. His department was under such tight security that it took him an hour to get to his desk in the morning and another hour to leave it in the afternoon. I think part of that was fetching his work product from a secure area and returning it there again with an inspection. He was well paid for the whole ordeal, until it finally drove him nuts.

The rule in a democratic salary market is that time is money. Even if the money is too small to spit at from the perspective of the person writing the cheques.

An anecdote I liked from that series was the incident(s) where business owners tried to bully her out of using street parking in front of their stores (which they would prefer to see used by customers) on the presumption that she was timid and uneducated. It almost blew her cover confessing she knew how to drive in the hiring interview. I think she had to tell some huge sob story to make her desperation believable to take such a job as a person who could hold down a driver's licence.

Comment Re:What could possibly go wrong? (Score 1) 187

Creating and distributing large quantities of bacteria with unknown long term effects is not a known quantity and hence .. is not a sustainable solution.

You left out a step in the middle. It's called a MOOC. That's where you learn things you didn't used to know. Everything one doesn't understand has unknown long term effects and hence is unsustainable.

Comment circuit strip (Score 0) 226

The teaser margin caught my eye with a circuit strip (teaser margin = (WU- (pi/4))*XGA on most web sites these days, excluding content viewed through a dancing thumb while traversing Steiner diagrams in a busy urban core with the permanent postural stoop of Vermilingo Erectus).

Props for the big solder blob. No circuit is complete without one. The end.

Comment banish "it can" (Score 2) 543

Some people are surprised to learn that you can also extend Visual Studio with new windows such as those I just described.

This is typical of Microsoft products: obscure "yes it can" capabilities that you can't rely upon for continuity from version to version. Macros? Poof.

Come on reviewers, picking out chopsticks does not count as "playing the piano". Microsoft products in particular need to be auditioned savagely before giving credence to any self-assigned tick marks, or awarding gold stars for limbo dancing under the bar instead of over the bar on standards compliance. Simon says "That's four noes." Especially in the late nineties, the vast majority of Microsoft product reviews were channelling Paula Abdul. Eventually I burned "yes it can" in a Salem bonfire.

I've used Eclipse fairly heavily for C++ and R and I don't find it sluggish. Yes, it's far from perfect. Docking operations on the newest release went a bit insane on my 22" monitor in portrait mode. Hopefully that's just teething pains early in the release cycle.

Comment worms before flowers (Score 1) 1501

The one where the person that now develops a kernel that ships with FUSE and CUSE, and which has its largest install base running on top of the Xen microkernel in cloud deployments or an L4-derived microkernel in mobile deployments, was saying that microkernels are bad?

It was Linus's original goal in 1990 to achieve the largest install base on top of the Xen microkernel? This is news to me.

The most important criterion with any new project is to obtain critical mass of collaborators and users. Stroustrup didn't want to base C++ on C. His largest influence was Simula 67. It ended up being fairly hideous, intellectually, to graft Simula programming idioms on top of C. At the same time, the underlying C language compatibility was the main reason C++ was adopted by most people in the first place, whereas language designers who placed more value on purity and aesthetics now languish in relative obscurity.

One could make a strong case that Tanenbaum's present success is parasitic on the success of Linux itself, since Linux ended up becoming--within a rabbinical epsilon--the most significant force shaping the ecosystem where Tanenbaum's kernel eventually gained traction.

In raw soil, usually the worms precede the flowers. Tanenbaum can suck it.

I would also argue that the success of C++ has been good for C, because it released C from the pressure to evolve in a direction less well suited to the niche it presently dominates. C++ is heroin to a language lawyer. From the perspective of the C community, good riddance.

The problem with aesthetics driven design is that there's always some use case that takes it up the wazoo. Aesthetics always moves in the direction of divorcing messy reality. That reality might be your own. One can also describe this as a refinement of the application domain. This rocks when it works. Worst case scenario is when the glass ceiling of aesthetic refinements slams you like a bird into a spotless pane after your project reaches a million lines of code. The culture of C++ is that embracing messy reality is Job Number One and that elegance is subordinated to this goal, which is why C++ is strong in genericity and weak in garbage-collected managed memory.

C++ has a first-growth generalist mandate married to a progressive pragmatism. The Linux kernel has a first-growth generalist mandate married to a conservative pragmatism a mile wide, and a culture to match.

Python has a second-growth generalist mandate married to a reductive pragmatism. It's strange to compare the culture of Postgres, as someone else did, which is the epitome of a paradigmatic buy-in. Once you buy into a relational data store with ACID integrity, you're already halfway to becoming a Mormon church, never again to be bothered by the hubbub of the NoSQL gospel choir on the other side of the tracks. Linux by comparison is a Unitarian church in raw-tongued multi-ethnic Sydney. One chick thinks it should be more like Toronto. You know, Toronto is great and all, but one is enough.
