jd - Slashdot User

Journal: What constitutes a good hash anyway?

Journal by jd on Friday November 07, 2008 @05:37PM

In light of the NIST complaint that there are so many applicants for their cryptographic hash challenge that a good evaluation cannot be given, I am curious as to whether they adequately defined the challenge in the first place. If the criteria are too loose, then of course they will get entries that are unsuitable. However, the number of hashes entered does not seem to be significantly larger than the number of encryption modes entered in the encryption mode challenge. If that many entries are too many to evaluate well, then perhaps the modes challenge was too, in which case maybe we should take their recommendations on encryption modes with a pinch of salt. If, however, they are confident in the security and performance of their encryption mode selections, what is their real objection in the hashing case?

But another question one must ask is why there are so many applicants for this, when NESSIE (the European version of this challenge) managed just one? Has the mathematics become suddenly easier? Was this challenge better-promoted? (In which case, why did Slashdot only mention it on the day it closed?) Were the Europeans' criteria that much tougher to meet? If so, why did NIST loosen the requirements so much that they were overwhelmed?

These questions, and others, look doomed never to be seriously answered. However, we can take a stab at the criteria and evaluation problem. A strong cryptographic hash must have certain mathematical properties. For example, the distance between any two distinct inputs must be uncorrelated with the distance between the corresponding outputs. Otherwise, knowing the output for a known input and the output for an unknown input will tell you something about the unknown input, which you don't want. If you take a large enough number of inputs and plot the distance between inputs against the distance between the corresponding outputs, you should get a completely random scatter-plot. Also, if you take a large enough number of inputs at fixed intervals, the distances between the corresponding outputs should follow a uniform distribution. Since you can't reasonably test 2^512 inputs, you can only apply statistical tests to a reasonable subset and check whether the probability of seeing the expected patterns is within your desired limits. Both tests can be run automatically, and any hash that exhibits a skew that could expose information can then be rejected just as automatically.

This is a trivial example. There will be other tests that can also be applied automatically to weed out the more obviously flawed hashing algorithms. But this raises an important question. If you can filter out the more problematic entries automatically, why does NIST have a problem with the number of entries per se? They might legitimately have a problem with the number of GOOD entries, but even then all they need to do is have multiple levels of acceptance and an additional round or two. For example, at the end of human analysis round 2, NIST might qualify all hashes that are successful at that level as "sensitive-grade" with respect to FIPS compliance, so that people can actually start using them, then have a round 3 which produces a pool of 3-4 hashes that are "classified-grade" and a final round to produce the "definitive SHA-3". Adding more rounds makes the process take longer, but by issuing lower-grade certifications along the way, the extra time needed to perform a thorough cryptanalysis isn't going to impede those who actually use such functions.

(Yes, it means vendors will need to support more functions. Cry me a river. At the current scale of ICs, you can put one hell of a lot of hash functions onto one chip, and have one hell of a lot of instances of each. Software implementations are just as flexible, with many libraries supporting a huge range. Yes, validating will be more expensive, but it won't take any longer if the implementations are orthogonal, as they won't interact. If you can prove that, then one function or a hundred will take about the same time to validate to accepted standards. If the implementations are correctly designed and documented, then proving the design against the theory and then the implementation against the design should be relatively cheap. It's crappy programming styles that make validation expensive, and if you make crappy programming too expensive for commercial vendors, I can't see there being any problems for anyone other than cheap-minded PHBs - and they deserve to have problems.)

Journal: Beowulf MMORGs

Journal by jd on Wednesday September 10, 2008 @03:01PM

Found this interesting site, which is focussing on developing grid computing systems for gaming. The software they seem to be using is a mix of closed and open source.

This could be an important break for Linux, as most of the open source software being written is Linux-compatible, and gaming has been Linux's biggest problem area. The ability to play very high-end games - MMORGs, distributed simulators, wide-area FPS and so on - could transform Linux in the gaming market from being seen as a throwback to the 1980s (as unfair as that is) to being considered world-class.

(Windows machines don't play nearly so nicely with grid computing, so it follows that it will take longer for Microsoft and Microsoft-allied vendors to catch up to the potential. That is time Linux enthusiasts can use to get a head-start and to set the pace.)

The question that interests me is: will they? Will Linux coders use this opportunity of big university research teams and big vendor interest to leapfrog the existing markets completely and go straight for the market after that? Or will this be seen as not worth the time, the same way that a lot of potentially exciting projects have petered out (e.g. Open Library, Berlin/Fresco, KGI, OpenMOSIX)?

Journal: The Lost Tapes of Delia Derbyshire

Journal by jd on Friday July 18, 2008 @05:20PM

Two hundred and sixty-seven tapes of previously unheard electronic music by Delia Derbyshire have been found and are being cataloged.

For those unfamiliar with Delia Derbyshire, she was one of the top pioneers of electronic music in the 1950s and 1960s. One of her best-known pieces was the original theme tune to Doctor Who. According to Wikipedia, "much of the Doctor Who theme was constructed by recording the individual notes from electronic sources one by one onto magnetic tape, cutting the tape with a razor blade to get individual notes on little pieces of tape a few centimetres long and sticking all the pieces of tape back together one by one to make up the tune".

Included in the finds was a piece of dance music recorded in the mid-1960s which, when examined by contemporary artists, was judged to be of a quality that would pass as better-than-average mainstream music even today. Another piece was incidental music for a production of Hamlet.

The majority of her music mixed wholly electronic sounds, from a sophisticated set of tone generators and modulators, and electronically-altered natural sounds, such as could be made from gourds, lampshades and voices.

Journal: Well, this is irritating.

Journal by jd on Thursday May 29, 2008 @05:49PM

Someone has trawled through YouTube and flagged not only the episodes of The Tripods, but also all fan productions, fan cine footage and fan photography of the series. Can't you just buy it on DVD? Only the first season; the second exists only in pirated form at scifi conventions, and the fan material doesn't exist anywhere else at all. The third season, of course, was never made, as the BBC had a frothing xenophobic hatred of science fiction at the time. (So why they made a dalek their Director-General at about that time, I will never know...)

What makes this exceptionally annoying is that the vast bulk of British scifi has been destroyed by the companies that produced it, the vast bulk of the remainder has never seen the light of day since broadcast, and the vast bulk of what has been released has been either tampered with or damaged in some other way, often (it turns out later) very deliberately, sometimes (again it turns out later) for the purpose of distressing the potential audience.

I've nothing against companies enforcing their rights, but when those companies are acting in a cruel and vindictive fashion towards the audience (such as John Nathan-Turner's FUD of audiences being too stupid to know what they like, or too braindead to remember what they have liked), and the audiences vote with their feet, on what possible grounds can it be considered justified for those companies to (a) chain the audience to the ground, and (b) then use the immobility of the audience to rationalize and excuse the abuse by claiming the audience isn't going anywhere?

I put it to the Slashdot Court of Human/Cyborg Rights that scifi fans are entitled to a better, saner, civilized explanation, and that whilst two wrongs can never make a right, one wrong is never better.

Journal: 1nm transistors on graphene

Journal by jd on Thursday April 17, 2008 @07:54PM

Well, it now appears the University of Manchester in England has built 1nm transistors on graphene. The article is short on details, but it appears to be a ring of carbon atoms surrounding a quantum dot, where the quantum dot is not used for quantum computing or quantum states but rather for regulating the electrical properties. This is still a long way from building a practical IC using graphene. It is, however, a critical step forward. The article mentions other bizarre behaviours of graphene but does not go into much detail. This is the smallest transistor produced to date.

Journal: No good April Fools jokes this year

Journal by jd on Tuesday April 01, 2008 @02:54PM

This has been the dullest April 1st Slashdot has ever had. Has its funny bone been surgically removed? For those who miss the days of OMG!!! PONIES!!!, I think I've found the only site that comes close to being very, very strange...

Journal: Scientific and Academic Open Source - Hotspots, Black Holes

Journal by jd on Monday February 04, 2008 @05:08PM

One of the most fascinating things I've observed in searching for Open Source projects for whatever I'm doing at the time is the huge disparity in what is available, how it is used and who is interested.

An obvious place to start is the field of electronics. Computer-based tools are already used to build such stuff, so it's a natural replacement, right? Well, almost. There are tools for handling VHDL, Verilog and SystemC. There are frameworks for simulating both clock-based and asynchronous circuits. You can do SPICE simulations, draw circuit diagrams, download existing circuits as starting points or places of inspiration, simulate waveforms, determine coverage and design PCBs. OpenCores provides a lot of fascinating ready-made designs, Sun provides the staggering T1 and T2 UltraSPARC cores, and there is the Sirocco 64-bit SPARC. This field has probably not got anywhere near what it needs, but it has a lot.

Maths is another obvious area. There are plenty of Open Source tools for graphing, higher order logic, theorem provers, linear algebra, eigenvalues, eigenvectors, signal processing, multiple-precision arithmetic, numerical methods, solvers for all kinds of other specific problem types, etc.

What about astronomy? That requires massive table data crunching, correlation of variations, moving telescopes around with absolute precision - things computers tend to be very good at. There are a few programs. Programs for capturing images are probably the most common, although some telescopes ship with software for controlling them, obtaining data and performing basic operations. Mind you, how much more than this does one need in software? Some things are better done in hardware (for now, at least) because the software hasn't the speed. Yes, the control software seems a little specialized, but it'd be hard to make something like that general-purpose.

Chemistry. Hmmm. Lots of trivial stuff, more educational than valuable - periodic tables, 3D models of molecules, LaTeX formatting aids. There's a fair amount on the study of crystals and crystallography, which is as much chemistry as it is physics, but there's not a lot else. Chemistry involves a lot of tables (which would be ideal for a standardized database), a lot of mathematical equations, formulae, graphing, measuring and correlating all sorts of data, the consequences of different filtering and separation techniques, the wavelength and intensity of energies, analysis of the results of atomic mass spectrometry or other noisy data, etc. I see the underlying tools for doing some (but not all) of these things, but I don't see the heavy lifting.

Archaeology has very few non-trivial tools. There is some signal processing for ground-penetrating RADAR, but there are virtually no tools out there that could help with interpretation. In fact, most RADAR programs don't interpret either, but simply display the result on a small LCD screen. Nor do any tools exist for correlating interpretations (other than manually, via an extremely naive - for this purpose - GIS database). There are a few scraps here and there, but signal analysis and GIS seem to be about it, and those were mostly developed for mining companies and tend to show it.

Biology has plenty of DNA sequencing code. By now, Slashdotters should be able to sequence their own DNA, rather than pay someone a thousand to do it. You mean those aren't enough, that you need more hardware? And a lot more software? It's an important step, but it's not unique.

Mechanical Engineering. I haven't seen anything of any significance.

Geology. Not really, beyond the same software as for Archaeology, but used to find seams in rock.

Psychology: Nada.

Psychiatry: None.

Sports: Lots of software getting used, but little of it is open source.

Result: those with the least to lose and the most to gain make the change. Those who see no benefit in changing what they're doing will continue doing what they're doing. My suggestion? There are gaping holes in Open Source. Fill them in.

Journal Journal: Open Source Archaeology

Journal by jd on Sunday January 13, 2008 @07:52PM

This is an interesting (to me) piece of work that I've been asked to do: use open-source software to analyze data from both ground-penetrating radar and magnetometers, open-source GIS software to track archaeological finds, open-source modeling software to produce archaeologically and technically sound reconstructions, and then a mix of open-source virtual reality software and open-source web technology to present both the raw and the visually interpreted data in a form that is of practical use to experts and non-experts alike.

If that sounds like a complex task, it is. The site is extremely convoluted, there is a wealth of data that is currently in a highly unusable form, and what is meaningful to an expert is not necessarily the least bit useful or usable to a non-expert (and vice versa). Currently, there is a lot of skepticism by The Powers That Be that such a project would even be possible. My first task, then, is to produce an example. My impossible mission is to convert the few scraps of information published on medieval aisled halls, along with the very limited archaeological finds from the site in question, into the dual format of raw information and virtual reality.

On the one hand, the limited information means that the first part is relatively easy. An online archaeological GIS-enabled database may not be trivial, but all the software needed can - at least - be found on Freshmeat and the amount of data entry is relatively small. The second part is tougher. Again, open-source VR software does exist, but it is one thing to enter known values that can be verified into a database, it is entirely another to derive values that are implied and logically required but for which there is no direct evidence at all.

There is a catch. Virtual reality is great for producing models you can walk through, but it's generally pretty lousy at telling you if said model violates the laws of physics. Given that I can hardly build my own medieval aisled hall, I know of no other method besides hand-cranking through the numbers for validating the predicted structure. Suggestions would be extremely welcome, as would any ideas on how I could either use the open-source approach for the hall design, or use something like BOINC to automate the validation of a virtual landscape.

Technically, this is fun - I'm getting to do some reasonably original work - but original work is necessarily far more demanding in terms of research and application than run-of-the-mill work. Mind you, I only have myself to blame - the archaeologists have been satisfied so far with producing a web-based diary of major finds, plus entering the data on a completely unusable regional database. Such are the hazards of pointing out that you can do better! :)

Journal: Word from an Oregon Senator on software radio

Journal by jd on Saturday November 03, 2007 @05:58AM

I received a letter from Senator Ron Wyden (Oregon) in response to one I had sent him on the topic of software radios. I pointed out that Open Source is often more secure than closed source, that a ban on open source would be an a priori restraint of trade that would probably be detrimental to the deployment and usefulness of such devices, and that the FCC's position on the matter did not appear to be justified by the facts. I tried to avoid the whole freedom argument, on the grounds that politicians are generally not elected by intellectuals. Over-priced, crippled technology that would probably be made elsewhere... that's an argument politicians can hear better.

(No insult intended to Senator Wyden; he may very well be extremely smart, but since I don't know him, the most logical thing for me to do was to hint at all the areas that could dent his popularity and fund-raising potential.)

His response is interesting. Firstly, he agreed that Open Source can be more secure. That is a fair enough position to take and, given the size of the closed-source IT industry in Oregon, far more generous than I'd have expected.

His second comment, that many in the software industry have made identical or near-identical objections, was fascinating. Politicians are extremely adept at saying what you want to hear (they have to be, it's their only way to survive in their line of work), but to the extent that IT industry leaders have complained, the Senate is apparently taking notice. They would appear to be aware of Open Source now, for good or bad, and are adjusting their thinking accordingly.

He goes on to say that he is not satisfied that the FCC's claims that closed source will make the software more secure are correct, and that banning open source may be counter-productive to the FCC's objectives. Again, that's good. Whether he believes it or not, I don't know, but there's clearly enough doubt in his mind as to the wisdom of the FCC's course that he's willing to put in writing that he believes Open Source could make for a more secure product and that the FCC's actions could backfire.

The last part is the part that unnerves me slightly. He says that if legislation comes before the Senate, he will keep my views in mind. He did NOT say he would oppose legislation that would ban Open Source software radios, only that he would keep in mind that I - and others - oppose such a ban. Nor did he say that he would make any effort to bring forward any legislation requiring the FCC to re-examine the issue or explain themselves.

Why is that unnerving? Because although he expresses disquiet, he won't commit himself to any actual action over it. Maybe I'm being too hard on him, but it bothers me intensely that he acknowledges my concerns are widespread in the industry but promises nothing. Not even so much as to ask the FCC why they're being so shirty on the issue. The letter is good, I appreciate his taking the time to, well, ask his secretary to probably print out a standard form letter, but that's not going to achieve results. Why should the FCC care how many form letters have been printed? Well, unless they have shares in the company making the envelopes.

A response that shows some sympathy is better than no response at all, but only if it is accompanied by action. I hope it does. I hope my mail to him made some useful contribution to the debate. I also hope that someday I'll win the lottery. I am curious as to which has the greater odds of success.

Journal: Oh goody. Exactly what I didn't need.

Journal by jd on Friday October 26, 2007 @03:39PM

A person I designed an online store for didn't want to pay for it. That happens. They also turned out to be a gun and knife fanatic. No big deal, right? That depends on how you interpret the phrase "you'd better watch your back, if I were you". May this be a lesson to you all - never do software consulting work for a latent psychotic.

Journal: Who uses Freshmeat?

Journal by jd on Saturday October 20, 2007 @07:23PM

One thing that has often puzzled me, when working in places that use Open Source software, is how many people knew of Slashdot (I'd say 75% or more read it daily) but how few were even aware that Freshmeat existed. The same was true of an announcement service that tracked Open Source and shareware products. Yet the projects I track on Freshmeat (I own something like 150 records and am subscribed to roughly twice that many) show hundreds, sometimes thousands, of accesses after a new release. If the corporate sector is totally blind to Freshmeat, who is doing the accessing?

Looking at the numbers, I think I can hazard some guesses. Educational and Government places probably rank high in the user charts, as clustering and scientific software are often moderately or highly subscribed and show moderate to high activity after an update. The stats are also skewed towards servers and other administrative or maintenance software, so I'm guessing it's more used by admins than users, which is somewhat foolish as users should be the ones driving updates as they're the ones who know what functions they need and what bugs they experience.

The popularity of MPlayer is an odd one, as most users will get this from their distro and it's unlikely to be used for system maintenance. Nonetheless, it is more popular than any other package, including the Linux kernel. Even the Linux kernel is oddly placed, at second, as this is announced in so many different places, from LWN and Slashdot to the Linux Kernel Mailing List and LinuxHQ. Most software is only ever announced on its homepage and on Freshmeat only if someone has made a record for it and is keeping it up-to-date. The dilution of the Linux kernel announcements is so staggering that it is amazing that a single service would get so much attention.

I guess if we assume a heavy Government/Educational userbase, it's more understandable. Those are going to be places where heavy-duty mailing lists are not going to be an option, and where surfing websites on the off-chance of an announcement would be frowned upon.

If I'm correct, how do we interpret the numbers? The usage won't be a random sample of a complete cross-section of the population, it'll be a self-selecting group with relatively narrow interests that is largely built up from a relatively small segment of the Open Source userbase.

Well, why should we interpret the numbers? That's an easy one. Corporations resist software they consider "unpopular" or "unused", no matter how useful or productive it would be. They are staggeringly blind to reality. If you can produce meaningful usage estimates, and can defend them, it sometimes (not always) weakens resistance to vitally-needed updates and changes. If you can show that some project has been downloaded by tens of thousands of probable competitors, you can be damn sure that project will be on the server by the next morning, come hell or high water.

Some would argue that it doesn't matter - we get paid to do what we're told to do and to make the managers look good. That entire discussion could - and does - fill vast volumes, with no real answer. I've got my own thoughts, but that's not really this discussion, and I'd probably run Slashdot's servers out of disk space if I were to put them all down here.

Here, I am far more interested in knowing why the userbase for any announcement service should be self-limiting. I've seen places be utterly ignorant of what software exists or where it can be found. I've had people ask me how to search for programs or how I know about updates before the distros push the packages out. On the flip-side, as I've already pointed out, there are packages whose records show far greater levels of access than you would expect, given the availability of the same (or better) information elsewhere, sometimes much sooner.

Based on what I've seen, I am going to say that the records for "mission-critical" software and software of specific interest to one of the niches inhabiting Freshmeat will be relatively close to the actual levels of active interest. Passive interest (e.g. users of a desktop Linux system are probably not actively interested in new kernel or glibc releases, but still use those updates) is probably a lot higher, but I don't think it's easily calculable. I'm going to guess that the number of people who actually download the source code is somewhere between two and five times the number who visit the site via Freshmeat.

For commercial and industrial software, I'm going to guess that Freshmeat numbers are way too low, because people discover packages by accident or media rumor, or outsource the updates to some group that uses a commercial tracking/monitoring service. For this type of software, I'm guessing that the actual number of people impacted by announcements might be anywhere from five to fifty times the number given in the stats. There is no simple way of finding out who knows what, though, because there is nowhere to look.

However, when giving a presentation to managers on why product A is the one to go for, you can't be vague, you can't be hesitant and you absolutely can't be technical. That's why having a bit more certainty would be a good thing. Lacking any means of being certain, though, anyone in that position has to give some number that managers can use. I would take the URL access value from Freshmeat (the number who actually visited the site, not just the record) and scale it by the midpoint of the numbers I've suggested. It's not perfect, but it's almost certainly the best number you are going to be able to get as things stand.
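
The scaling step is simple arithmetic, but writing it down makes the assumptions explicit. In this sketch the category names, the function names and the 400-hit example figure are hypothetical; the multiplier ranges (two to five, five to fifty) are the guesses from the paragraphs above, so the result is only as good as those guesses.

```python
# Multiplier ranges are the guesses from the text, not measured data.
MULTIPLIER_RANGES = {
    "niche": (2, 5),         # mission-critical or niche-interest software
    "commercial": (5, 50),   # commercial and industrial software
}


def estimate_reach(url_hits: int, category: str) -> float:
    """Scale Freshmeat URL accesses by the midpoint of the guessed range."""
    low, high = MULTIPLIER_RANGES[category]
    return url_hits * (low + high) / 2


def estimate_bounds(url_hits: int, category: str) -> tuple[int, int]:
    """Low and high estimates, for when someone asks how soft the number is."""
    low, high = MULTIPLIER_RANGES[category]
    return url_hits * low, url_hits * high
```

For example, 400 URL accesses on a commercial package would be presented as roughly 11,000 people, with a defensible range of 2,000 to 20,000.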

Yeah, yeah, GIGO. But managers don't generally care about GIGO. They care that they have plausible and defendable numbers to work with. That is what they are getting. If you wait to give them something precise and accurate, you'll certainly be waiting until long after any decision has been made, and probably be waiting forever in many cases.

What if you're a home user? Plenty of those exist. Well, to home users, I would argue that updates from distros are typically slow in coming, that library version clashes are far too frequent, that permutations of configuration that may be interesting or useful usually won't be provided, and that even distros that build locally (Gentoo, for example) have massive problems with keeping current and avoiding unnecessary collisions.

If you're not specifically the sort of user served by the distro of your choice, you WILL find yourself building your own binaries, and you would be very strongly advised to be aware of all updates to those packages when they happen.

Journal: Is Social Networking worthwhile?

Journal by jd on Tuesday June 12, 2007 @04:00PM

There are plenty of online social networking sites - LinkedIn is the one I'm most familiar with. They seem to be designed around the notion of the Good Old Boys Club, the gentrified country clubs and the stratified societies of the Victorian era, where who you knew mattered more than what you knew.

But are they really so bad? So far, my experience has given me a resounding "maybe". People collect associations the way others collect baseball cards or antiques - to be looked at and prized, but not necessarily valued (prized and valued are not the same thing), and certainly not to be used. But this defeats the idea of social networking, which attempts to break down the walls and raise awareness. Well, that and make a handsome profit in the deal. Nothing wrong with making money, except when it's at the expense of what you are trying to achieve.

So why the "maybe", if my experience thus far has been largely negative? Because it has also been partially positive, and because I know perfectly well that "country club" attitudes can work for those who work them. The catch is that it has to be the right club and the right attitude. That matters, in such mindsets. It matters a lot.

So, I ask the question: Is there an online social networking site that has the "right" stuff?

Journal: Why is wordprocessing so primitive?

Journal by jd on Friday May 18, 2007 @09:08PM

This is a serious question. I'm not talking about the complexity of the software, per se - if you stuffed any more macros or features into existing products, they'd undergo gravitational collapse. Rather, I'm talking about the whole notion on which word-processors, desktop publishing packages and even typesetting programs such as TeX are based.

What notion is that? That each and every type of writing is somehow magical, special and so utterly distinct from any other type of writing that special templates, special rules and special fonts are absolutely required.

Of course, anyone who has actually written anything in their entire life - from a grocery list onwards - knows that this is nonsense. We freely mix graphics, characters, designators, determinatives and other markings, from scribblings through to published texts. If word-processing is to have maximum usefulness, it must reflect how we actually process words, not some artificial restraint that reflects hardware limitations that ceased to exist about twenty years ago.

The simplest case is adorning the character with notation above it, below it, or as subscript or superscript to either the left or right. With almost no exceptions, this adornment will consist of one or more symbols that already exist in the font you are using. Having one special symbol for every permutation you feel like using is a waste of resources and limits you to the pitiful handful of permutations programmed in.

The next simplest case is any alphabet derived from the Phoenician alphabet (which includes the Roman, Cyrillic and even Greek alphabets). So long as the program knows the language you want to work in, the translation rules are trivial. The German eszett (ß), for example, is merely a character that replaces a double s when typing in that language. A simple lookup table - hardly a difficult problem.
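
A per-language substitution table of this kind can be sketched in Python. The table layout and the `apply_rules` name are hypothetical. The single German rule follows the simplified description above; real German spelling keeps "ss" in many words (e.g. "Wasser"), which is where context-sensitive regular-expression rules, rather than plain string swaps, would earn their keep.

```python
import re

# Hypothetical per-language rule tables: (compiled pattern, replacement).
# The German rule is the naive "ss" -> eszett substitution from the text.
RULES = {
    "de": [(re.compile("ss"), "\u00df")],
}


def apply_rules(text: str, language: str) -> str:
    """Apply each substitution rule for the given language, in order."""
    for pattern, replacement in RULES.get(language, []):
        text = pattern.sub(replacement, text)
    return text
```

With these rules, apply_rules("Strasse", "de") yields "Straße", while a language with no table, such as "en", leaves the text untouched.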

Iconographic and Ideographic languages are just an extension to this. You specify a source language and a destination language, and provided you have such a mapping, one word gets substituted with one symbol. You could leave the text underneath and use it as a collection of filenames for grabbing the images, if you wanted to make it easy to edit and easy to program. As before, you already have all the symbols you're ever likely to want to overlay, so you're not talking about having every possible image in a distinct file. Anything not provided can be synthesized.

Other languages can be more of a bugbear, but only marginally. A historical writing style like Cuneiform requires two sizes of line, two sizes of circle, a wedge shape and a half-moon shape. Everything else is a placement problem and can be handled with a combination of lookup tables, rotations and offsets. Computationally, this is trivial stuff.

If the underlying engine, then, has a concept of overlaying characters with different offsets and scales, rotating characters, using lookup tables on regular expressions, and doing simple substitutions as needed, you have an engine that can do all of the atomic operations needed for word-processing or desktop publishing.
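A table of regular expressions can drive the same atomic operations. This is a minimal sketch, and the rule set, the offsets and the `(kind, symbol, dx, dy, scale)` operation format are all hypothetical. Typed input such as `x^2` is rewritten into draw operations that overlay a smaller symbol at an offset. A catch-all rule draws any other character as-is.

```python
import re

# Hypothetical ordered rule table: the first regex that matches at the current position wins.
# Each action returns draw operations (kind, symbol, dx, dy, scale); offsets are in assumed em units.
RULES = [
    (re.compile(r"(\w)\^(\w)"), lambda m: [("draw", m[1], 0.0, 0.0, 1.0),
                                          ("draw", m[2], 0.6, 0.5, 0.6)]),   # superscript
    (re.compile(r"(\w)_(\w)"), lambda m: [("draw", m[1], 0.0, 0.0, 1.0),
                                         ("draw", m[2], 0.6, -0.3, 0.6)]),   # subscript
    (re.compile(r".", re.DOTALL), lambda m: [("draw", m[0], 0.0, 0.0, 1.0)]),  # plain symbol
]

def to_operations(text):
    """Scan left to right, applying the first matching rule; the catch-all guarantees progress."""
    ops, pos = [], 0
    while pos < len(text):
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            if m:
                ops.extend(action(m))
                pos = m.end()
                break
    return ops

print(to_operations("x^2"))
```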

This method has been used countless times in the past, but past computers didn't have the horsepower to do a very good job of it. Word-processing has also been stifled in general by the idea that it's a glorified typewriter and that it operates on the character as the atomic unit. What I'm talking about is a fully compositional system. Each end-result character may be produced by a single source symbol, but that would be entirely by chance, as would any connection between any given source symbol and what would be considered a character by the user.

If it's so good, why isn't it used now? Because it's slow. Composing every single character from fundamental components is not a simple process. Because it's not totally repeatable. Two nominally identical characters could potentially differ, because floating-point arithmetic rounds at every step and the result can depend on the order in which the operations are performed. That's why you don't use equalities much in floating-point arithmetic. Because it puts a crimp on the font market. Most fonts are simple derivatives of a basic font, and the whole idea of composition is that simple derivatives are nothing more than a lookup table or macro.
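The repeatability problem is easy to demonstrate. Floating-point addition is not associative, so accumulating the same component offsets in a different order can put a stroke in a very slightly different position. The offsets below are hypothetical. The usual workaround is to compare with a tolerance, as the hypothetical `same_position` helper does here. Its tolerance is chosen arbitrarily.

```python
import math

# The same three (hypothetical) horizontal offsets, in em units, accumulated in two orders.
offsets = [0.1, 0.2, 0.3]

left_to_right = (offsets[0] + offsets[1]) + offsets[2]
right_to_left = offsets[0] + (offsets[1] + offsets[2])

print(left_to_right, right_to_left)                # 0.6000000000000001 0.6
print(left_to_right == right_to_left)              # False
print(math.isclose(left_to_right, right_to_left))  # True

def same_position(a, b, tolerance=1e-9):
    """Treat two coordinates as identical if they differ by less than the tolerance."""
    return math.isclose(a, b, abs_tol=tolerance)
```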

If it's all that, then why want it? Because it makes writing any ancient or modern alphabet trivial, because you can do more in 20 fonts than you can do on existing systems with 2,000, and because it would bugger up the whole Unicode system which can't correctly handle the systems it is currently trying to represent. (The concept behind Unicode is good, but the implementation is a disaster. It needs replacing, but it won't be until someone has a provably superior method - which is the correct approach. It just means that a superior method needs to be found.)
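For comparison, Unicode already mixes the two approaches. The same visible é can be stored either as a single precomposed code point or as a base letter plus a combining mark. The two forms only compare equal after normalization. This Python snippet illustrates that one point. It is not, by itself, evidence for or against any particular replacement.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: one code point
combining = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT: two code points

print(precomposed == combining)                                # False
print(len(precomposed), len(combining))                        # 1 2
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
```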

Journal Journal: In Other News For Nerds

Journal by jd on Tuesday May 08, 2007 @01:16AM

There is a new science/geek website out there, called Null Hypothesis, that covers highly unlikely but totally real science. The headline story at the moment is about the sounds of protein molecules. The BBC's coverage of this attempt to out-geek the geeks reports that there are only 60,000 readers - something like a hundredth of what I believe Slashdot's readership is. Even if nobody actually joins the site, it is our clear moral duty to our fellow nerds (and an interesting science experiment they can report on) to attempt to melt the server under a severe Slashdotting.

Journal Journal: Software announcements (or: how to irritate JD) 3

Journal by jd on Wednesday March 07, 2007 @03:14AM

Yes, back to the grumbling again. I do not enjoy this. If I could write about stuff I liked, I would vastly prefer it. However, that will have to wait until there's stuff I like happening.

This rant has to do with software announcements. I covered this to a degree in a previous journal entry on the secrecy of some open source projects. This time, I will be more concerned with neglect (the version listed is truly ancient compared to the version actually published), quality (compare the Slashdot description with the Freshmeat one for the same piece of software) and reaction.

Neglect is a big one. I own 113 project records on Freshmeat and have bookmarked an additional 303. Why so many? 303 bookmarks is a lot - can't I just look to see when the project updates are announced? I would, if they ever were. The bookmarks are reminders of correctly-assigned records that the author can't be bothered to maintain. If they get updated at all, it's because I updated them. With the sheer volume of projects involved, you're damn right I sometimes think some of the bigger Open Source consortia that develop these packages should be paying me for my time. Globus is no small concern, it's a friggin' international collaboration of multinational corporations. Why are they depending on volunteers to take on the unpaid, thankless, tedious task of fixing their neglect?

Ok, what about those 113? How many do you think I actually created? I'd say maybe half; the others I usually picked up because the owner no longer existed. In a few cases, the records were so stale and decayed that the last update predated the owner field, yet the software has continued just fine. Again, that's not good. At best, it means that corrections or other reports will fail - there's nobody to send them to.

Freshmeat is not the only software inventory out there, although it's the only one I make any effort to assist. I've assisted a few paid sites as a consultant, and frankly the stagnation there was even worse. It would be so easy to spend every waking moment just bringing these databases up to speed. We're talking extreme neglect not in the hundreds of records but in the tens of thousands. These are professional sites, paid by customers who want accurate information. They aren't getting it. What they get is something that could be anywhere from a few days to a few years behind reality. Frankly, those customers would be infinitely better off buying a giant disk array and using Harvest to index every site that Google reports has at least one page with the word "download" on it. It would work out cheaper very quickly, and you can be sure of how fresh the information is.

Ok, what about quality? If there's no freshness, then quality is automatically suspect, as projects are evolving entities. They're not fixed for all time, except in rare cases. Ignoring that, though, how accurate are announcements as a rule? Not very. The quality of information is generally fairly poor - sometimes because the person providing it doesn't really understand what is being communicated ("Chinese Whispers") and sometimes because the information simply doesn't exist and has to be inferred from the meager clues that have been left. Sherlock Holmes may be a great detective, but he is also a work of fiction. And if anyone did have those skills, do you think they'd be spending them on correcting project records? Where it's good, it can be truly excellent, but since it would also take someone of the power of Holmes to tell you when the information is good, it's not that useful. If the only way to tell is if you already know the answer, you have no need to be able to tell.

What about reaction? Well, let me put it this way. Atlas' official version is 3.7.29. Fedora Core 7 beta 1 uses version 3.6.0. The official version of Geant is 8.2 patch-level 1, but Fedora Core 7 beta 1 uses version 3.21. I've done some experiments with my own Open Source projects and have found that updates and patches follow the laws of Brownian motion. It is simply not possible to predict if/when/how updates will ever occur within a single distribution, but across all variations of all distributions, the net rate of pickup and refinement is more-or-less constant. This is, of course, completely useless to most users - even those with subatomic vector plotters.
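In principle, this kind of lag is easy to spot automatically, provided versions are compared numerically rather than as strings. This sketch assumes plain dotted-integer version numbers like Atlas's. Schemes with patch levels, betas or letters (such as Geant's "8.2 patch-level 1") would need a proper parser. `parse_version` and `is_stale` are hypothetical helpers.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as "3.7.29" into a tuple of ints: (3, 7, 29)."""
    return tuple(int(part) for part in version.split("."))

def is_stale(packaged: str, upstream: str) -> bool:
    """True if the packaged version is older than the upstream release."""
    return parse_version(packaged) < parse_version(upstream)

print(is_stale("3.6.0", "3.7.29"))                       # True
# Comparing the raw strings gets this wrong: "2" sorts before "3".
print("3.7.29" < "3.7.3")                                # True, even though 3.7.29 is newer
print(parse_version("3.7.29") < parse_version("3.7.3"))  # False
```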

Overall, it's a nightmare to find what you want, a bigger nightmare to determine if it is actually what you want, and a total and utter diabolical nightmare from the 666th plane of hell to determine if what is actually available in any way reflects what it was that you thought you were getting.
