Gripe, gripe, gripe ... again!

Journal by algebraist on Monday April 07, 2003 @09:00PM

The May 2003 issue of Dr Dobb's Journal contains an article ambitiously titled "XML & Relational Databases". In it, two software design engineers discuss the problem of viewing "traditional relational data as XML". That sounds interesting enough but, alas and in (obviously) my opinion, the article rapidly deteriorates into a rant about why "there are ... significant benefits in providing an XML abstraction over a relational database". It is a rant because it offers no comparable list of the benefits of a relational database over XML, although the authors do say

relational databases are excellent persistent storage mechanisms for storing highly structured, normalized data--"square data".

If they left it at that, it would be nearly good enough. Unfortunately, they go on:

But as anyone who has modeled real-world business objects knows, the real world is not square. For instance, consider a typical customer list where there is some variability between customers. One customer has a cell phone, work phone, and home phone; while another has a cell phone and pager number. A typical first attempt at a representation with relational tables uses a column for each property; see Table 1(a). However this approach leads to sparse tables with highly denormalized data and can potentially cause performance and scalability problems for a typical relational database. The solution for most databases is to pull out properties and place them in their own normalized table that has an N:1 relationship with the master table, as in Table 1(b). It would then be possible to further normalize the customer data by separating the PhoneNumbers table into separate tables based on type, like Table 2.

The results would be a highly normalized view of customers that would require users to join several tables to get the desired Customer objects. For most experienced DBAs and SQL users, this probably wouldn't be a problem. However, for most application developers, it would probably be easier to program against a logical Customer object without exposing the details of the underlying relational storage mechanism.
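For what it is worth, the decomposition in the quote (the shape of Table 1(b)) is easy to sketch with Python's built-in sqlite3 module. This is a minimal sketch under my own assumptions: the table names, column names, and data are hypothetical, not the article's. The point it tries to show is that assembling a "logical Customer object" costs one join, and a small function can hide that join from application code without any XML layer.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a hypothetical Customers/PhoneNumbers schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        -- N:1 with Customers: one row per phone a customer actually has,
        -- so no cell/work/home/pager columns sit empty (NULL).
        CREATE TABLE PhoneNumbers (
            customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),
            phone_type  TEXT    NOT NULL,
            number      TEXT    NOT NULL,
            PRIMARY KEY (customer_id, phone_type)
        );
    """)
    conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                     [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO PhoneNumbers VALUES (?, ?, ?)", [
        (1, "cell", "555-0101"), (1, "work", "555-0102"), (1, "home", "555-0103"),
        (2, "cell", "555-0201"), (2, "pager", "555-0202"),
    ])
    return conn


def load_customer(conn: sqlite3.Connection, customer_id: int):
    """Assemble a 'logical Customer object' (here just a dict) with one join."""
    rows = conn.execute("""
        SELECT c.name, p.phone_type, p.number
        FROM Customers c
        LEFT JOIN PhoneNumbers p ON p.customer_id = c.customer_id
        WHERE c.customer_id = ?
        ORDER BY p.phone_type
    """, (customer_id,)).fetchall()
    if not rows:
        return None  # no such customer
    # A LEFT JOIN yields a NULL phone_type for a customer with no phones.
    return {"name": rows[0][0],
            "phones": {t: n for _, t, n in rows if t is not None}}


conn = build_db()
print(load_customer(conn, 2))
```

Note that the application code sees a customer with whatever phones it happens to have; the normalized storage underneath is not "exposed" in any way that matters.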

There is little wrong with the authors' decomposition. What seems completely misunderstood is how and why such a re-representation, or normalization, is done. The authors do retain the relational form of the source data, but they go on to argue that the "XML abstraction" is superior. Nor do they ever actually address their own complaint; they simply assert that XML is somehow better for "real world", non-"square" data, data which is apparently just far more prevalent than the "normal" stuff you sometimes find lying around. **sigh**

There are basically three problems with the paper, again, of course, in my opinion.

First, relational databases are treated as if they are branded software widgets of some sort. Now, that is true if you are dealing with particular approximations, such as Oracle, IBM, Sybase, or MS Access. And, given some of the advertising from these vendors, I can understand how people might confuse these brands with the database relational model itself, since a strong case can be made that the vendors have done such a miserable job of realizing it. Nevertheless, the relational model and a truly relational database offer far more than some new-fangled widget for doing persistent storage that mere mortals are better off not knowing about. They provide a basic technique for analyzing and structuring information so that the resulting structure solves what in artificial intelligence work used to be called the McCarthy frame problem, albeit limited to recurring sets of scalar data. That is the problem of determining what remains unchanged as a consequence of an action or event. The normalization process organizes information so that when something needs to be updated, it only has to be updated in exactly one place, whether that "update" is changing the value of some attribute or, for that matter, expanding an attribute to provide greater nuances of description.
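Here is a small sketch of what "updated in exactly one place" looks like in practice, again with Python's sqlite3 and a hypothetical phone-number schema of my own invention. Changing a value touches exactly one row. Adding a new kind of phone (one possible reading of "expanding an attribute") is just another row, with no ALTER TABLE and no new column to leave empty for everyone else.

```python
import sqlite3

# Hypothetical schema: each fact (one customer's phone of one type) lives in one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE PhoneNumbers (
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),
        phone_type  TEXT    NOT NULL,
        number      TEXT    NOT NULL,
        PRIMARY KEY (customer_id, phone_type)
    );
    INSERT INTO Customers VALUES (1, 'Alice');
    INSERT INTO PhoneNumbers VALUES (1, 'cell', '555-0101'), (1, 'work', '555-0102');
""")

# Changing a value: exactly one row is touched, so no stale copy can survive elsewhere.
cur = conn.execute(
    "UPDATE PhoneNumbers SET number = ? WHERE customer_id = ? AND phone_type = ?",
    ("555-0199", 1, "work"))
assert cur.rowcount == 1

# Expanding the description: a new kind of phone is a new row, not a schema change.
conn.execute("INSERT INTO PhoneNumbers VALUES (?, ?, ?)", (1, "fax", "555-0150"))

for row in conn.execute(
        "SELECT phone_type, number FROM PhoneNumbers "
        "WHERE customer_id = 1 ORDER BY phone_type"):
    print(row)
```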

Second, the presentation of relational databases is entirely devoid of relational theory. This is not a question of style or academic form. It is a question of being able to apply logic and to compare like with like. The theory of relational data, when done properly, supports facilities and feats no one expects of "object-oriented" or off-the-shelf software systems. My point is, people should expect these things. Consider the following, taken from The Database Relational Model: A Retrospective Review and Analysis by C. J. Date (page 15):

The language would provide symmetric exploitation. That is, the user would be able to access a given relation using any combination of its attributes as knowns and the remaining ones as unknowns. "This is a system feature [that is] missing from many current information systems." Quite so!--but of course we take it as a sine qua non now, at least in the relational world (the object world doesn't seem to think it's so important for some reason).
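To make "symmetric exploitation" concrete, here is a toy sketch in plain Python (my own illustration, not Date's or Codd's formulation). A relation is modeled as a set of tuples over named attributes, and a single hypothetical select function accepts any combination of attributes as knowns, leaving the rest as unknowns. No attribute is privileged as "the" key you must come in through. It is a linear scan, so it says nothing about indexing or performance; a real system would choose access paths behind the scenes.

```python
from typing import NamedTuple


class Phone(NamedTuple):
    customer: str
    phone_type: str
    number: str


# A relation: a set of tuples over the same attributes, with no built-in access path.
PHONES = frozenset({
    Phone("Alice", "cell", "555-0101"),
    Phone("Alice", "work", "555-0102"),
    Phone("Bob", "cell", "555-0201"),
    Phone("Bob", "pager", "555-0202"),
})


def select(relation, **knowns):
    """Return the tuples matching whichever attributes are given as knowns.

    Any subset of attributes may be known; the rest are the unknowns.
    """
    for attr in knowns:
        if attr not in Phone._fields:
            raise KeyError(f"no such attribute: {attr}")
    return sorted(t for t in relation
                  if all(getattr(t, a) == v for a, v in knowns.items()))


# The same relation, queried with different attributes as knowns:
print(select(PHONES, customer="Bob"))                  # Bob's phones
print(select(PHONES, number="555-0102"))               # whose number is this?
print(select(PHONES, phone_type="cell"))               # every cell phone
print(select(PHONES, customer="Alice", phone_type="work"))
```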

Third, the "XML & Relational Databases" article piles on yet another example of reducing what should be an engineering argument and engineering decision to the cheering on of a favored (sports) team, surrendering to the false but disturbingly common notion that all decisions and arguments about software are basically subjective and best settled by whoever wins the contest in the marketplace. To the degree that people in the software business believe this, the industry is doomed to recurrent faddism and, ultimately, financial dissolution. And, as I have argued elsewhere--notably in a letter to the editor of Dr Dobb's Journal itself (page 10, DDJ, November 2002)--and in direct connection to the matter of XML:

The fact is that XML is in serious danger of succumbing to the "New Kid in Town" mass psychosis that sweeps promising software technologies into town, and then out again, without slowing down to say "Hi." Somehow, we all can't seem to resist the temptation to hype the technology, hoping to pry free some venture capital, supportive funding, or maybe just a few purchase orders. Then, when the rubber doesn't meet the road, it's dumped. Fast.

XML's success was always predicated upon industry groups defining semantic--meaning as implemented in procedures, mostly software--interpretations for XML utterances. The idea, actually reasonable, is that each niche group would convene and hammer out standard tags usable for the niche interest. They'd publish, everyone would sign on, and we'd have Capitalist Heaven.

XML and relational databases have some kind of relationship, but without these interpretations for XML utterances, that relationship is simply syntactic. XML has no theory inherent in it.
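A quick way to see how purely syntactic the mapping is: the sketch below uses Python's standard xml.etree.ElementTree to serialize the same rows (hypothetical data) under two made-up tag vocabularies. Both documents are well-formed and carry identical values, yet nothing in XML itself says that a phone element and a tel element mean the same thing. That agreement has to come from outside XML, which is exactly the job the niche groups were supposed to do.

```python
import xml.etree.ElementTree as ET

ROWS = [("Bob", "cell", "555-0201"), ("Bob", "pager", "555-0202")]


def to_xml(rows, root_tag, row_tag, col_tags):
    """Mechanically serialize rows; the tag names are whatever the caller picks."""
    root = ET.Element(root_tag)
    for row in rows:
        el = ET.SubElement(root, row_tag)
        for tag, value in zip(col_tags, row):
            ET.SubElement(el, tag).text = value
    return ET.tostring(root, encoding="unicode")


def texts(doc):
    """Strip the tags back off, keeping only the values row by row."""
    return [[col.text for col in row] for row in ET.fromstring(doc)]


# Two hypothetical vocabularies for the same rows.
a = to_xml(ROWS, "phones", "phone", ["customer", "type", "number"])
b = to_xml(ROWS, "contacts", "tel", ["who", "kind", "digits"])
print(a)
print(b)
print(a == b)                # False: the documents differ only in their tags
print(texts(a) == texts(b))  # True: the values are identical
```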