Relational Database - What does that Mean?

Journal jgardn's Journal: Relational Database - What does that Mean?

Journal by jgardn on Thursday September 09, 2004 @12:00PM

I posted a comment that said that humans think more like a relational database than a hierchical one. (http://slashdot.org/comments.pl?sid=120762&cid=10169159). I think enough people don't understand what a database is, nonetheless a hierchical or relational one.

A database is a store of information. Everything you put some data in and then retrieve later on is a database. Your filesystem is a database. The internet is a database. Even memory can be considered database.

The easiest way to keep track of the data is merely to retain a pointer to it. That is how we manage data stored in a memory database. The ext2 filesystem has something called an "inode" which basically is a pointer to a specific file stored on the disk. We use URLs on the internet to reference a particular piece of data.

But how do you find the data when you don't remember the exact pointer? And what if the pointer to the information has changed since the last time you accessed it or stored it?

Enter the hierchical database. In the hierchical database, there is a root node. The root node points to several other nodes, and so on and so forth, until every node is pointed to. This is an awful lot like some scripting languages. It is also how data is stored in a traditional filesystem. Notice how using these databases is quite difficult. The pointer to the node in question is now a path rather than a single bit of information. The paths can get quite complicated.

In short, hierchical databases really suck. They are not natural because we don't store data hierchically. Quick, what is the parent node of "the picnic last summer where you played baseball"? You don't know because you don't store information like a hierchical database.

Enter the relational database. In a relational database, data is stored and referenced by its relations with other data. Hierchical databases are a subset of relational databases. That is, you can store data in a hierchy in a relational database. But a relational database can do more than a hierchical database.

The types of relations possible are limited to your imagination. Typically, in real relational databases that are well-designed, data has three or more relations to other bits of data. The number of relations can reach into the hundreds, yet our minds can easily keep track of it. For instance, how do we map products to people? I can think of three different ways immediately. I see the products sold by a particular salesman, who is a type of person. I see products bought by a customer. I see products bought by the store from a distributor, who has a representative who is a person. There are more you can think of.

So, using the example above, how does you mind store "the picnic last summer where you played baseball"? It stores it in a number of different ways. First, it goes under the topic picnic. It relates to all other picnics. Then it goes under the topic of last year, and all the activities you undertook that last year. It is also filed under summertime, which includes all the activities you ever undertook during summer. Don't forget the relation to basebal games you played.

If you wanted to look that particular event up, you would search all your relations for the one event that matched "picnic", "last year", "summertime", "baseball". Even only using two of these four criteria, you would probably get a set of events that is small enough that you can identify which one you were looking for easily.

How does this relate to filesystems? You files should be stored relationally rather than hierchically. Rather, it should be stored relationally in addition to hierchically. You letter to grandma that you wrote last week should be filed as "the letter called "/home/jgardn/letter/grandma-2004-09-01.doc". It should also be filed under "letters", "grandma", and "stuff I did last week". These in turn will be related to other data. "letters" are "documents". "grandma" is "family" and maybe "friend" depending on your relation. "last week" is part of "this month", and part of "september" in the "year 2004". Again, searching for the letter with just two or three of the above criteria should give you a small enough set that finding the letter is trivial.

Now, a final point. Since relational databases are so similar to how we store data in our minds, performance is actually increased, even though the actual implementation is slower. This is because the act of finding a letter doesn't include just "loading the letter from disk into memory", but also "finding the letter in the first place, and identifying it as the one we want to see correctly." The second task is far more efficient in a relational database filesystem, and happens to also take the most time in any filesystem.

Journal jgardn's Journal: Relational Database - What does that Mean?

Slashdot Top Deals

Slashdot