Verbiage: SQL ANSI Join syntax is just plain wrong.

Journal Chacham's Journal: Verbiage: SQL ANSI Join syntax is just plain wrong. 5

Journal by Chacham on Thursday May 11, 2006 @11:55AM

It's enough to make one cry. Why did they have to change the one nearly perfect language. The one that made sense. The one with nearly no extra words. The one where the tricks are logical and not syntactical. The one that everyone likes once they learn to think. It's SQL, as i'm sure you guessed.

Everything in SQL is required. Except perhaps, the word "INTO" after an "INSERT" which is there only for readability. All the words are readable too, and once the concepts are understood, the wording of the keywords make perfect sense.

The hardest thing to understand is GROUP BY, but that's because books forget to explain that aggregate functions use the entire data set as a group, unless GROUP BY explicitly splits it up into smaller groups, in which case the aggregates work on each sub-group.

Then there's the WHERE clause. Misunderstood only by those who use SQL to retrieve data quickly, and not to understand it. It's simple. A query returns every single record unless a WHERE clause inhibits it. All joins are Cartesian joins without a WHERE clause specifying what to limit.

There's also no overlapping; no clause does what any other clause does. Even though WHERE and HAVING seem to overlap, that's only on the results seen, not on how it actually is done. HAVING is for groups, WHERE is for single records. HAVING is processed after the GROUPs are split but before the groups are worked upon. WHERE is processed after everything is worked on but before the sorter gets it.

Then people wanted outer joins. Outer joins make no sense set-wise, or even retrieval-wise. You cannot select a record that isn't there. It is the equivalent of adding a record of all NULLs into a dataset, and saying to the record (TABLE looking for the match), "if you don't find a match, match the NULL value." Regardless of if it made sense, people wanted it. So, solutions were implemented. Oracle implemented adding a (+) at the end of the column in there WHERE clause that was to return that NULL. IOW, making the column a column *plus* a default NULL value. OK, that makes sense. SQL Server used *= in the WHERE clause to say that the equivalence is not quite that. That it'll match even if nothing is found. Two sides of the same coin, only different on perspective. Oracle's solution--as it always is-- is based on the data side. SQL Server's solution--as it always is--is based on the retrieval side.

Then came along ANSI. They implemented this solution that put a new JOIN operation in the FROM clause itself. This way, in the FROM clause itself the TABLEs could be seen as being JOINed, and not required any WHERE clauses.

This is very wrong:

The difference between the FROM clause and the WHERE clause is: the FROM clause specifies the data source and the WHERE clause limits output. This solution mangles that by adding a limiter into the FROM clause itself.
There actually is no such thing as a join. Nor should there be. For each data source open, a separate CURSOR is opened. With three TABLEs, the process is:
- go to first record of first TABLE
- go to first record of second TABLE
- go to first record of third table
- cycle through all records in third table
- go to second record of second table
- cycle through all records in third table
- go to third record of second table
- cycle through all records in third table
- repeat for all record in second table
- go to second record of first TABLE
- repeat process for all records in first TABLE
Without a WHERE clause this results in a Cartesian join, which is why the WHERE clause limits the output, nixing most records when they do not match. The result is that it keeps what does match, which looks like a join.
By moving this operation into the FROM clause, it specifies that a JOIN is recognized by the system.
With the normal syntax it is very easy to see what TABLEs are being used in the query. With the JOIN syntax the clauses have to be traversed to see them all.
WHERE clauses can be neatly ordered and separated into each TABLE's clauses. Making for an easy read, and seeing what was joined where is pretty easy.
Putting joins into the FROM clause, however, makes it difficult to see which TABLE is joined with which, and when more than two TABLEs are used, is is nearly impossible to figure out what is going on.

All in all, the ANSI SQL syntax breaks that SQL paradigm and adds complexity to a very simple system. But, it's been accepted by all now and is the required use on some systems.

It's too bad. I thought it had a chance.

This discussion has been archived. No new comments can be posted.