Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
PC Games (Games)

Journal Chacham's Journal: Verbiage: SQL ANSI Join syntax is just plain wrong. 5

It's enough to make one cry. Why did they have to change the one nearly perfect language. The one that made sense. The one with nearly no extra words. The one where the tricks are logical and not syntactical. The one that everyone likes once they learn to think. It's SQL, as i'm sure you guessed.

Everything in SQL is required. Except perhaps, the word "INTO" after an "INSERT" which is there only for readability. All the words are readable too, and once the concepts are understood, the wording of the keywords make perfect sense.

The hardest thing to understand is GROUP BY, but that's because books forget to explain that aggregate functions use the entire data set as a group, unless GROUP BY explicitly splits it up into smaller groups, in which case the aggregates work on each sub-group.

Then there's the WHERE clause. Misunderstood only by those who use SQL to retrieve data quickly, and not to understand it. It's simple. A query returns every single record unless a WHERE clause inhibits it. All joins are Cartesian joins without a WHERE clause specifying what to limit.

There's also no overlapping; no clause does what any other clause does. Even though WHERE and HAVING seem to overlap, that's only on the results seen, not on how it actually is done. HAVING is for groups, WHERE is for single records. HAVING is processed after the GROUPs are split but before the groups are worked upon. WHERE is processed after everything is worked on but before the sorter gets it.

Then people wanted outer joins. Outer joins make no sense set-wise, or even retrieval-wise. You cannot select a record that isn't there. It is the equivalent of adding a record of all NULLs into a dataset, and saying to the record (TABLE looking for the match), "if you don't find a match, match the NULL value." Regardless of if it made sense, people wanted it. So, solutions were implemented. Oracle implemented adding a (+) at the end of the column in there WHERE clause that was to return that NULL. IOW, making the column a column *plus* a default NULL value. OK, that makes sense. SQL Server used *= in the WHERE clause to say that the equivalence is not quite that. That it'll match even if nothing is found. Two sides of the same coin, only different on perspective. Oracle's solution--as it always is-- is based on the data side. SQL Server's solution--as it always is--is based on the retrieval side.

Then came along ANSI. They implemented this solution that put a new JOIN operation in the FROM clause itself. This way, in the FROM clause itself the TABLEs could be seen as being JOINed, and not required any WHERE clauses.

This is very wrong:

  • The difference between the FROM clause and the WHERE clause is: the FROM clause specifies the data source and the WHERE clause limits output. This solution mangles that by adding a limiter into the FROM clause itself.
  • There actually is no such thing as a join. Nor should there be. For each data source open, a separate CURSOR is opened. With three TABLEs, the process is:
                                                   
    • go to first record of first TABLE
             
    • go to first record of second TABLE
             
    • go to first record of third table
             
    • cycle through all records in third table
             
    • go to second record of second table
             
    • cycle through all records in third table
             
    • go to third record of second table
             
    • cycle through all records in third table
             
    • repeat for all record in second table
             
    • go to second record of first TABLE
             
    • repeat process for all records in first TABLE

    Without a WHERE clause this results in a Cartesian join, which is why the WHERE clause limits the output, nixing most records when they do not match. The result is that it keeps what does match, which looks like a join.

    By moving this operation into the FROM clause, it specifies that a JOIN is recognized by the system.

  • With the normal syntax it is very easy to see what TABLEs are being used in the query. With the JOIN syntax the clauses have to be traversed to see them all.
  • WHERE clauses can be neatly ordered and separated into each TABLE's clauses. Making for an easy read, and seeing what was joined where is pretty easy.

    Putting joins into the FROM clause, however, makes it difficult to see which TABLE is joined with which, and when more than two TABLEs are used, is is nearly impossible to figure out what is going on.

All in all, the ANSI SQL syntax breaks that SQL paradigm and adds complexity to a very simple system. But, it's been accepted by all now and is the required use on some systems.

It's too bad. I thought it had a chance.

This discussion has been archived. No new comments can be posted.

Verbiage: SQL ANSI Join syntax is just plain wrong.

Comments Filter:
  • Even though it's less efficient, the syntax fits in better with what I know of SQL. My immediate supervisor is the opposite though- and I must admit his Outer Join routines run faster. He always uses real equivalence though in SQL Server- testing equal to an IsNull Function to be sure he's not getting the records that have nulls.
    • Most db systems have optimized the "outer join" a great deal, which is why his queries run faster. Outer joins don't really exist, however, strictly speaking (think about it: how are you going to create relationships - "join" - using data that is specifically defined as being unrelated to the request?), and the correct way to perform the retrieval is with subqueries, as you are doing.

      Of course, subqueries are not very optimized, and can't rely on magical concepts of nonexistance like NULL.

      This is one of man
    • Heh.

      Truthfully, i'm on UDB right now though. With a little support to Sybase when they need it.
  • As a long time Oracle developer, it took me a while to come around on this...

    The problem with handling outer joins through traditional syntax is that it can be difficult to express outer joins between subsets without resorting to truly confusing logic or resorting to even more confusing inline views. For instance...

    SELECT m.col1, m.col2, d.col3
    FROM MASTER m, DETAIL d
    WHERE m.col1 = d.col1 (+)
    AND d.col2 = 'foo'

    This is a common error. The predicate "d.col2 = 'foo'" overrides the outer join. The mor

    • Well, it should be a slightly confusing, well not that it actually should, but it is understandable, being OUTER JOINs are not correct in themselves, and therefore implemented in strange ways.

      Being the (+) makes the COLUMN a COLUMN plus a default NULL value, everything makes perfect sense. Though, you are correct that the code is wordy.

      Although the ANSI syntax is slightly cleaner in this one case, i find it to be more confusing overall. Especially when more than one TABLE is used, the JOIN syntax goes crazy

"Here's something to think about: How come you never see a headline like `Psychic Wins Lottery.'" -- Comedian Jay Leno

Working...