Follow Slashdot blog updates by subscribing to our blog RSS feed


Forgot your password?
DEAL: For $25 - Add A Second Phone Number To Your Smartphone for life! Use promo code SLASHDOT25. Also, Slashdot's Facebook page has a chat bot now. Message it for stories and more. Check out the new SourceForge HTML5 internet speed test! ×

The Art of SQL 225

Graeme Williams writes "One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does. The Art of SQL is the opposite of a cookbook – or rather it's about cooking rather than recipes. It's not a reference manual, although there's plenty to refer back to. It's an intermediate level book which assumes you know how to read and write SQL, and analyzes what SQL does and how it does it." Read on for Graeme's review.
The Art of SQL
author Stéphane Faroult with Peter Robson
pages xvi + 349
publisher O'Reilly Media
rating 9
reviewer Graeme Williams
ISBN 0-596-00894-5
summary An excellent way to improve your approach to SQL

I guess it's normal for an intermediate text to present a number of serious examples, the idea being that the code from an example can be applied to roughly similar problems with roughly similar solutions. I think Faroult's goal is both more abstract and more ambitious. He wants to expand your ability to navigate among and analyze alternative SQL statements with more confidence and over a larger range. This isn't so much a book about SQL as it is about thinking about SQL.

There's almost no chance that the SQL examples in the book will be directly applied to a real problem. The examples are relevant at one remove: What does thinking about this example tell me about thinking about my current problem? So the book doesn't come with downloadable samples. There's no point.

The first few chapters of the book lay a foundation for the rest. As each brick in this foundation is placed, it sometimes feels as though it's placed firmly on your head. Think about indexes ... whack! Think about join conditions ... whack! These chapters have very few examples – the goal is to force you to think through queries from first principles. It's more effective (and less painful) than it sounds.

These introductory chapters cover how a query is constructed and executed, including how a query optimizer uses the information which is available to it. Faroult discusses the costs and benefits of indexes, and the interaction of physical layout with indexes, grouping, row ordering and partitioning. He also explains the difference between a purely relational query and one with non-relational parts, and how such a query can be analyzed in layers. Chapter 4 is available on the book's web page. It will give you a good idea of the style of the book, but not of the level of SQL discussed – the longest example in the chapter is just 15 lines.

Chapter 6 presents and analyzes nine SQL patterns, from small result sets taken from a few tables, to large result sets taken from many tables. The chapter falls roughly in the middle of the book, and feels like its heart. Prior chapters have built up to this one, and subsequent chapters are elaborations on particular topics. The theme of the book, to the extent that it has one, is that details matter. Different SQL statements can be used to produce the same result, but their performance will be different depending on details of the data and database. A change to the database structure, such as adding an index, might improve performance in one set of circumstances, but make it worse in another. The case analysis in this chapter will make you more sensitive to details in query design and execution.

The authors almost never mention particular database products. Their justification is that any absolute statement would be invalidated by the next release, or even a different hardware configuration, and anyway, that's not the business they're in. But sometimes this can go too far. The phrase "A clever optimizer ... will be able to" is too hypothetical by half. Is this an existing hypothetical query optimizer, or a vision of a future optimizer? Or the optimizer of one hypothetical database product and not of another? I suspect that Faroult knows and is simply being coy. It's just unhelpful not to tell us what existing databases will do, even if depends on the release or the hardware.

Faroult does this because he's not much interested in telling you what actually happens when a particular SQL statement is executed by a particular database. If the authors wanted a cute title for the book, I'm surprised they passed over The Zen of SQL Maintenance. When you look at an SQL statement, Faroult wants you to see what other SQL statements would do under other circumstances. He literally wants you to see the possibilities.

The second half of the book continues the analysis of chapter 6 into special cases, such as OLAP and large volumes of data, monitoring and resolving performance issues, and debugging problematic SQL.

Chapter 7 discusses tree-structured data, like an employee table with a column for the employee's manager. Faroult likes his own solution best, but presents an alternative approach by Joe Celko clearly enough for you to explore that as well.

Chapter 8 includes a series of examples of SQL and PHP. For anyone like me who spends more time in various programming languages than in SQL, this chapter is a small gem. It nicely illuminates the care needed in deciding what happens in code and what happens in SQL.

Chapter 9 addresses locking and concurrency, as it applies to both physical and logical parallelism. Transactions are included, but the discussion is just one part of a 20-page chapter and seems thin.

The Art of SQL is very clearly written. Whether it is "easy" will depend on how comfortable you are with SQL. This book is targeted at (page xi) "developers with significant (one year or, preferably, more) experience of development with an SQL database", their managers and software architects. I have months of experience spread over a decade or more, so I'm nominally outside the target audience. I found the SQL examples and discussion clear once I had a chance to let them sink in. If you're working with SQL regularly, they'll be perfectly clear.

The graphs let down the otherwise high quality of the book. For example, Figure 5-3 shows a rate (higher is better) but the legend says "Relative cost" (higher is worse). Figures 9-1 through 9-3 on facing pages 228 and 229 show response time histograms for three different query rates but don't show what the rates are. The x-axis of Figure 10-1 seems to be calendar time, but it's decorated with a stop watch icon. And as a representative of rapidly aging boomers with rapidly deteriorating eyesight, could I beg book designers not to put figure legends in a smaller font than the text of the book? Diagrams should be simple and clear, not something to puzzle over.

This is a book to conjure with, but it's not a book for everyone. Some people may find it too abstract, with too much discussion of too few examples. If you're completely new to SQL, the book will be hard going. If you have very many years of experience with SQL, it's just possible that you won't find anything new in the book, although I expect you'll find a lot to think about. For anyone in between, The Art of SQL is a excellent way to improve the way you attack problems in database and query design.

You can purchase The Art of SQL from Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
This discussion has been archived. No new comments can be posted.

The Art of SQL

Comments Filter:
  • sql vs. procedural (Score:2, Interesting)

    by JCOTTON ( 775912 ) on Wednesday June 07, 2006 @04:12PM (#15489816) Homepage Journal
    Now that the subject has been raised, my 2 cents.
    I have found (and who can disagree (just trolling)) that at least half of the production databases that I have come across hare not normalized. Go figgure.

    Anyway, this being the case, I have found that SQL is poor in handling a non-normalized table/database. (cant really call a non-normalized table as a database can we? (nuther troll))

    For example. We keep a complete record for each person for each pay period. Even inactives.

    I am asked to give a list of all active employees for a date range, and a lot of payroll detail, personal detail, etc. Guess what? Simple SQL gives a lot of duplicate names. I wish that there was a simple way filter. (Yes, I can do this in sql, but my point is that it is not handled natively in sql. I would like a simple command - give me all names and all their data for the latest pay period - something like that.

    All procedural languages will handle this problem nicely.

    metaphors be with you

  • by mcmonkey ( 96054 ) on Wednesday June 07, 2006 @04:39PM (#15490029) Homepage
    And who modded that comment Insightful? Come on folks, it's garbage.
    Point is nothing is really transferable
    So they seasoned pro with 15-years relational database experience who hasn't worked with new SQL-X is just as good as someone with no db experience on any platform? And when SQL-X 2.0 comes out, all those so-called experts, with their knowledge of SQL-X 1.9 are going to be useless, right? After all, "nothing is really transferable."

    EVERYTHING is transferable. That is, everything you've actually learned, everything you understand. If you're just mashing buttons, yeah, you might be a little lost when the buttons change. When telephones changed from rotary dial to push buttons, some people were still able to make calls. If course the basic syntax changed, and knuckle-draggers like the folks who modded the parent comment Insightful were SOL. But most folks who had some ideas about the THEORY of the telephone--that the little spinning disk on the phone didn't make the actual call but rather transferred information, and the buttons were just a new way of transferring the same information--adapted and moved on.

    The fact that a computer even let such a concept be typed and communicated gives me hope for the day when machines rule the Earth, that they just might have enough of a sense of humor, or pity, to allow us humans to remain in their midst.

  • by Pfhreakaz0id ( 82141 ) on Wednesday June 07, 2006 @04:40PM (#15490035)
    and as a developer, I will counter, that, in my experience, the reason there was a 5-page sql query in the first place was because the DBA had very restrictive rules about normalization (not realizing that denormalization is a common, industry-standard way to improve database performance). Why did my query (this was at my last job, which had the dba's from hell) have 16 joins in it? Because I had to split stuff up across 6 tables that I knew I would always be accessing at the same time anyway...
  • by grrrgrrr ( 945173 ) on Wednesday June 07, 2006 @05:07PM (#15490236)
    For theory books I think you may do better with a book that has litle to do with sql. I find Database in Depth by C.J. Date a nice theory book
  • by cruachan ( 113813 ) on Wednesday June 07, 2006 @05:22PM (#15490338)
    Firstly most production databases contain some denormalization. Indeed the art of designing a real database is knowing where and when to denormalize data. How much denormalization is required is dependent upon the database, access paths and application usage and is rarely more than a few fields or a table or two. Nevertheless real production databases that have been correctly denormalized often run orders of magnitude faster than those that rigidly stick to 3rd normalization throughout.

    Secondly what you are asking for is generally straightforward in any real dialect of SQL. Select distinct works fine, as do various scenarios with subselects and group by / having clauses (having is the most overlooked of the standard SQL clauses and it's use generally signifies you are using code written by someone who knows what they are doing).

    However if you have a good dbms to hand that implements user defined functions then usually the best way if to create a function that returns the uid of the record from the multiple recordset you require (i.e. last payroll record for employee x) and use that in the where clause.

    OTOH if you are stuck with MySQL then the first step you have to take is upgrade to Postgres :-)
  • by cerelib ( 903469 ) on Wednesday June 07, 2006 @05:30PM (#15490381)
    If you are going to go with the cooking/food reference then I do not think SQL is like cooking. SQL is like ordering at a restuarant, where the restuarant is your DBMS. It's like programming in prolog. You don't tell prolog what to do; you tell it what you want. Just a thought.
  • Re:SQL fun (Score:4, Interesting)

    by cruachan ( 113813 ) on Wednesday June 07, 2006 @05:33PM (#15490405)
    Don't think I completely agree. True writing SQL to second-guess the optimizer in detail is deadly and pointless with modern rdbms' anyway (but Oracle 5, where you really had to isn't that many years ago). Nevertheless having a feel for how optimizers work is good. For instance setting up your joins on indexed fields or being aware of where the optimizer will use a full table scan and when that is a problem. On of my favourite tricks for example is to use an index to avoid a table access - which can pay mega dividends on large datasets. For example suppose we have a table which contains employee data and is index on an ID. I know that I regularly require a further field from this table - say insurance number. By setting an index on ID and Insurance Number the optimizer saves a record access for each instance when Insurance Number must be retrieved. That's a simple example, but the theme can be extended quite significantly

Why won't sharks eat lawyers? Professional courtesy.