The Art of SQL

The Art of SQL 225

Posted by samzenpus on Wednesday June 07, 2006 @03:22PM from the learn-all-about-it dept.

Graeme Williams writes "One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does. The Art of SQL is the opposite of a cookbook – or rather it's about cooking rather than recipes. It's not a reference manual, although there's plenty to refer back to. It's an intermediate level book which assumes you know how to read and write SQL, and analyzes what SQL does and how it does it." Read on for Graeme's review.

The Art of SQL
author	Stéphane Faroult with Peter Robson
pages	xvi + 349
publisher	O'Reilly Media
rating	9
reviewer	Graeme Williams
ISBN	0-596-00894-5
summary	An excellent way to improve your approach to SQL

I guess it's normal for an intermediate text to present a number of serious examples, the idea being that the code from an example can be applied to roughly similar problems with roughly similar solutions. I think Faroult's goal is both more abstract and more ambitious. He wants to expand your ability to navigate among and analyze alternative SQL statements with more confidence and over a larger range. This isn't so much a book about SQL as it is about thinking about SQL.

There's almost no chance that the SQL examples in the book will be directly applied to a real problem. The examples are relevant at one remove: What does thinking about this example tell me about thinking about my current problem? So the book doesn't come with downloadable samples. There's no point.

The first few chapters of the book lay a foundation for the rest. As each brick in this foundation is placed, it sometimes feels as though it's placed firmly on your head. Think about indexes ... whack! Think about join conditions ... whack! These chapters have very few examples – the goal is to force you to think through queries from first principles. It's more effective (and less painful) than it sounds.

These introductory chapters cover how a query is constructed and executed, including how a query optimizer uses the information which is available to it. Faroult discusses the costs and benefits of indexes, and the interaction of physical layout with indexes, grouping, row ordering and partitioning. He also explains the difference between a purely relational query and one with non-relational parts, and how such a query can be analyzed in layers. Chapter 4 is available on the book's web page. It will give you a good idea of the style of the book, but not of the level of SQL discussed – the longest example in the chapter is just 15 lines.

Chapter 6 presents and analyzes nine SQL patterns, from small result sets taken from a few tables, to large result sets taken from many tables. The chapter falls roughly in the middle of the book, and feels like its heart. Prior chapters have built up to this one, and subsequent chapters are elaborations on particular topics. The theme of the book, to the extent that it has one, is that details matter. Different SQL statements can be used to produce the same result, but their performance will be different depending on details of the data and database. A change to the database structure, such as adding an index, might improve performance in one set of circumstances, but make it worse in another. The case analysis in this chapter will make you more sensitive to details in query design and execution.

The authors almost never mention particular database products. Their justification is that any absolute statement would be invalidated by the next release, or even a different hardware configuration, and anyway, that's not the business they're in. But sometimes this can go too far. The phrase "A clever optimizer ... will be able to" is too hypothetical by half. Is this an existing hypothetical query optimizer, or a vision of a future optimizer? Or the optimizer of one hypothetical database product and not of another? I suspect that Faroult knows and is simply being coy. It's just unhelpful not to tell us what existing databases will do, even if depends on the release or the hardware.

Faroult does this because he's not much interested in telling you what actually happens when a particular SQL statement is executed by a particular database. If the authors wanted a cute title for the book, I'm surprised they passed over The Zen of SQL Maintenance. When you look at an SQL statement, Faroult wants you to see what other SQL statements would do under other circumstances. He literally wants you to see the possibilities.

The second half of the book continues the analysis of chapter 6 into special cases, such as OLAP and large volumes of data, monitoring and resolving performance issues, and debugging problematic SQL.

Chapter 7 discusses tree-structured data, like an employee table with a column for the employee's manager. Faroult likes his own solution best, but presents an alternative approach by Joe Celko clearly enough for you to explore that as well.

Chapter 8 includes a series of examples of SQL and PHP. For anyone like me who spends more time in various programming languages than in SQL, this chapter is a small gem. It nicely illuminates the care needed in deciding what happens in code and what happens in SQL.

Chapter 9 addresses locking and concurrency, as it applies to both physical and logical parallelism. Transactions are included, but the discussion is just one part of a 20-page chapter and seems thin.

The Art of SQL is very clearly written. Whether it is "easy" will depend on how comfortable you are with SQL. This book is targeted at (page xi) "developers with significant (one year or, preferably, more) experience of development with an SQL database", their managers and software architects. I have months of experience spread over a decade or more, so I'm nominally outside the target audience. I found the SQL examples and discussion clear once I had a chance to let them sink in. If you're working with SQL regularly, they'll be perfectly clear.

The graphs let down the otherwise high quality of the book. For example, Figure 5-3 shows a rate (higher is better) but the legend says "Relative cost" (higher is worse). Figures 9-1 through 9-3 on facing pages 228 and 229 show response time histograms for three different query rates but don't show what the rates are. The x-axis of Figure 10-1 seems to be calendar time, but it's decorated with a stop watch icon. And as a representative of rapidly aging boomers with rapidly deteriorating eyesight, could I beg book designers not to put figure legends in a smaller font than the text of the book? Diagrams should be simple and clear, not something to puzzle over.

This is a book to conjure with, but it's not a book for everyone. Some people may find it too abstract, with too much discussion of too few examples. If you're completely new to SQL, the book will be hard going. If you have very many years of experience with SQL, it's just possible that you won't find anything new in the book, although I expect you'll find a lot to think about. For anyone in between, The Art of SQL is a excellent way to improve the way you attack problems in database and query design.

You can purchase The Art of SQL from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

The Art of SQL

This discussion has been archived. No new comments can be posted.

Search 225 Comments Log In/Create an Account

Comments Filter:

BN vs Amazon (Score:3, Informative)

by beavis88 ( 25983 ) writes: on Wednesday June 07, 2006 @03:32PM (#15489546)

I know Amazon has software patents and all, but this (and just about every other book I see reviewed here) is ~20% cheaper at Amazon than it is at BN...

Comment removed (Score:2, Informative)

by account_deleted ( 4530225 ) writes: on Wednesday June 07, 2006 @03:50PM (#15489670)

Comment removed based on user account deletion

A useful review - Thanks! (Score:2, Informative)

by Anonymous Coward writes: on Wednesday June 07, 2006 @03:50PM (#15489675)

Many so-called book reviews on Slashdot fail to review the book. Instead, they simply state what each chapter covers. This review is actually useful. It describes the book's target audience, gives a sense of what the book does and doesn't contain, and helps me understand whether the book would be useful to me. Thanks!

This is an excellent book (Score:3, Informative)

by tcopeland ( 32225 ) * writes: <tom&thomasleecopeland,com> on Wednesday June 07, 2006 @03:59PM (#15489720) Homepage

I'm slowly working my way through it; it's a great book on a number of levels. The writing itself is very nice, with a real personality showing through and not just the usual dry technical flavor. The illustrations are done in a nifty "drawing" style that looks good and portrays the data well. The technical insights are very helpful; after reading what I've moved through so far I've rewritten some of my Rails code [blogs.com] to be more efficient.

I highly recommend this book; the $40 you'll spend on it will be repaid the first time you delete a swath of Java looping code and replace it with an additional subquery. If I can do half as well on my next book [generating...javacc.com] I'll consider it a job well done.

bookpool (Score:5, Informative)

by stoolpigeon ( 454276 ) * writes: <bittercode@gmail> on Wednesday June 07, 2006 @03:59PM (#15489725) Homepage Journal

bookpool has it cheaper [bookpool.com] than amazon.

Re:Where's the news? (Score:4, Informative)

by PCM2 ( 4486 ) writes: on Wednesday June 07, 2006 @04:04PM (#15489759) Homepage

Could Slashdot not post book reviews to the main section??

I like book reviews.
Homepage preferences [slashdot.org] are your friends.

Re:Bummer, trees (Score:3, Informative)

by eh2o ( 471262 ) writes: on Wednesday June 07, 2006 @04:16PM (#15489869)

FYI this book does mention many-to-many relations a few times, including its implications for multiple indexing and in the chapter on trees (since graphs are a generalization of trees where a node may have multiple parents). In the book there are 4 different data structures proposed for trees.

Re:So According to the blurb... (Score:2, Informative)

by Anonymous Coward writes: on Wednesday June 07, 2006 @04:21PM (#15489902)

umm... dude? SQL has been around since the mid 50's. A guy at IBM developed it. Now, it was made before high level languages, and brother, that's why SQL is anchronistic and irreperably flawed.

Yep, i said it.

the sample chapter is promising (Score:3, Informative)

by sqlgeek ( 168433 ) writes: on Wednesday June 07, 2006 @04:31PM (#15489979)

I like the look of this book quite a bit, judging soley from the sample chapter. It talks in a straight-forward manner about the factors that determine how a database goes about it's job and how you can make that job easier or harder. If the rest of the book plays out similarly then thorough understanding of this book as well as Tom Kyte's would make for a programmer I'd love to hire.

Re:Bummer, trees (Score:3, Informative)

by sqlgeek ( 168433 ) writes: on Wednesday June 07, 2006 @04:34PM (#15489989)

Didn't look at the sample chapter, did you? It deals with a many-many relation several times in varying contexts.

Re:Developers and SQL (Score:3, Informative)

by Fulcrum of Evil ( 560260 ) writes: on Wednesday June 07, 2006 @04:57PM (#15490159)

Why is it not a database problem?

Because it's a 5 page SQL query.

...how to get you your result fastest?

Sometimes, fastest still takes a long time.

Re:CTEs (Score:2, Informative)

by Osty ( 16825 ) writes: on Wednesday June 07, 2006 @05:37PM (#15490429)

Are we talking parent-child hierarchy tables? If so, Oracle's had statements to take care of that for a long time, since 1998 or so. Perhaps not ANSI standard, but they get the job done.

No, I'm talking about Common [sqlservercentral.com] Table [ibm.com] Expressions [ianywhere.com] (okay, so I was slightly wrong about implementation of CTEs -- apparently other products have implemented the standard, but DB2 and SQL Server 2005 are the only "Big Boy" engines with them). CTEs aren't so much about implementing a hierarchy as they are about doing recursive actions quickly and efficiently. Walking a parent-child hierarchy is just an example of a recursive problem that's easily solved with a CTE, but CTEs don't dictate how you should store your relationship information.

For what it's worth, it's possible to write recursive algorithms in just about any SQL implementation (convert your recursion to iteration, and it's not so bad), but the win with using CTEs is that it's still a set operation. Doing the loop yourself means you're losing SQL's set-based power. I did a little comparison on a naive parent-child implementation, doing two things: return the path to parent from a given node, and return the subtree of a given node. I implemented each algorithm in SQL 2000's T-SQL without CTEs and in SQL 2005 with CTEs. The CTE implementation was approximately 10 times faster than the by-hand iteration solution.

Re:CTEs (Score:3, Informative)

by BitterAndDrunk ( 799378 ) writes: on Wednesday June 07, 2006 @05:47PM (#15490502) Homepage Journal

TY for the info.
FWIW, the parent-child solve in Oracle is set based. CONNECT BY I think is the syntax, but it's been almost 7 years since I was working BOM hierarchies that utilized it heavily.

Re:Developers and SQL (Score:3, Informative)

by Baki ( 72515 ) writes: on Thursday June 08, 2006 @02:25AM (#15492771)

Slashdot is not handling really important data. any sane DBA (I'm not a DBA but mainly a developer) of a DBMS of any importance will normalize, must normalize. A DBMS lives for many generations of application programs, you cannot tell in the future how the data will be read. What counts is that the data is logically structured with minimum chance for inconsistencies, and duplicate data (which is what denormalized essentially is) always leads to inconsistencies sooner or later, or at least to maintenance nightmares.

You can speed things up by using materialized views or maybe derived tables (denormalized) for special (reporting) purposes.

Re:CTEs (Score:3, Informative)

by Johnno74 ( 252399 ) writes: on Thursday June 08, 2006 @05:34AM (#15493212)

The very cool thing about CTEs in SQL 2005 is you CAN reference the query name inside the query that defines the subquery. It references itself; it automatically recurses when it is executed.

Like this: (shamelessly ripped from http://www.yafla.com/papers/sqlhierarchies/sqlhier archies.htm [yafla.com])

WITH CTE_Example (EmployeeID, FullName, BossID, Depth)
AS
(
SELECT EmployeeID, FullName, BossID, 0 AS Depth
FROM Employees WHERE EmployeeID = @boss_id
UNION ALL
SELECT Employees.EmployeeID, Employees.FullName, Employees.BossID, CTE_Example.Depth + 1 AS Depth FROM Employees
JOIN CTE_Example ON Employees.BossID = CTE_Example.EmployeeID
)

SELECT * FROM CTE_Example

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

The Art of SQL 225

The Art of SQL More Login

BN vs Amazon (Score:3, Informative)

Comment removed (Score:2, Informative)

A useful review - Thanks! (Score:2, Informative)

This is an excellent book (Score:3, Informative)

bookpool (Score:5, Informative)

Re:Where's the news? (Score:4, Informative)

Re:Bummer, trees (Score:3, Informative)

Re:So According to the blurb... (Score:2, Informative)

the sample chapter is promising (Score:3, Informative)

Re:Bummer, trees (Score:3, Informative)

Re:Developers and SQL (Score:3, Informative)

Re:CTEs (Score:2, Informative)

Re:CTEs (Score:3, Informative)

Re:Developers and SQL (Score:3, Informative)

Re:CTEs (Score:3, Informative)

Related Links Top of the: day, week, month.

Slashdot Top Deals

Slashdot