
Submission + - Book Review: Amazon SimpleDB Developer Guide

KuanH writes: "Amazon SimpleDB Developer Guide" is billed as a complete guide to using Amazon's SimpleDB database API. It's most detailed for PHP. It's helpful for Python. But the Java code and explanations aren't up to the standard of the others. It includes a primer on using Amazon S3 with SimpleDB: files stored on S3, file metadata stored in SimpleDB — again, less good for Java. It also covers tuning to reduce usage costs, caching using memcached, and ways to batch-update and make serial or parallel requests to SimpleDB. However, it's missing some information that beginners might need, and it's perhaps not quite advanced enough for the more experienced. Downloadable example code is available only for PHP.

Say "cloud" to get the attention of CIOs seeking to cut costs in these recessionary times. One well known "database in the cloud" option is Amazon Web Services' SimpleDB, which Amazon describes as "a highly available, flexible, and scalable non-relational data store that offloads the work of database administration."

With Amazon's free usage tier offer, new AWS customers currently get limited free use of certain services for a year, while other services like SimpleDB include a free monthly quota without, at present, any cut-off date (although SimpleDB free data transfer in/out allowances expire after a year). So, if the "cloud thingy" you want is databases, it's a good time to sign up and have a play. Note though that you still have to give Amazon your credit card info in advance. Exceed your free quota and you'll pay for what you use, as with AWS's other services.

Those who prefer traditional relational databases could try eg Amazon RDS. This book only covers SimpleDB, a NoSQL or non-relational database. As is well known, NoSQL databases grew in popularity with the growth of large distributed systems and cloud computing, and their proponents tout their scalability and speed.

For anyone wanting a quick primer on NoSQL databases generally, this book includes a chapter on NoSQL which isn't limited to SimpleDB. It outlines some key conceptual differences between NoSQL and relational database management systems, with pros and cons, using the analogy of "a spreadsheet with some XML characteristics", and illustrating with some concrete examples. That chapter's been made available as a free sample chapter (SimpleDB versus RDBMS), so you can get a flavour of the book.

The contents list for this book is online, so I won't recite it here. As well as an overview of SimpleDB, its terminology and advantages, the book goes through signing up with AWS and SimpleDB, and the account access keys. That chapter is also online, as a tutorial.

You may ask, how does this book differ from Amazon's free SimpleDB documentation, which includes a developer guide (PDF) and a "getting started" guide?

Amazon's own "getting started" is certainly helpful, and it's worth downloading and trying their web app scratchpad. But Amazon's detailed developer guide concentrates on REST and SOAP requests, which most people wouldn't want to deal with directly at that low level.

This book's focus is on using the SimpleDB web services API through certain specific languages and libraries — namely Java (JDK6 — using the typica 1.6 library plus several dependencies), Python (2.5 — you need boto), and PHP (with curl). It recommends the SDBtool Firefox extension (SDBizo), which is excellent for checking the results of running the code.

I've tried the book's Java and Python examples, on Windows. Not PHP, as I've not got round to learning PHP yet, though I skimmed the PHP explanations. Similarly, I've not had time to try it all over again on Linux. Generally, the book's coverage seems fuller and better for PHP than for Java or Python. Perhaps it was originally written for PHP, and the rest was bolted on — the stuff for Java more hurriedly than for Python?

The downloadable code samples, as mentioned, are PHP only. They really should have provided downloadable code for all 3 languages, plus some fake MP3 files (see later). If you get the e-book (available in PDF and epub), you can copy and paste the Java or Python code. But that's a tad tedious, especially when the code runs onto a new page, and there are stray end of lines etc that you have to delete manually. Furthermore, the Python code provided is for the interpreter in interactive mode (not for .py files, except a couple towards the end). So, for the Python, you also have to copy/paste each line one at a time. But that still beats having to re-type pages of code in full.

In other words, if you want this book and you're only interested in PHP, you can get away with just buying the hard copy and downloading the code from the Packt site. But if you prefer Python or Java, to save your fingers and blood pressure you should buy just the e-version, or get both paper and e books together. I really hope Packt will in future provide downloadable code samples for all the languages covered.

I have more issues with the sample code given in this book. The typica imports should have been spelled out in the example Java code. Eclipse offers more than one possible import in some cases. It was "try everything till it works", at least until I found this tutorial. I've included the initial required typica imports (though not the standard java.util etc ones) in my own list of points, which I'll say more about at the end of this review. Surely it wouldn't have been difficult to include just those few lines of imports, which could have saved readers a lot of time trying to work out the correct imports. There are also errors in the Python code, and on one page the code that should have been included is missing altogether.

Now, more on the book proper. After the overview described above, this book walks you through the basic SimpleDB operations: how to create a SimpleDB "domain" (equivalent to a worksheet in a spreadsheet), list domains, create/retrieve items (like spreadsheet rows), and delete domains.

Items have attributes (spreadsheet column headings), as key:value pairs — the key is the attribute name, the value is its value, eg address:1 Acacia Avenue. An attribute can have more than one value, eg the same item can have both address:1 Acacia Avenue and address:2 Broadway. The book also lists the SimpleDB constraints on domains, items and attributes — maximum number or size, etc — but it's best to check the AWS site for the latest info.
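As a rough illustration of the item/attribute model described above (not taken from the book), here is a minimal in-memory sketch in Python 3. The `domain` structure, the `put_attribute` helper and the item name `person1` are all invented for illustration; none of this touches the real SimpleDB API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a single SimpleDB domain:
# item name -> attribute name -> set of string values.
domain = defaultdict(lambda: defaultdict(set))


def put_attribute(item_name, attr_name, value):
    """Add a value to an attribute, keeping any values already present."""
    domain[item_name][attr_name].add(str(value))


# "person1" is an invented item name; the addresses come from the text above.
put_attribute("person1", "address", "1 Acacia Avenue")
put_attribute("person1", "address", "2 Broadway")

# One item, one attribute name, two values.
addresses = sorted(domain["person1"]["address"])
print(addresses)
```

Using a set per attribute mirrors the idea that one attribute name can carry several values on the same item.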

Code examples are given for each of the 3 languages mentioned. The examples are similar, but don't always cover the same ground. If they'd done that, where possible, it would have been more helpful to those of us trying examples in more than one language. One advantage of a book with associated website is that electronic updates can be published, and it would have been great if that had been done for this book. For instance, the book gave conditional put/delete code examples only for PHP. At the date of this review, boto now supports those features, but sample supplemental Python code for that still hadn't been made available.

SimpleDB stores attribute values as UTF-8 strings. This means that comparisons for sorting or searching are done lexicographically (character by character, left to right, numbers take precedence over uppercase over lowercase), and to handle numbers or dates you have to encode and decode them yourself. So, the book has a chapter explaining lexicographical comparison, data types, and how to encode and decode data to enable proper sorting and comparison of numbers, dates, Boolean values and XML-restricted characters. In the case of numbers this means zero padding and offsets, and there's example code for decoding and encoding numbers. Unlike with PHP and Python, oddly the Java code given was for the body of the typica method that carries out the encoding etc. This could have been omitted, and they should have given example code illustrating the method's usage instead. Similarly for the date formats code.
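To make the zero padding and offset idea concrete, here is a minimal sketch in Python 3. It is not the book's code or the typica method; `encode_int`, `decode_int`, the five-digit width and the offset of 1000 are assumptions chosen for the example.

```python
def encode_int(value, digits=10, offset=0):
    """Shift by offset so the result is non-negative, then zero-pad."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset too small for this value")
    text = str(shifted)
    if len(text) > digits:
        raise ValueError("value needs more digits than allowed")
    return text.zfill(digits)


def decode_int(text, offset=0):
    """Reverse encode_int: parse the padded string and remove the offset."""
    return int(text) - offset


numbers = [10, -5, 9, 2]

# Plain string comparison puts "10" before "2" and "9".
naive = sorted(str(n) for n in numbers)

# Padded, offset strings sort in true numeric order.
encoded = sorted(encode_int(n, digits=5, offset=1000) for n in numbers)
decoded = [decode_int(s, offset=1000) for s in encoded]

print(naive)    # ['-5', '10', '2', '9']
print(encoded)  # ['00995', '01002', '01009', '01010']
print(decoded)  # [-5, 2, 9, 10]
```

The offset has to be at least as large as the most negative value you expect, and the digit count has to fit the largest shifted value, so both need choosing up front for each attribute.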

The SimpleDB query syntax is generally covered well, in a chapter which takes readers through first creating a sample database of song metadata to run queries against. It's not too painful copy/pasting the Java code (3+ pages), but with Python in interactive mode I drew the line at creating every song item and attributes using individual statements, even with pasting, so I just tried adding a couple of random ones to test that the code worked. I say again, full downloadable code please...!

That chapter then gives helpful examples of queries against the sample database and their results, including for more complex combined queries ("and", "or" type queries, "not" etc), and querying for multiple-value attributes. It also provides code examples for sorting and counting query results. But the Java code for retrieving an item's attributes wouldn't run, and I couldn't find the method used (getItemsAttributes()) detailed in the typica documentation; perhaps the book is out of date here?
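For a feel of what the combined queries look like, here is a small Python 3 sketch that only builds a select expression as a string; it does not call SimpleDB. The helper names, the `songs` domain and the `Year` and `Genre` attributes are hypothetical, and the quoting rules (single quotes around values, backticks around names, both escaped by doubling) are my understanding of the SimpleDB select syntax, so check them against Amazon's documentation.

```python
def quote_value(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + str(value).replace("'", "''") + "'"


def quote_name(name):
    """Wrap a domain or attribute name in backticks, doubling embedded ones."""
    return "`" + name.replace("`", "``") + "`"


def build_select(domain, conditions, order_by=None):
    """Build 'select * from ...' with conditions joined by 'and'.

    conditions is a list of (attribute, operator, value) tuples.
    """
    query = "select * from " + quote_name(domain)
    where = " and ".join(
        f"{quote_name(attr)} {op} {quote_value(val)}"
        for attr, op, val in conditions
    )
    if where:
        query += " where " + where
    if order_by:
        query += " order by " + quote_name(order_by)
    return query


query = build_select(
    "songs",
    [("Year", ">", "1980"), ("Genre", "=", "Rock 'n' Roll")],
    order_by="Year",
)
print(query)
```

As far as I know, SimpleDB expects the attribute used in "order by" to also appear in a condition of the where clause, which is why `Year` is in both places here.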

The book starts going beyond the basics from Chapter 7 onwards, with a chapter on Amazon's S3 storage service — another well known component of Amazon Web Services, where "objects" (files) may be stored in "buckets" (directories), with "keys" used to retrieve objects.

For S3, the book uses JetS3t for Java. However, the Java code given for uploading files to S3 didn't demonstrate any integration with SimpleDB at all — the files were just uploaded with their filenames as the S3 keys, and the code didn't seem to deal with the creation of your own custom S3 keys for uploaded objects. In contrast, the Python code generated the S3 keys for the files from hashes previously produced and stored in the SimpleDB database, as well as dealing with their uploading. In addition, for me the Java code for downloading files from S3 just wouldn't run, and also it wasn't clear where the files were supposed to be downloaded to locally, unlike with the Python example.
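The idea of deriving the S3 key from a hash that is also stored with the item's metadata can be sketched as follows, in Python 3. This is not the book's code: the choice of SHA-256, the `FileKey` attribute name and both helper functions are assumptions, and nothing here talks to S3 or SimpleDB.

```python
import hashlib


def s3_key_for(content, extension="mp3"):
    """Derive a key from the file's bytes (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(content).hexdigest()
    return digest + "." + extension


def metadata_for(title, content):
    """Build the attributes that would be stored alongside the uploaded file.

    "Title" and "FileKey" are hypothetical attribute names.
    """
    return {"Title": title, "FileKey": s3_key_for(content)}


record = metadata_for("Example Song", b"placeholder bytes, not real audio")
print(record)
```

Storing the same key in both places is what links the two services: the database row tells you which S3 object to fetch.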

Inexplicably, there was no info on how to delete objects from S3 buckets, or indeed how to delete buckets.

So, while the S3 chapter is of help, it could definitely do with being expanded, especially the Java sections.

Now, lawyer hat time. The book used MP3 files of well-known songs to illustrate uploading to S3 and downloading. If the book is intended for a global audience, perhaps they should have used some other file type instead, eg text files containing notes. For the purposes of learning S3, readers won't generally care about the types of files; they just need to know how to code the uploads and downloads. But in some countries, depending on the circumstances including licences etc, it may be unlawful to upload those MP3 files, even if you've paid for them. So using MP3 files for the examples may not have been the best idea; even Amazon's own Cloud Drive is not without controversy.

Here in the UK, it's even technically unlawful to rip CDs that you've bought. Although the UK Gowers review, way back in 2006, recommended at least legalising the format shifting of "legitimately purchased content", no UK government has shown any signs of doing that. We don't know yet what the recently-commissioned Hargreaves review of intellectual property law will come up with (maybe introduce the US "fair use" concept to the UK?), and we shall see whether the present government will act on that review's recommendations. But I digress.

Anyway, from the publisher's viewpoint it might have been better if they'd used some other file type for illustration, or, better still, provided readers with downloadable "fake MP3 files" with the right extension and titles, but with content to which they had the rights. For my own testing of the book's code, I just used minimal text files with an .MP3 extension and filenames matching the song names in the book. And then I tried it with a single real MP3 file, which I had the right to upload, just to check that it did download and play correctly.

I hope the authors and publishers will provide "fake" MP3 files as part of the book's downloads, in future. That would, quite apart from avoiding any question marks about legalities, save readers a lot of time, as you wouldn't then have to generate your own "MP3" files just to be able to try out the book's code. Yes, you could record yourself going blah blah blah, but this book's not meant to be about making you create your own MP3s.

Next, money money money. AWS charges are based on usage, so the chapter on tuning and usage costs has some practical value in explaining how SimpleDB is charged for, the "BoxUsage" value returned by requests to SimpleDB, using BoxUsage to optimise queries and compute costs, and how to get BoxUsage values back with your queries using Java, Python etc. There are code examples that, when run, illustrate the different BoxUsage values you get when you use different operators or expressions in queries (eg, using LIKE costs more).
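To show how BoxUsage values might be turned into a cost estimate, here is a minimal Python 3 sketch. It assumes BoxUsage is reported as a string in machine hours; the sample values and the price of 0.14 per machine hour are invented placeholders, not figures from the book or from Amazon's current price list.

```python
def total_box_usage(box_usages):
    """Sum BoxUsage values, which are assumed to arrive as decimal strings."""
    return sum(float(usage) for usage in box_usages)


def estimate_cost(box_usages, price_per_machine_hour):
    """Multiply total machine hours by an hourly price supplied by the caller."""
    return total_box_usage(box_usages) * price_per_machine_hour


# Hypothetical BoxUsage values from two requests.
sample_usages = ["0.0000228616", "0.0000320033"]

hours = total_box_usage(sample_usages)
cost = estimate_cost(sample_usages, price_per_machine_hour=0.14)

print("machine hours: %.10f" % hours)
print("estimated cost: %.10f" % cost)
```

Comparing the summed BoxUsage for two ways of writing the same query is one simple way to see which is cheaper.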

However, partitioning your data into multiple domains is covered in only a few paragraphs, with no code given. I'd have liked to see more info on that, and some sample code for the partitioning process.
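Since the book gives no code for this, here is one possible way to spread items across several domains, sketched in Python 3. Hashing the item name is my own assumption about a reasonable scheme, not the book's method, and the `songs` base name and the count of four partitions are hypothetical.

```python
import hashlib


def domain_for(item_name, base="songs", partitions=4):
    """Pick a domain for an item by hashing its name.

    The same item name always maps to the same domain.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    index = int(digest, 16) % partitions
    return "%s_%d" % (base, index)


def all_domains(base="songs", partitions=4):
    """List every partition, e.g. to run the same query against each one."""
    return ["%s_%d" % (base, i) for i in range(partitions)]


target = domain_for("song-001")
print(target)
print(all_domains())
```

The trade-off is that a lookup by item name touches one domain, but a query across all items has to be sent to every partition and the results merged in your own code.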

To further save money, you can use a cache to store data locally, trying your local cache first; and, only if the data is not there, would your app go out to SimpleDB and incur costs for querying it. This book accordingly has a chapter on how to install and use the popular open source caching system memcached to cache your query results locally. (CacheLite for PHP is also covered.) Again, the Java sections caused me some frustration. The Java test code showed that the memcached server was running properly on my machine, but the Java code for using the cache just didn't work; it ran, but continued to query SimpleDB directly. The Python code, however, worked perfectly, except that, if you're using memcached in Windows, you'll need to use port 11211 instead of what's shown in the book. (I didn't try it in Linux.)
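The "try the cache first, then fall back to SimpleDB" pattern can be sketched as below, in Python 3. This is not the book's code: `DictCache` is a dictionary-backed stand-in for a memcached client, `fake_run_query` stands in for a real SimpleDB call, and the `sdb:` key prefix is invented. The query is hashed to form the cache key because memcached keys cannot contain spaces and are limited in length.

```python
import hashlib


class DictCache:
    """Dictionary-backed stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cache_key(query):
    """Hash the query text so the key has no spaces and a fixed length."""
    return "sdb:" + hashlib.sha1(query.encode("utf-8")).hexdigest()


def cached_select(cache, query, run_query):
    """Return cached results if present; otherwise run the query and store it."""
    key = cache_key(query)
    result = cache.get(key)
    if result is None:
        result = run_query(query)
        cache.set(key, result)
    return result


calls = []


def fake_run_query(query):
    """Stand-in for a billable SimpleDB request; records each call."""
    calls.append(query)
    return ["item1", "item2"]


cache = DictCache()
first = cached_select(cache, "select * from songs", fake_run_query)
second = cached_select(cache, "select * from songs", fake_run_query)
print(len(calls))  # 1: the second call was served from the cache
```

A real setup would also need an expiry time on cached entries, so that changes made in SimpleDB eventually show up.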

Finally, the book deals with running parallel operations against SimpleDB, using its BatchPutAttributes. The section on updating SimpleDB in Python by making serial consecutive calls to SimpleDB is completely missing the code for the script, but the book does then cover inserting multiple items concurrently into SimpleDB using a threadpool in Java. It also gives sample Python code for alternative ways of parallelising requests: using Python's built-in threading module, threading and queues combined, then threading using the open source workerpool module.
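Combining batching with parallel requests might look something like this in Python 3. It is a sketch, not the book's code: it uses the standard library's `concurrent.futures` thread pool rather than the threading, queue or workerpool approaches the book describes, `fake_put_batch` stands in for a real BatchPutAttributes call, and the batch size of 25 is my assumption about the per-call item limit, which should be checked against Amazon's current documentation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_all(items, put_batch, batch_size=25, workers=4):
    """Send items in batches, several batches at a time; return batch count."""
    batches = list(chunks(items, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every batch and re-raises any worker exception.
        list(pool.map(put_batch, batches))
    return len(batches)


received = []
lock = threading.Lock()


def fake_put_batch(batch):
    """Stand-in for one batch request; records how many items it was given."""
    with lock:
        received.append(len(batch))


# 60 hypothetical (item name, attributes) pairs.
items = [("item%03d" % i, {"Title": "Song %d" % i}) for i in range(60)]
batch_count = put_all(items, fake_put_batch)
print(batch_count, sorted(received))
```

Batching cuts the number of requests, and the thread pool overlaps the network waits; a real version would also need to retry batches that fail.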

To conclude, in substance the book has a fair amount of useful information on the basics of getting started with SimpleDB, particularly for Python (and probably PHP). But not providing downloadable code samples in Java and Python, or "fake" MP3 files to try S3 uploading/downloading, is a minus.

Some errors, inconsistencies and missing information from the department of "I-wish-they'd-included-this-even-if-they-thought-it-was-basic-as-it's-too-easily-missed-if-it's-not-spelled-out", mean that the book is not really "complete", and not as suitable as it should be for relative beginners, especially for Java and (in whatever language) Windows. It wouldn't take much extra work to get it up to scratch on that front. Perhaps the next edition, or better still an online update/supplement?

I decided to note down some issues I'd encountered in going through the book and provide some of the missing information and corrections in my own unofficial errata list, which I'm submitting to Packt. Please allow me my additional queries, quibbles and wishlist items there!

I gave up trying to figure out the example Java code towards the end of the book which wouldn't work properly, but otherwise my list should plug many of the book's minor gaps, such as what else you need to add to your Java classpath in the S3 chapter, and some gaps for Windows users. I've also noted some errors or inconsistencies in the Python code which stop it from running if copy/pasted.

For beginners, therefore, the book plus my list should hopefully be enough to get you started. But, as I said, I've not had time to learn PHP and try to run the PHP code, sorry.

For the more experienced, the book doesn't take readers to as advanced a stage as it could have, in my view. In particular, it would have been good to have more info and example code on partitioning data between different domains, and also how to migrate data from an existing database to SimpleDB — their code for "importing" the sample database literally just adds each item and attribute individually. (But, here's Amazon's guide on migrating PHP/MySQL, plus an article on the DBLoader migration tool.)

Fix the errors, add the missing info for beginners, provide downloads of code in all relevant languages and "fake files", and I'd have given it a 7. Provide working sample Java code with more explanation, plus proper integration with S3, an 8. Add fuller info on partitioning, migration, and perhaps even integration with yet more AWS services, a 9.

Full disclosure: I received a free e-copy of this book from Packt for review. They've not paid me anything to write this, they just wanted me to give my honest opinion. Which I've done. I may or may not get another free e-book after this; possibly not!

All opinions are personal to me: half geek, half lawyer, mostly harmless. I'm researching legal issues in cloud computing: see ComputerWorld UK, OpenTech 2011.