Solr 1.4 Enterprise Search Server, written by David Smiley and Eric Pugh, provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts, as well as those with some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr plenty of hands-on technical advice on how to use and fine-tune many parts of this powerful application.
Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it relates to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search. The opening chapter covers only the basics and assumes that readers already know what they are getting into, or that they will read up on search concepts themselves before reading further. Solr is free, open source technology licensed under the Apache license and is available for download from the Apache Solr project. The book covers version 1.4 of Solr and was published before that version was actually released, so it is a bit patchy in areas that were still undergoing change. The authors point this out very clearly in the text where applicable.
The book provides details on downloading and installing Solr, building it from source, and the many options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download, along with various code examples and a bundled version of Solr 1.4, and this data set is the basis for many of the examples referred to throughout the text. In some ways the data set is limited, as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.
The basics of schema design, text analysis, indexing and searching are covered over the next three chapters, which include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring and sorting. The reader is taken through the process of setting up Solr so that it can index the data to be searched, and then shown how this data can be imported into Solr from a variety of sources such as XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface against the provided sample data.
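To give a feel for what tokenizers, stop-words and stemming do, here is a minimal sketch of an analysis chain in Python. It is not how Solr implements text analysis and it is not taken from the book: the stop-word list, the suffix rules and all the function names are my own assumptions, chosen only to illustrate the idea that differently worded text can be reduced to the same indexed terms.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little search value."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip a few common English suffixes.

    This is far cruder than a real stemmer; it only shows the principle.
    """
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters of the original token.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the whole chain: tokenize, remove stop-words, then stem."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(analyze("The Smashing Pumpkins"))
print(analyze("Searching the searches"))
```

Because the same chain would be applied both when indexing and when querying, "searching" and "searches" end up as the same term and can match each other.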
More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn’t extended very far and the behind-the-scenes algorithms powering search aren’t something I’ve had to work with directly. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search, and of how essential a knowledge of mathematics and of the data being searched is to search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future, and I’m sure that people working with search on a daily basis would find some useful advice here on how to get the best out of Solr.
Solr provides much more than just indexing and search, and the fact that various components are available to perform many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover the components which ship with Solr to provide this functionality, as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one were adding search capability to one’s own product or web site, as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.
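As an example of the kind of wheel being saved, here is a rough sketch of what highlighting search terms involves if you write it yourself. It is a simplified stand-in written in Python, not Solr's highlighting component; the function name and the choice of `<em>` markers are assumptions made for illustration.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive match of a term in markers."""
    if not terms:
        return text
    # Escape the terms so that any punctuation in them is matched literally.
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight("Smashing Pumpkins live in Chicago", ["pumpkins"]))
```

Even this toy version ignores stemming, snippet selection and HTML escaping, all of which a real search product has to deal with.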
On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. The book does not cover a lot of theory and assumes a fair amount of prior knowledge. It is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren’t afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn’t have to wade through pages of flowery text before getting to the good bits. If you’re seriously thinking about using Solr, or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.
Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted, so the above review is my own honest opinion.