I don't know about Sphinx but I agree that Lucene could be a good solution, for the reasons tolan-b lists. I work on a digital library cataloging project that indexes it's metadata with Lucene. We use PHP to generate the user-facing website, which queries our Lucene index via a Solr server. We do have a highly structured metadata schema and we do run queries that include things such as "give me all articles in Magazine including 'foo' in the title, published between 1950 and 1966" (which somebody in another comment suggested is not easy to do with Lucene, but in our experience was very easy). And adding a Solr server on top makes it easy to include features like faceted search.