Books

Submission + - Book Review: Solr 1.4 Enterprise Search Server 1

MassDosage writes: "[Note to Slashdot editors: my e-mail address is massdosage@gmail.com if you need to contact me. Please do NOT publish this on the site.]

Solr 1.4 Enterprise Search Server written by David Smiley and Eric Pugh provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts as well as those having some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr a lot of hands-on technical advice on how to use and fine-tune many parts of this powerful application.

Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it is related to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search and this chapter covers only the basics and assumes the reader already knows what they are getting into or that they will read up on search concepts themselves before reading further. Solr is free, open-source technology licensed under the Apache license and is available here. This book covers the 1.4 version of Solr and was published before this version was actually released so it is a bit patchy in areas which were still undergoing change but the authors point this out very clearly in the text where applicable.

The book provides details on downloading and installing Solr, building it from source and the manifold options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download along with various code examples and a bundled version of Solr 1.4 which is used as the basis for many of the examples referred to throughout the text. In some ways this dataset is limited as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.

The basics of schema design, text analysis, indexing and searching are covered over the next three chapters and these include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring, sorting etc. The reader is taken through the process of setting up Solr so it can be used to index data that is to be searched and then how this data can be imported into Solr from a variety of sources like XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface and provided sample data.
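To make the text analysis terms above a little more concrete, here is a minimal sketch of an analysis chain in plain Python: tokenize, drop stop-words, stem, then map synonyms. It is not Solr's API or configuration, and it is not taken from the book; the function names, the stop-word list, the synonym table and the crude suffix-stripping stemmer are all assumptions made up for illustration.

```python
import re

# Hypothetical, tiny word lists chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}
SYNONYMS = {"lp": "album", "record": "album"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Very naive stemmer: strip one common English suffix if enough remains."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the chain: tokenize, remove stop-words, stem, then apply synonyms."""
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(analyze("The Records of the Smashing Pumpkins"))
```

The same chain would normally be applied both when indexing and when querying, so that a search for "records" and a document containing "album" end up with matching terms. Real stemmers and synonym handling are considerably more careful than this.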

More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn’t extended very far and the behind-the-scenes algorithms powering search aren’t something I’ve had to directly work with. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search and how a knowledge of mathematics and the data being searched are essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future and I’m sure that people working with search on a daily basis would find some useful advice here for how to get the best out of Solr.

Solr provides much more than just indexing and search and the fact that various components are available to do many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover components which ship with Solr to provide this functionality as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one was adding search capability to one’s own product or web site as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.

The final three chapters move on to the more practical side of actually using Solr in the “real world” and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr) support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion on how to scale Solr: scaling high (optimising a single server through techniques like caching, shingling and clever schema design and indexing strategies), scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
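As a rough illustration of the "scaling wide" idea of sharding data between servers, the sketch below assigns each document to one of several shards by hashing its ID. This is a generic technique and only an assumption about how one might split an index; it is not how the book or Solr itself distributes documents, and the names `shard_for` and `partition` are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Pick a shard index from a stable hash of the document ID."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(doc_ids, num_shards):
    """Group document IDs into num_shards lists, one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    ids = ["artist:%d" % n for n in range(10)]
    for index, shard in enumerate(partition(ids, 3)):
        print(index, shard)
```

Because the hash is stable, the same document always lands on the same shard, so an update replaces the old copy instead of creating a duplicate elsewhere. A query then has to be sent to every shard and the results merged, which is the trade-off against simply replicating the whole index.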

On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. This book does not cover a lot of theory and assumes a fair amount of prior knowledge and is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren’t afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn’t have to wade through pages of flowery text before getting to the good bits. If you’re seriously thinking about using Solr or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.

Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion."

PhD Candidate Talks About the Physics of Space Battles 361

darthvader100 writes "Gizmodo has run an article with some predictions on what future space battles will be like. The author brings up several theories on propulsion (and orbits), weapons (explosives, kinetic and laser), and design. Sounds like the ideal shape for spaceships will be spherical, like the one in the Hitchhiker's Guide movie."

Submission + - Man Controls Cybernetic Hand with Thoughts (unicampus.it)

MaryBethP writes: Scientists in Italy announced Wednesday that Pierpaolo Petruzziello, a 26-year-old Italian who had lost his left forearm in a car accident, was successfully linked to an artificial limb that was neurally implanted in the median and ulnar nerves. He has learned to control the artificial limb with his mind. According to CNET, Petruzziello says he could feel sensations in it, as if the lost arm had grown back again.

http://news.cnet.com/8301-17938_105-10408139-1.html

Patents

Submission + - Federal Appeals Court Tosses Spam Patent (patentlyo.com)

Zordak writes: Dennis Crouch's Patently-O is running a news item about U.S. patent 6,631,400, which has claims drawn to a method of making sure enough people get your spam. A federal district court had overturned the patent as anticipated, obvious, and not drawn to patentable subject matter. The Federal Circuit, the appeals court which hears patent matters, upheld the finding of obviousness, thus invalidating the patent.
Earth

Swarm of Giant Jellyfish Capsize 10-Ton Trawler 227

Hugh Pickens writes "The Telegraph reports that the Japanese trawler Diasan Shinsho-maru has capsized off the coast of China, as its three-man crew dragged their net through a swarm of giant jellyfish (which can grow up to six feet in diameter and travel in packs) and tried to haul up a net that was too heavy. The crew was thrown into the sea when the vessel capsized, but the three men were rescued by another trawler. Relatively little is known about Nomura's jellyfish, such as why some years see thousands of the creatures floating across the Sea of Japan on the Tsushima Current, but last year there were virtually no sightings. In 2007, there were 15,500 reports of damage to fishing equipment caused by the creatures. Experts believe that one contributing factor to the jellyfish becoming more frequent visitors to Japanese waters may be a decline in the number of predators, which include sea turtles and certain species of fish. 'Jellies have likely swum and swarmed in our seas for over 600 million years,' says scientist Monty Graham of the Dauphin Island Sea Lab in Alabama. 'When conditions are right, jelly swarms can form quickly. They appear to do this for sexual reproduction.'"
Businesses

Should You Break TOS Because Work Asks You? 680

An anonymous reader writes "My boss recently assigned me a project that was all his idea, with two basic flaws that would require me to break multiple web sites' Terms of Service (TOS). Part requires scraping most of the site, parsing the data and presenting it as our own without human intervention. While we're safe on copyright issues, clearly scraping like this is normally not allowed. At times it might also put a load on those sites. The other is, for lack of better words, a 'load balancing' part that requires using multiple free accounts instead of purchasing space and CPU time for less than $2,000 USD per month. The boss sees it as 'distributed' computing when in reality it's 'parasitic.' My question is: am I wrong about the ethics? If I do need to walk, how best can I handle it without damaging my reputation and future employment opportunities?"
The Internet

Free Online Scientific Repository Hits Milestone 111

ocean_soul writes "Last week the free and open access repository for scientific (mainly physics but also math, computer sciences...) papers arXiv got past 500,000 different papers, not counting older versions of the same article. Especially for physicists, it is the number-one resource for the latest scientific results. Most researchers publish their papers on arXiv before they are published in a 'normal' journal. A famous example is Grisha Perelman, who published his award-winning paper exclusively on arXiv."
Privacy

New Bill To Rein In DHS Laptop Seizures 311

twigles writes with news of a new proposed bill that seeks to curtail DHS's power to search and seize laptops at the border without suspicion of wrongdoing. Here is Sen. Feingold's press release on the bill. The new bill has more privacy-protecting safeguards than the previous one, which we discussed last month. "The Travelers Privacy Protection Act, a bill written by US Senators Russ Feingold, D-Wis., and Maria Cantwell, D-Wash., and Rep. Adam Smith, D-Wash., would allow border agents to search electronic devices only if they had reasonable suspicions of wrongdoing. In addition, the legislation would limit the length of time that a device could be out of its owner's possession to 24 hours, after which the search becomes a seizure, requiring probable cause."
Enlightenment

Submission + - Tech Puts America on the Map (eweek.com)

eweekhickins writes: "Right out of National Treasure, the Library of Congress used hyperspectral imaging to look for hidden text behind the oldest known map to include America. Hyperspectral imaging combines both conventional imaging and spectroscopy, using optical elements, lenses, spatial filters and image sensors to capture 3D image cubes of the object. After years of highly restricted use by the government mapping agencies, hyperspectral imaging is emerging as a valuable tool for historical conservationists and preservationists."
Earth

Submission + - 3000 Swimming Robots Report No Global Warming 2

NobleSavage writes: As reported by NPR, 3000 swimming robots have been busy plying the ocean collecting temperature data, and the results have scientists puzzled:

These diving instruments suggest that the oceans have not warmed up at all over the past four or five years. That could mean global warming has taken a breather. Or it could mean scientists aren't quite understanding what their robots are telling them. This is puzzling in part because here on the surface of the Earth, the years since 2003 have been some of the hottest on record. But Josh Willis at NASA's Jet Propulsion Laboratory says the oceans are what really matter when it comes to global warming.
Security

Submission + - Hacking a pacemaker

jonkman sean writes: University researchers have studied how to gain wireless access to pacemakers and hack them. This story is covered in the New York Times as well as CNET. They will be presenting their findings at the "Attacks" session of the 2008 IEEE Symposium on Security and Privacy. Their previous work noted that over 250,000 implantable cardiac defibrillators are installed in patients each year. This subject [Wired] was first raised, along with similar issues, as a credible security risk in Gadi Evron's CCC Camp 2007 lecture "hacking the bionic man".
Microsoft

Submission + - Microsoft's definition for open source developers (microsoft.com)

An anonymous reader writes: According to Microsoft's Patent pledge for open source developers

To benefit from this promise, You must be a natural or legal person participating in the creation of software code for an open source project. An "open source project" is a software development project the resulting source code of which is freely distributed, modified, or copied pursuant to an open source license and is not commercially distributed by its participants. If You engage in the commercial distribution or importation of software derived from an open source project or if You make or use such software outside the scope of creating such software code, You do not benefit from this promise for such distribution or for these other activities.


"Non-commercial open source developers" would still have to worry. Rather than a license, the pledge is specified as a "promise", which may not really have legal significance.

Redistribution of the software also becomes discouraged:

This is a personal promise directly from Microsoft to You, and You acknowledge it is a condition of benefiting from it that no Microsoft rights are received from suppliers, distributors, or otherwise by any other person in connection with this promise.
This means you cannot pass the "benefits" of this promise on to other people. Effectively, if someone else wants to use your source code to make his own version of the project, the patent pledge had better still be 'active', or else there is no way he will receive the protection.


Microsoft

Submission + - Microsoft must pay $1.4bn to EU (bbc.co.uk)

saphena writes: The European Commission has fined US computer giant Microsoft for defying sanctions imposed on it for anti-competitive behaviour.

Microsoft must now pay 899 million euros ($1.4bn; £680.9m) after it failed to comply with a 2004 ruling that it took part in monopolistic practices.

The ruling said that Microsoft was guilty of not providing vital information to rival software makers.

EU regulators said the firm was the first to break an EU antitrust ruling.

Wireless Networking

"GiFi" — Short-Range, 5-Gbps Wireless For $10/Chip 190

mickq writes "The Age reports that Melbourne scientists have built and demonstrated tiny CMOS chips, 5 mm per side, that can transmit 5 Gbps over short distances — about 10 m. The chip features a tiny 1-mm antenna, a power amp that is only a few microns wide, and power consumption of only 2 W. 'GiFi' appears set to revolutionize short-distance data transmission, and transmits in the relatively uncrowded 60GHz range. Best of all, the chip is only about a year away from public release, and will only cost around US $9.20 to produce."
