Comment: Re:Interesting, however (Score 2) 61

by HipsterMcGee (#38055508) Attached to: Common Crawl Foundation Providing Data For Search Researchers
You're absolutely correct, although if they do have it indexed, it's certainly much easier on the researchers. I actually worked on this project: http://lemurproject.org/clueweb09.php/ ... and I can tell you first hand that crawling that much data is not easy, and indexing it takes not only time but computing muscle and lots and lots of disks. It took us roughly a month and a half to collect the data using a Hadoop cluster with 100 nodes, and then roughly 2 months of compute time to index it (spread across 24 nodes, so roughly a couple of days of wall-clock time). Then factor in the resources you need to experiment with that amount of data. The hardware and IT maintenance costs alone for this setup are probably going to be more than the cost of running your experiments on EC2.
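
For anyone who wants to sanity-check the numbers, here is a rough back-of-the-envelope sketch in Python. The helper names are made up for illustration, and it assumes a 30-day month, perfectly linear scaling across nodes, and that the month and a half of crawling was wall-clock time on all 100 nodes (that last part is only one reading of the figures above).

```python
def wall_clock_days(compute_days: float, nodes: int) -> float:
    """Aggregate compute time (node-days) spread evenly over `nodes` machines.

    Assumes perfect parallelism, which real indexing jobs rarely achieve.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return compute_days / nodes


def aggregate_node_days(elapsed_days: float, nodes: int) -> float:
    """Total node-days consumed when `nodes` machines run for `elapsed_days`."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return elapsed_days * nodes


DAYS_PER_MONTH = 30  # assumption: round months to 30 days

# Indexing: roughly 2 months of compute on 24 nodes.
index_wall_days = wall_clock_days(2 * DAYS_PER_MONTH, 24)

# Crawling: roughly 1.5 months elapsed on 100 nodes (assumed wall-clock).
crawl_node_days = aggregate_node_days(1.5 * DAYS_PER_MONTH, 100)

print(f"indexing: about {index_wall_days:.1f} days of wall-clock time")
print(f"crawling: about {crawl_node_days:.0f} node-days in total")
```

Under those assumptions the indexing works out to a few days on the wall clock, which matches the "couple of days" figure, while the crawl is the part that eats most of the machine time.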
