
Comment Re:I didn't know (Score 5, Informative) 23

The idea is to give everyone access to crawl data. If you work at a large search company, you have access to crawl data. You can also set up crawlers to get the data yourself, but that is expensive, and having countless crawlers doing duplicative work is not ideal. Our idea is that there should be one common repository for crawl data that anyone can use. Researchers are using it for NLP, IR, sentiment analysis and many other things, like measuring the adoption of metadata formats. Educators are using it as a real-world dataset to teach big data techniques in the classroom. Developers and entrepreneurs are using it for startups. Sorry I don't have a car analogy :) Feel free to email me if you have any other questions: lisa at commoncrawl dot org

Comment Re:How does Common Crawl compare w/ Internet Archi (Score 3, Informative) 23

Hi, I work at Common Crawl. Internet Archive is awesome and does really important work. The main difference between us and Internet Archive is that you can analyze our data. Internet Archive is a vault: its crawl is not available on a platform where you can run jobs against it. Because we put our data on Amazon and other compute platforms, anyone can access it and run jobs against it. If you wanted to do that with Internet Archive's crawl, you would have to ask permission, get permission, and download it to your personal data center in order to analyze it. I don't know too many people with a personal data center :) Lisa
