
Re:Okay kids...(in Ruby) (Score 2, Interesting)

I couldn't resist - in Ruby, using the beautiful (but much underrated) hpricot library:

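require 'hpricot'
require 'open-uri'  # only needed when html_document is a URL rather than a local file

doc = Hpricot(open(html_document))
(doc/"a").each { |a| puts a.attributes['href'] }  # print the href of every <a> element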

Check it out - I've been using it for a project, and it's really fast and really easy to use (it supports both XPath and CSS selectors, so picking out links is trivial). For spidering, take a look at the Ruby mechanize library (which is like Perl's WWW::Mechanize, but also uses hpricot, making it much easier to parse the returned document).
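If you do go on to spider, the hrefs you pull out will often be relative, so they need resolving against the page's own URL before you can follow them. Here's a rough sketch of that step using only Ruby's standard uri library. Note that absolute_links is a made-up helper name for illustration, not something hpricot or mechanize provides, and the http/https-only filter is just my assumption about what a spider would want:

```ruby
require 'uri'

# Hypothetical helper (not part of hpricot or mechanize): resolve raw href
# values against the URL of the page they came from.
def absolute_links(base_url, hrefs)
  links = hrefs.map do |href|
    # Skip anchors that had no usable href attribute.
    next if href.nil? || href.strip.empty?
    begin
      uri = URI.join(base_url, href.strip)
      # Keep only http/https links (URI::HTTPS is a subclass of URI::HTTP);
      # this drops mailto: and similar schemes.
      uri.to_s if uri.is_a?(URI::HTTP)
    rescue URI::Error
      nil # ignore hrefs that are not valid URIs
    end
  end
  links.compact.uniq
end
```

With hpricot you would feed it something like (doc/"a").map { |a| a.attributes['href'] } along with the URL you fetched the page from.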
