Update: RubyfulSoup is not the fastest code in the world, but it seems to do the job.
So, I finally scrambled onto the Ruby bandwagon, and I intend to use it to demonstrate the benefits of scripting languages in an upcoming lecture. I was developing the examples and gradually coming to terms with the corners of the syntax when I hit a showstopper.
I want to scrape a web page that includes tags that are not closed properly. All the Ruby HTML/XML parsers I can find fall over at this point. I've implemented the exercise in (gasp) Java, which is a worse tool (right?) but has fault-tolerant parsers available.
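To make the problem concrete, here is a minimal sketch of the sort of tolerant scraping I have in mind. It is not a real HTML parser and not any library's API: it only uses the standard `String#scan` with regular expressions to pull link targets out of a page, and it never looks for closing tags, so unclosed ones cannot trip it up. The helper name `extract_links` and the sample page are made up for illustration.

```ruby
# Tolerant link extraction: scan for <a ...> start tags and never require
# a matching </a>, so unclosed tags cannot make it fail.
# extract_links is a hypothetical helper name, not part of any library.
def extract_links(html)
  links = []
  # Match each <a ...> start tag, case-insensitively.
  html.scan(/<a\b([^>]*)>/i) do |(attrs)|
    # Accept href="...", href='...' or an unquoted href value.
    if attrs =~ /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i
      links << ($1 || $2 || $3)
    end
  end
  links
end

# Made-up sample page with several tags that are never closed.
html = <<~HTML
  <html><body>
  <p>First paragraph, never closed
  <a href="http://example.com/one">one
  <p>Second <b>bold, never closed
  <A HREF='http://example.com/two'>two</a>
  <a href=three.html>three
HTML

puts extract_links(html)
```

On the sample page this prints the three link targets, one per line, even though most of the tags are left open. A regex scan like this is only good enough for simple extraction; anything that depends on the nesting of elements still needs a proper fault-tolerant parser.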
This has to be a really common scripting task, so I must be missing something. Can someone help me out?
P.S. And a Happy New Year.
Posted by stevef at January 6, 2006 10:14 PM