
I remember Mayank telling me a couple of months back that Ruby is a
very powerful programming language and that Ruby on Rails (RoR) is
gonna be “THE future” of web development. After that he went ahead
with his mission, ApnaBill, a system designed entirely in RoR. I
started the journey a little late, but it’s worth starting.

For the last couple of nights, I have been playing with Hpricot, one
of the most powerful Ruby gems. It’s generally used for HTML parsing.
The beauty of Hpricot is that it lets you pick out elements using
either XPath expressions or CSS selectors.

And if you have the Firebug plug-in installed, the world of the web is
yours. You can grab any element by its XPath, no matter how badly the
page is structured, even if there are 100 tables on a single page
without any div id or class name.
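To illustrate why an XPath alone is enough, here is a minimal sketch that picks one cell out of a page with two anonymous tables purely by position. It assumes a small hypothetical sample page, and it swaps in REXML (Ruby’s standard-library XML parser) for Hpricot so it runs without extra gems. REXML only accepts well-formed markup, so the sample is tidy XHTML; Hpricot is far more forgiving of messy real-world HTML.

```ruby
require "rexml/document"

# Hypothetical sample page: two tables, no ids, no class names.
# It is well-formed because REXML (standing in for Hpricot here)
# refuses to parse broken markup.
SAMPLE = <<~HTML
  <html>
    <body>
      <table>
        <tr><td>Header A</td><td>Header B</td></tr>
      </table>
      <table>
        <tr><td>Name</td><td>Price</td></tr>
        <tr><td>Widget</td><td>42</td></tr>
      </table>
    </body>
  </html>
HTML

# Returns the text of the first node matching +xpath+, or nil if nothing matches.
def text_at(html, xpath)
  doc = REXML::Document.new(html)
  node = REXML::XPath.first(doc, xpath)
  node && node.text
end

# Second table, second row, second cell: position alone identifies it.
puts text_at(SAMPLE, "/html/body/table[2]/tr[2]/td[2]")
```

The positional predicates (`table[2]`, `tr[2]`, `td[2]`) are exactly what a copied XPath contains, which is why ids and class names aren’t needed.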

Here are a few quick steps to get you started:

1> Open the URL in Firefox.
[I am not against other web browsers, but I just love FF]

2> Enable Firebug for the page.
[Install the Firebug plug-in first if you haven’t done so.]

3> Right-click the value that you want to parse and choose “Inspect Element”.

4> Firebug opens its HTML panel with that element selected.

5> Click the thingy that interests you and copy its XPath (right-click it and choose “Copy XPath”).

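6> Note: Firebug adds extra “tbody” steps to the XPaths, because the browser inserts tbody elements into tables even when the page source has none.
[Just leave them out of the XPath you use and go ahead.]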

7> Parse the same element in Hpricot using that XPath.
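Steps 6 and 7 can be sketched as a tiny helper that removes the “tbody” steps from a copied XPath before it is handed to Hpricot. The helper name `strip_tbody` and the sample path are hypothetical; the Hpricot call is shown only in a comment, since it assumes the hpricot gem is installed and that its XPath support covers the expression you copied.

```ruby
# Hypothetical helper: removes the "tbody" steps Firebug inserts into
# copied XPaths, whether bare ("tbody") or indexed ("tbody[1]").
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, "")
end

# Hypothetical path copied from Firebug.
copied = "/html/body/table[2]/tbody/tr[2]/td[2]"
puts strip_tbody(copied)  # → /html/body/table[2]/tr[2]/td[2]

# With the hpricot gem installed, usage would look roughly like:
#   require "hpricot"
#   doc = Hpricot(html)
#   cells = doc.search(strip_tbody(copied))
```

The lookahead `(?=/|\z)` makes sure only whole `tbody` steps are removed, so an unrelated step whose name merely starts with “tbody” is left alone.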