A Taste of Hpricot

Posted: July 4, 2008 in Ruby on Rails, Tools

I remember Mayank telling me a couple of months back that Ruby is a
very powerful programming language and that Ruby on Rails (RoR) is
going to be “THE future” of web development. After that he went ahead
with his mission, ApnaBill, a system built entirely in
RoR. I started the journey a little late, but it’s worth starting.

For the last couple of nights, I have been playing with Hpricot, which is
one of the most powerful Ruby gems. It’s generally used for HTML
parsing. The beauty of Hpricot is that it lets you pick out elements with
XPath expressions and CSS selectors.

And if you have the Firebug plug-in installed, then the world of the web is
yours. You can reach any tag by its XPath; it doesn’t matter if the page is
badly structured and has 100 tables without a single div id
or class name.

Here are a few quick steps to get you started:

1> Open the URL in Firefox.
[I am not against other web browsers, but I just love FF]

2> Enable Firebug for the page.
[Install the Firebug plug-in first if you haven’t done so].

3> Right-click on the value that you want to parse and choose Inspect Element.

4> It will open up Firebug’s HTML view with that element highlighted.

5> Click on the element that interests you and copy its XPath.

6> Note: Firebug adds additional “tbody” tags to the XPaths.
[Just leave them out of the path and go ahead]

7> Use the same XPath in Hpricot.
[Example: doc.search('/html/body//table/tr/td[3]')]
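Steps 6 and 7 can be sketched in Ruby roughly like this. Treat it as a sketch, not the one true way: strip_tbody and cell_texts are names I made up for illustration (they are not part of Hpricot), and it assumes the hpricot gem is installed and that you already have the page’s HTML in a string.

```ruby
# Hypothetical helper (not part of Hpricot): remove the "tbody" steps
# that show up in an XPath copied from Firebug.
def strip_tbody(xpath)
  # Drops "/tbody" and "/tbody[1]"-style steps, leaving other steps alone.
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, '')
end

# Hypothetical helper: parse an HTML string with Hpricot and return the
# text of every node matched by the (cleaned-up) XPath.
def cell_texts(html, xpath)
  require 'hpricot' # assumption: the hpricot gem is installed
  doc = Hpricot(html)
  doc.search(strip_tbody(xpath)).map { |node| node.inner_text.strip }
end
```

With the page source in a variable called html, something like cell_texts(html, '/html/body//table/tbody/tr/td[3]') should give you the text of the third cell of each matching row. Fetching the page is left out on purpose; use whatever HTTP library you prefer.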
