
I am trying to extract all five rows listed in the table above.
I am using the hpricot Ruby library to retrieve table rows using an xpath expression.
In my example, the XPath expression I use is /html/body/center/table/tr. Note that I removed the tbody tag from the expression, which is usually necessary for the rows to be retrieved successfully.
The strange thing is that I get only the first three rows; the last two are missing. I have no idea what is going on.
EDIT: Nothing magical about the code, just attaching it on request.
require 'open-uri'
require 'hpricot'

faculty = Hpricot(open("http://www.utm.utoronto.ca/7800.0.html"))
(faculty/"/html/body/center/table/tr").each do |text|
  puts text.to_s
end
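One way I could check whether the missing rows are simply nested differently is to compare the strict path with a descendant search. This is only a sketch of that idea: it uses Ruby's bundled REXML instead of hpricot, and the HTML fragment is a made-up, well-formed stand-in in which the last two rows are wrapped in a tbody. I have not confirmed that the real page is laid out this way.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the page (NOT the real markup): three rows sit
# directly under <table>, the last two are wrapped in a <tbody>.
html = <<~HTML
  <html><body><center><table>
    <tr><td>row 1</td></tr>
    <tr><td>row 2</td></tr>
    <tr><td>row 3</td></tr>
    <tbody>
      <tr><td>row 4</td></tr>
      <tr><td>row 5</td></tr>
    </tbody>
  </table></center></body></html>
HTML

doc = REXML::Document.new(html)

# Strict path: matches only <tr> elements that are direct children of <table>.
strict = REXML::XPath.match(doc, '/html/body/center/table/tr')

# Descendant path: matches every <tr> anywhere below any <table>.
loose = REXML::XPath.match(doc, '//table//tr')

puts "strict path:     #{strict.size} rows"
puts "descendant path: #{loose.size} rows"
```

If the two counts differ on the real document as well, the rows are present but sit under an intermediate element that the strict expression skips.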