I would like to parse a table using Nokogiri. I am doing it like this:
def parse_table_nokogiri(html)
  doc = Nokogiri::HTML(html)
  doc.search('table > tr').each do |row|
    row.search('td/font/text()').each do |col|
      p col.to_s
    end
  end
end
Some rows of the table have no font tag:
<tr>
<td>
Some text
</td>
</tr>
... and some rows do have it:
<tr>
<td>
<font> Some text </font>
</td>
</tr>
My XPath expression works for the second scenario, but not the first. Is there an XPath expression I could use that would give me the text from the innermost node of the cell, so that I can handle both scenarios?
I have incorporated the suggested changes into my snippet:
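require 'cgi'
require 'nokogiri'

def parse_table_nokogiri(html)
  doc = Nokogiri::HTML(html)
  # Pick the table with the most rows on the page.
  table = doc.xpath('//table').max_by { |t| t.xpath('.//tr').length }
  # Skip the first row (presumably the header).
  rows = table.search('tr')[1..-1]
  rows.each do |row|
    # 'td//text()' matches text nodes at any depth inside each cell.
    cells = row.search('td//text()').collect { |text| CGI.unescapeHTML(text.to_s.strip) }
    cells.each do |col|
      puts col
      puts "_____________"
    end
  end
end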