I am trying to extract every href link from an HTML page with Nokogiri and XPath

Question


I am trying to extract every href link from an HTML page using Nokogiri and XPath. However, I still seem to be pulling out only the link text, not the URLs. I'm not interested in the name of the link, but rather the URL it points to.

Here is what I have:

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
doc.xpath('//a').each do |node|
  puts node.text
end

Can someone guide me on how to fix this so that I get the actual href instead of the link text?

+3

ruby xpath nokogiri

paradoxic Aug 4 '10 at 10:14


1 answer

Chris cameron-mills · Accepted Answer · 2010-08-04T10:17:14+0000

Your XPath //a selects all a elements, and calling text on each one returns its text content. You can use @attrname to access attributes. For instance:

//a/@href

This gives you the href attribute of each a element in the document.
