How do I extract every href from an HTML page with Nokogiri and XPath?

I am trying to extract every href link from an HTML page and evaluate it with Nokogiri and XPath. However, I only seem to be pulling out the link titles. I'm not interested in the name of the link, but in the URL it points to.

Here is what I have:

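require 'nokogiri'
require 'open-uri'

# URI.open comes from open-uri (Ruby 3.0+ no longer patches Kernel#open)
doc = Nokogiri::HTML(URI.open("http://www.cnn.com"))
doc.xpath('//a').each do |node|
  puts node.text # prints the link text, not the URL
end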

Can someone tell me how to fix this so that I get the actual href instead of the link text?

1 answer

Your XPath //a selects all a elements, and node.text then returns their text content. You can use @attrname to select attributes instead, for instance:

//a/@href

This returns the href attribute of every a element in the document.


Source: https://habr.com/ru/post/1757979/

