Nokogiri works with XML, but not with HTML

I am having trouble getting Nokogiri to work correctly. I am using version 1.4.4 with Ruby 1.9.2.

I have libxml2 and libxslt installed and up to date. When I run this Ruby script against an XML file, it works fine:

    require 'nokogiri'

    doc = Nokogiri::XML(File.open("test.xml"))
    doc = doc.css("name").each do |node|
      puts node.text
    end

At the command line, running ruby test.rb returns:

    Name 1
    Name 2
    Name 3

And the crowd goes wild. So I tweak a few things and make a few changes to the code ...

    require 'nokogiri'
    require 'open-uri'

    doc = Nokogiri::HTML(open("http://domain.tld"))
    doc = doc.css("p").each do |node|
      puts node.text
    end

Back at the command line, ruby test.rb returns ... nothing! Just a new, empty line.

Is there a reason why it would work with an XML file, but not with HTML?

1 answer

To debug this problem, we need more information from you. Since you did not give a working URL, and since we know Nokogiri handles this kind of task well, the debugging falls on you.

Here is what I would do to check:

In IRB:

  1. Do you get a result from open('http://whateverURLyouarehiding.com').read ?
  2. If this returns a valid document, what do you get when you wrap the previous open statement in Nokogiri::HTML(...) ? Keep the .read from the previous step, so that Nokogiri gets the body of the page rather than an IO stream.
  3. Try #2 above, but remove the .read . This will show whether Nokogiri has a problem reading the IO stream, although I seriously doubt it, since I use it that way all the time. If it fails at this point, I would suspect a problem on your system.
  4. If you get a document in both #2 and #3, the problem is likely in your accessor; I suspect you are searching for something that is not in the document.
  5. If what you are searching for does exist, check the value of doc.errors after Nokogiri parses the document. Nokogiri may be finding errors in the document, and if so, they will be captured there.

Source: https://habr.com/ru/post/1347935/

