CSS selection with Nokogiri

I am trying to scrape HTML using Nokogiri but am not getting the expected result.

At this URL, I am looking at the deals for a specific location and want to display the details of each deal listed on the page. .small-deals-cont is the CSS selector for each deal container, and .deal-title is the CSS selector for the deal name.

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    url = "http://www.snapdeal.com/local-deals-Chennai-all?category=all&HID=dealHeader_all"
    doc = Nokogiri::HTML(open(url))
    puts doc.at_css("title").text
    doc.css(".small-deals-cont").each do |item|
      puts item.at_css(".deal-title")
    end
2 answers

Nokogiri does work for this, and there is no need to use Mechanize. Here is the code:

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    require 'csv'

    hotel = []
    cuisine = []

    # Scrape the first five result pages.
    1.upto(5) do |page_num|
      # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
      doc = Nokogiri::HTML(URI.open("http://www.abcd.com/cit/restaurants?page=#{page_num}"))
      puts doc.at_css("title").text
      doc.css("article").each do |item|
        hotel << item.at_css("a").text
        cuisine << item.at_css(".tags").text
      end
    end

    # Print each hotel together with its cuisine.
    hotel.each_index do |index|
      puts "Hotel: #{hotel[index]}"
      puts "Cuisine: #{cuisine[index]}"
      puts " "
    end

    # Write the same data to a CSV file with a header row.
    CSV.open("output2.csv", "wb") do |csv|
      csv << ["Hotel", "Cuisine"]
      hotel.each_index do |index|
        csv << [hotel[index], cuisine[index]]
      end
    end

To prevent scraping, they are probably loading the content after the initial page load (using JavaScript). Nokogiri will not help in this case; you will need something slightly more advanced, perhaps Mechanize.

In the end, however, you should not scrape. The owners of this site are taking steps to prevent it, and you should respect that. Check whether they offer an API.


Source: https://habr.com/ru/post/1432077/

