Scraping hidden HTML (when visible = false) using Hpricot (Ruby on Rails)

I have run into a problem that, unfortunately, I cannot get past. I am also a complete newcomer to Ruby on Rails, hence the number of questions.

I am trying to scrape a webpage such as the following:

http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo.aspx 

I would like to scrape the addresses, phone numbers, and URLs from the next page, which in this case is:

 http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo+Ismol.aspx 

I have tried everything I could think of, but nothing works, apparently because the elements are hidden or something similar.

The address is in the h3 tag, but it does not seem to be displayed. I also looked at scRUBYt at the following URL, http://www.rubyrailways.com/ajax-scraping-with-scrubyt-linkedin-google-analytics-yahoo-suggestions/ , but I really cannot make heads or tails of how to apply it in this case.

I would really appreciate any pointers, as this is an obstacle that I really need to overcome in order to move forward on my assignment. Thank you in advance for any help.

0
3 answers

In the specific example you gave, the elements are not hidden; they are loaded via Ajax after the page loads. So basically you need an HTTP client that can run JavaScript (a web browser?) to see the address and other content.
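To make the point above concrete, here is a minimal sketch of why a plain HTTP fetch comes back empty. Both HTML snippets are invented for illustration (they are not copied from the real site), and `h3_texts` is a hypothetical helper that uses a regular expression only to keep the demo dependency-free; a real scraper would use a parser such as Hpricot.

```ruby
# Illustration only: both snippets are made up, not taken from the real page.
# A plain HTTP client receives INITIAL_HTML; a browser ends up with
# RENDERED_HTML once its JavaScript has filled in the placeholder.
INITIAL_HTML  = '<div class="listing"><h3 id="address"></h3></div>'
RENDERED_HTML = '<div class="listing"><h3 id="address">12 Example Street, Valletta</h3></div>'

# Returns the non-empty text of every <h3> element in the given HTML string.
# A regex is enough for this tiny demo; use a real HTML parser on actual pages.
def h3_texts(html)
  html.scan(%r{<h3[^>]*>(.*?)</h3>}m).flatten.map(&:strip).reject(&:empty?)
end

p h3_texts(INITIAL_HTML)   # the static response has nothing to scrape
p h3_texts(RENDERED_HTML)  # the browser-rendered DOM contains the address
```

If the first call returns an empty list for the HTML your scraper downloads, the data is most likely being filled in by JavaScript, and no choice of selector will find it in the raw response.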

If you really want to automate the process and scrape data that is delivered via Ajax or JavaScript, you can try Selenium. Although it was not designed for this purpose, it should satisfy your needs.
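As a rough sketch of the Selenium route, the helper below polls a browser driver until the h3 elements contain text. `wait_for_h3_texts` is a hypothetical name, the `h3` selector is taken from the question, and the timeout and polling interval are arbitrary assumptions. It only relies on the driver responding to `find_elements(css: ...)`, as the selenium-webdriver gem's driver does.

```ruby
# Sketch: `driver` is anything that behaves like a Selenium::WebDriver driver,
# i.e. it responds to find_elements(css: ...) and returns elements with #text.
# Polls until at least one <h3> has non-empty text or the timeout expires.
def wait_for_h3_texts(driver, timeout: 10, interval: 0.5)
  deadline = Time.now + timeout
  loop do
    texts = driver.find_elements(css: 'h3').map { |el| el.text.strip }.reject(&:empty?)
    return texts unless texts.empty?      # content has arrived
    return [] if Time.now >= deadline     # give up and return an empty list
    sleep interval
  end
end

# With the selenium-webdriver gem and a local Firefox installed, usage would
# look roughly like this (untested against the real site):
#
#   require 'selenium-webdriver'
#   driver = Selenium::WebDriver.for :firefox
#   driver.get 'http://www.yellowpages.com.mt/Malta/Grocers-Mini-Markets-Retail-In-Malta-Gozo+Ismol.aspx'
#   puts wait_for_h3_texts(driver)
#   driver.quit
```

Polling is needed because the Ajax request finishes some time after `driver.get` returns, so reading the elements immediately may still give empty text.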

+1

I have no answer to your specific question, but I thought I would point you to Ryan Bates's Railscasts episode on screen scraping with Ruby: http://railscasts.com/episodes/173-screen-scraping-with-scrapi

He uses a library called scrAPI instead of scRUBYt, since he could not get scRUBYt to work. scrAPI also seems to be a bit more lightweight.

Hope this helps, good luck with your assignment! :)

-John

0

There is a good script in a Google group. It appears to extract the address and so on. You can see the code in page.txt.

-1

Source: https://habr.com/ru/post/893214/

