Visual analysis of web pages in Ruby

I am looking to write code that performs visual analysis of web pages, preferably in Ruby. The code has to determine the top, left, width, height, background color, text color and font size of every element in the DOM. These values can only be computed after all the CSS has been applied, so I do not think Nokogiri is suitable for the job. Ultimately, I want to feed this data into a VIPS-like (Vision-Based Page Segmentation) algorithm to find the main content of downloaded news articles.

I looked into using Watir to drive Chrome or Firefox and then pull the data out of the browser. The problem is that, as far as I can tell, browsers cannot run headless through Watir. Ultimately, this code will run on a farm of Linux servers in a data center, so it will not have easy access to an X server on which to display the browser.

I believe one solution is to use Watir and run a headless X server on the Linux servers. This is a bit of a pain, but right now it looks like the best option.
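
A minimal, untested sketch of the extraction step described above. It assumes the browser object is a Watir::Browser (anything that responds to execute_script will do) and that a display, real or virtual, is already available. The names LAYOUT_JS, extract_layout and visible_boxes are made up for illustration.

```ruby
require 'json'

# JavaScript run inside the page. It collects geometry and computed style
# for every element, i.e. the values after all CSS has been applied.
LAYOUT_JS = <<~JS
  return JSON.stringify(Array.from(document.querySelectorAll('*')).map(function (el) {
    var rect = el.getBoundingClientRect();
    var style = window.getComputedStyle(el);
    return {
      tag: el.tagName.toLowerCase(),
      top: rect.top + window.pageYOffset,    // document coordinates
      left: rect.left + window.pageXOffset,
      width: rect.width,
      height: rect.height,
      background_color: style.backgroundColor,
      color: style.color,
      font_size: style.fontSize
    };
  }));
JS

# `browser` is assumed to respond to execute_script and to return the
# script's result as a string (Watir::Browser does this).
def extract_layout(browser)
  JSON.parse(browser.execute_script(LAYOUT_JS), symbolize_names: true)
end

# Hypothetical helper: drop elements that occupy no space on the page.
def visible_boxes(elements)
  elements.select { |e| e[:width] > 0 && e[:height] > 0 }
end

# Usage (needs the watir gem and a display, real or Xvfb):
#   browser = Watir::Browser.new :firefox
#   browser.goto 'http://example.com/article'
#   boxes = visible_boxes(extract_layout(browser))
#   browser.close
```

Returning a JSON string from the page and parsing it in Ruby keeps the result independent of how the driver converts JavaScript objects.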

Does anyone have any better ideas?

+3
2 answers

Use Selenium with Xvfb.

+4

You can use Xvfb. Start it in the background:

nohup Xvfb :1 -screen 0 1024x768x24 > /dev/null 2>&1 &

Then start Firefox on display :1:

DISPLAY=:1 firefox
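
If you would rather manage the virtual display from Ruby, a rough equivalent might look like the sketch below. The method names xvfb_command and with_xvfb are made up for illustration, and the one-second sleep is only a crude assumption about how long Xvfb takes to start.

```ruby
# Build the Xvfb argument list for a display number and screen geometry.
def xvfb_command(display: 1, geometry: '1024x768x24')
  ['Xvfb', ":#{display}", '-screen', '0', geometry]
end

# Start Xvfb, point DISPLAY at it while the block runs, then clean up.
def with_xvfb(display: 1, geometry: '1024x768x24')
  previous = ENV['DISPLAY']
  pid = Process.spawn(*xvfb_command(display: display, geometry: geometry),
                      out: File::NULL, err: File::NULL)
  sleep 1 # crude wait for the server to come up (assumption, not a guarantee)
  ENV['DISPLAY'] = ":#{display}"
  yield
ensure
  ENV['DISPLAY'] = previous # restores the old value, or unsets it if there was none
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)
  end
end

# Usage (needs Xvfb installed, plus whatever browser driver you use):
#   with_xvfb(display: 1) do
#     # start the browser here; it inherits DISPLAY=:1
#   end
```

Any browser launched inside the block inherits the DISPLAY value, so it renders into the virtual framebuffer instead of a physical screen.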

The headless gem provides a Ruby wrapper around Xvfb: https://github.com/leonid-shevtsov/headless

+1

Source: https://habr.com/ru/post/1783917/

