Runaway memory when using Selenium with PhantomJS

I wrote a Python script that iterates over a long list of web pages and collects data using Selenium with PhantomJS as the webdriver (I run it on a remote Linux terminal, so I need a headless browser). For short jobs, where it only has to iterate over a few pages, there is no problem. For longer jobs, however, where it has to iterate over a longer list of pages, I see memory usage climb every time a new page loads. Eventually, after about 20-odd pages, the script dies because it runs out of memory.

This is how I initialize my browser -

from selenium import webdriver

url = 'http://someurl.com/'
browser = webdriver.PhantomJS()  # headless PhantomJS driver
browser.get(url)

The page has pagination buttons, and I iterate over the pages by locating the "Next >" button via its XPath and clicking it (the overall loop is sketched after this snippet) -

next_xpath = "//*[contains(text(), 'Next >')]"
next_link = browser.find_element_by_xpath(next_xpath)
next_link.click()  # advance to the next page
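For context, the overall loop looks roughly like the sketch below (the actual data collection is omitted, and number_of_pages is just an illustrative stand-in for the length of my real page list):

from selenium import webdriver

browser = webdriver.PhantomJS()
browser.get('http://someurl.com/')

next_xpath = "//*[contains(text(), 'Next >')]"
number_of_pages = 50  # illustrative only

for _ in range(number_of_pages):
    # ... collect data from the current page here ...
    next_link = browser.find_element_by_xpath(next_xpath)
    next_link.click()  # memory usage grows after every page load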

I tried to clear the cookies and cache for the PhantomJS browser in the following ways:

browser.get('javascript:localStorage.clear();')    # clear localStorage via a javascript: URL
browser.get('javascript:sessionStorage.clear();')  # clear sessionStorage the same way
browser.delete_all_cookies()

However, none of these made any difference to memory usage. With the Firefox driver, the script runs without any problems on my local machine, although it should be noted that my local machine has much more memory than the remote server.
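Roughly, the Firefox run on my local machine just swaps the driver initialization, e.g.:

browser = webdriver.Firefox()  # rest of the script unchanged; no memory problem locally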

Apologies if any important information is missing. Please feel free to let me know how I can make my question more complete.
