Freeing memory in a Python script

I have a Python script that scrapes some URLs. I have a list of URLs, and for each URL I fetch its HTML and do some logic with it.

I am using Python 2.7.6 and Linux Mint 17 Cinnamon 64-bit.

The problem is that my main scraping object, which I instantiate for each URL, is never released from memory, although there are no references to it. Because of this, my memory usage grows constantly and quickly (my object is sometimes very large, up to 50 MB).

The simplified code looks something like this:

    import resource

    def scrape_url(url):
        """
        Simple helper method for scraping url
        :param url: url for scraping
        :return: some result
        """
        scraper = Scraper(url)     # instantiate main Scraper object
        result = scraper.scrape()  # scrape it
        return result

    ## SCRIPT STARTS HERE

    urls = get_urls()  # fetch some list of urls

    for url in urls:
        print 'MEMORY USAGE BEFORE SCRAPE: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        result = scrape_url(url)  # call helper method for scraping
        print 'MEMORY USAGE AFTER SCRAPE: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print '-' * 50

My output looks something like this:

    MEMORY USAGE BEFORE SCRAPE: 75732 (kb)
    MEMORY USAGE AFTER SCRAPE: 137392 (kb)
    --------------------------------------------------
    MEMORY USAGE BEFORE SCRAPE: 137392 (kb)
    MEMORY USAGE AFTER SCRAPE: 206748 (kb)
    --------------------------------------------------
    MEMORY USAGE BEFORE SCRAPE: 206748 (kb)
    MEMORY USAGE AFTER SCRAPE: 284348 (kb)
    --------------------------------------------------

The Scraper object is large and is not freed from memory. I tried:

    scraper = None
    del scraper

and even calling the gc explicitly to collect the object with:

 gc.collect() 

but nothing helped.

When I print the number of references to the scraper object with:

 print sys.getrefcount(scraper) 

I get 2, which I think means that there are no other references to the object and it should be cleaned up by the gc.
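As far as I understand, getrefcount counts the temporary reference created by passing the object as its own argument, so 2 should be the baseline for an object with a single name. A minimal sketch of that behavior:

    import sys

    class Dummy(object):
        pass

    obj = Dummy()
    print sys.getrefcount(obj)  # 2: the name 'obj' plus the argument reference

    alias = obj                 # add a second name
    print sys.getrefcount(obj)  # 3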

A Scraper object has many sub-objects. Is it possible that references to some of its auxiliary objects are kept somewhere else, so that the gc cannot free the main Scraper object, or is there another reason why Python does not free the memory?
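One thing I can do to check that is list everything that still references the object with the gc module (a rough diagnostic sketch, using my Scraper class from above):

    import gc

    scraper = Scraper(url)
    result = scraper.scrape()

    gc.collect()  # clear any collectable reference cycles first
    # everything that still holds a reference to scraper;
    # note the list also includes the current stack frame, whose
    # locals contain the name 'scraper'
    for ref in gc.get_referrers(scraper):
        print type(ref)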

I found some topics about this on SO, with answers saying that memory cannot be released back to the OS unless you create and kill child processes, which sounds very strange (LINK).
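For reference, the workaround those answers describe would look roughly like this: run each scrape in a worker process that is recycled after every task, so its memory is returned to the OS when the process exits (a sketch assuming my Scraper class above and picklable results):

    from multiprocessing import Pool

    def scrape_url(url):
        scraper = Scraper(url)
        return scraper.scrape()

    # maxtasksperchild=1 makes the pool replace the worker process
    # after every task, so each Scraper's memory dies with its process
    pool = Pool(processes=1, maxtasksperchild=1)
    results = pool.map(scrape_url, get_urls())
    pool.close()
    pool.join()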

Thank you, Ivan

1 answer

You are iterating over a list that has to be held in memory all at once. Rewrite your loop to use a generator and scrape lazily. Something like:

    def gen():
        for i in xrange(0, len(urls)):
            yield urls[i]
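Then loop over the generator instead of the full list, e.g.:

    for url in gen():
        result = scrape_url(url)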
