I have a Python script that scrapes some URLs. I have a list of URLs, and for each URL I fetch the HTML and do some logic with it.
I am using Python 2.7.6 and Linux Mint 17 Cinnamon 64-bit.
The problem is that my main scraping object, which I instantiate for each URL, is never released from memory, even though there are no references left to it. Because of this, my memory usage grows constantly and quickly (the object is sometimes very large - up to 50 MB).
The simplified code looks something like this:
def scrape_url(url):
    """
    Simple helper method for scraping url
    :param url: url for scraping
    :return: some result
    """
    scraper = Scraper(url)
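For context, the loop that produces the output below is roughly this (a sketch - here I read the current RSS from /proc/self/status and assume urls is the list mentioned above; the real script may measure memory slightly differently):

def memory_usage():
    """Current resident set size of this process in kB (Linux only)."""
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

for url in urls:
    print 'MEMORY USAGE BEFORE SCRAPE:', memory_usage(), '(kb)'
    scrape_url(url)
    print 'MEMORY USAGE AFTER SCRAPE:', memory_usage(), '(kb)'
    print '-' * 50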
My output looks something like this:
MEMORY USAGE BEFORE SCRAPE: 75732 (kb)
MEMORY USAGE AFTER SCRAPE: 137392 (kb)
--------------------------------------------------
MEMORY USAGE BEFORE SCRAPE: 137392 (kb)
MEMORY USAGE AFTER SCRAPE: 206748 (kb)
--------------------------------------------------
MEMORY USAGE BEFORE SCRAPE: 206748 (kb)
MEMORY USAGE AFTER SCRAPE: 284348 (kb)
--------------------------------------------------
The Scraper object is large, and it is not freed from memory. I tried:
scraper = None
del scraper
and even calling gc explicitly to collect the object with:
gc.collect()
but nothing helped.
When I print the number of references to the scraper object with:
print sys.getrefcount(scraper)
I get 2, which I think means that there are no other references to the object and that it should be collected by gc.
The scraper object has many sub-objects. Is it possible that references to some of these sub-objects are kept somewhere else, so that gc cannot free the main Scraper object, or is there another reason why Python does not free the memory?
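To check that, I guess I could ask gc directly which objects still reference a Scraper instance after the call - a minimal sketch (assuming the Scraper class is importable at this point):

import gc

gc.collect()
# list every Scraper instance the collector still tracks
# and print whatever is keeping it alive
for obj in gc.get_objects():
    if isinstance(obj, Scraper):
        print 'Scraper still alive:', obj
        print 'referrers:', gc.get_referrers(obj)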
I found some topics about this on SO, and some answers say that memory cannot be released back to the OS unless you create/kill child processes, which sounds very strange (LINK).
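If child processes really are the only way, I understand the idea would be something like this (a sketch using multiprocessing.Pool with maxtasksperchild=1, so each worker handles a single URL and then exits, returning its memory to the OS; urls and scrape_url as above):

from multiprocessing import Pool

if __name__ == '__main__':
    # every worker process is recycled after one task,
    # so whatever the Scraper holds on to dies with the process
    pool = Pool(processes=4, maxtasksperchild=1)
    results = pool.map(scrape_url, urls)
    pool.close()
    pool.join()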
Thank you, Ivan