My problem is pretty simple: I have a 400 MB file containing 10,000,000 rows of data. I need to iterate over each line, do something with it, and then discard the line from memory so as not to use too much RAM.
Since my machine has multiple processors, my initial idea for optimizing this was to use two separate processes. The first would read the file several lines at a time and gradually fill a list (one list item per line in the file). The second would have access to the same list, pop() elements off it, and process them. The list would therefore grow on one side and shrink on the other.
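To make the idea concrete, here is a minimal sketch of what I have in mind, using a multiprocessing.Queue as the shared buffer instead of a plain list; do_something() is a placeholder for the actual per-line work:

from multiprocessing import Process, Queue

def do_something(line):
    # placeholder for the real per-line work
    pass

def reader(path, queue):
    # producer: read the file line by line and push each line onto the queue
    with open(path, 'r') as f:
        for line in f:
            queue.put(line)
    queue.put(None)  # sentinel: tell the consumer there is nothing left

def worker(queue):
    # consumer: pop lines off the queue and process them
    while True:
        line = queue.get()
        if line is None:
            break
        do_something(line)

if __name__ == '__main__':
    q = Queue(maxsize=10000)  # bounded queue acts as the buffer
    producer = Process(target=reader, args=('/data/workfile', q))
    consumer = Process(target=worker, args=(q,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()

The maxsize on the queue is there so the reader blocks when the buffer is full, which keeps memory bounded instead of letting the whole file pile up in RAM.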
In other words, this mechanism would implement a buffer that is constantly refilled with lines to feed the second process. But perhaps this is no faster than simply using:
for line in open('/data/workfile', 'r'):
    do_something(line)