Instead of `multiprocessing.Pool`, use `multiprocessing.Queue` and create an inbox and an outbox.

Start a single process that reads in the files and puts the data on the inbox queue, and set a limit on the size of the queue so that you don't fill your RAM. The example here compresses individual files, but it can be adapted to process whole folders at a time.
```python
import os

def reader(inbox, input_path, num_procs):
    "process that reads in files to be compressed and puts to inbox"
    for fn in os.listdir(input_path):
        path = os.path.join(input_path, fn)
        with open(path, 'rb') as src:
            inbox.put((fn, src.read()))   # blocks when the bounded queue is full
    for _ in range(num_procs):            # one sentinel per compressor process
        inbox.put(None)
```
But that is only half the job; the other half is compressing the file without writing it to disk. To do that, we give the compression function a BytesIO object in place of an open file and pass it to `tarfile`. Once compression is finished, we put the BytesIO object on the outbox queue.

Except that we cannot do that, because BytesIO objects cannot be pickled, and only picklable objects can go on a queue. However, the `getvalue` method returns the buffer's contents in a picklable form (a plain bytes object), so grab the contents with `getvalue`, close the BytesIO object, and put the contents on the outbox instead.
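To make the pickling constraint concrete, here is a tiny standalone sketch (the buffer contents are just a placeholder) showing that a BytesIO object refuses to pickle, while the bytes returned by `getvalue` go through fine:

```python
import pickle
from io import BytesIO

buf = BytesIO(b'pretend this is a compressed archive')  # placeholder contents
# pickle.dumps(buf)        # raises TypeError: cannot pickle '_io.BytesIO' object
payload = buf.getvalue()   # a plain bytes object, which pickles without complaint
buf.close()
pickle.dumps(payload)      # works, so payload can travel through a multiprocessing.Queue
```

With that in mind, the compression process looks like this: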
```python
from io import BytesIO
import tarfile

def compressHandler(inbox, outbox):
    "process that pulls from inbox, compresses and puts to outbox"
    supplier = iter(inbox.get, None)          # yields queue items until a None arrives
    for fn, data in supplier:
        buf = BytesIO()                       # compress in memory instead of to disk
        with tarfile.open(mode='w:gz', fileobj=buf) as tf:
            info = tarfile.TarInfo(name=fn)
            info.size = len(data)
            tf.addfile(info, BytesIO(data))
        payload = buf.getvalue()              # grab the picklable bytes...
        buf.close()                           # ...close the buffer...
        outbox.put((fn, payload))             # ...and queue them for the writer
    outbox.put(None)                          # tell the writer this compressor is done
```
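The two-argument form of `iter` used for `supplier` keeps calling `inbox.get` and ends the loop as soon as it returns the `None` sentinel, which is how the reader shuts each compressor down cleanly.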
The writer process then pulls the contents off the outbox queue and writes them to disk. It needs to know how many compression processes were started, so that it only stops once it has heard that every one of them has finished.
```python
import os

def writer(outbox, output_path, num_procs):
    "single process that writes compressed files to disk"
    num_fin = 0
    while True:
        item = outbox.get()
        if item is None:                      # a compressor sent its sentinel
            num_fin += 1
            if num_fin == num_procs:          # all compressors have finished
                break
        else:
            fn, data = item
            with open(os.path.join(output_path, fn + '.tar.gz'), 'wb') as dst:
                dst.write(data)               # write the compressed archive to disk
```
Finally, there is the setup code that puts everything together:
```python
import multiprocessing as mp

def setup():
    fld = 'file/path'                         # input (and here, output) folder
    num_procs = 4                             # number of compressor processes; pick to taste
    inbox = mp.Queue(10)                      # bounded queues keep RAM use in check
    outbox = mp.Queue(10)
    procs = [mp.Process(target=compressHandler, args=(inbox, outbox))
             for _ in range(num_procs)]
    procs.append(mp.Process(target=reader, args=(inbox, fld, num_procs)))
    procs.append(mp.Process(target=writer, args=(outbox, fld, num_procs)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```
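One detail worth adding: multiprocessing needs an import-safe entry point when the spawn start method is in use (the default on Windows, and on macOS since Python 3.8), so guard the call to setup:

```python
if __name__ == '__main__':
    setup()
```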