I have a Python method that needs to collect a lot of data from an API, format it as CSV, compress it, and return the result.
I've been Googling, and every solution I can find either requires writing to a temp file or storing the entire archive in memory.
Memory is definitely not an option, since I quickly hit OOM. Writing to a temporary file has many problems of its own (in this environment only the log disk is available at the moment, there's a much longer wait before the download can start, files need to be cleaned up afterwards, etc.). Not to mention that it's just ugly.
I am looking for a library that will allow me to do something like this:

    C = Compressor(outputstream)
    C.BeginFile('Data.csv')
    for D in Api.StreamResults():
        C.Write(D)
    C.CloseFile()
    C.Close()
In other words, something that writes to the output stream as the data is written to it.
I managed to do this in .NET and PHP, but I have no idea how to approach it in Python.
To put things in perspective: by "a lot" of data, I mean I need to be able to process up to ~10 GB of raw plaintext. This is part of an export/dump process for a large data system.