I am trying to create a serverless processor for my cron job. In this job, I receive an archived file in my S3 bucket from one of my clients. The compressed file is about 50 MB, but once unzipped it grows to roughly 1.5 GB, and there is a hard limit of about 500 MB on the local storage available to AWS Lambda, so I can't simply download the file from S3 and unzip it on my Lambda. I was able to unzip the file and stream its contents line by line from S3 using funzip in a Unix script:
for x in $files ; do echo -n "$x: " ; timeout 5 aws s3 cp "$monkeydir/$x" - | funzip ; done
My bucket name: MonkeyBusiness
Key: /Daily/Business/Banana/{current-date}
Object: banana.zip
Now I am trying to achieve the same result with boto3: stream the compressed object into an in-memory buffer, unzip the stream, split the contents into separate files of 10,000 lines each, and upload the resulting fragments back to S3 (a rough, untested sketch of what I'm aiming for is below). I'm pretty new to AWS and boto3, so I'd appreciate some guidance.
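Here is a minimal, untested sketch of the approach I have in mind: download the ~50 MB zip into memory, let zipfile decompress each member as a stream, and flush every 10,000 lines to a new S3 object. The output prefix split/, the part-file naming, and the {current-date} placeholder format are assumptions I made up for illustration.

import io
import zipfile
import boto3

s3 = boto3.client("s3")

BUCKET = "MonkeyBusiness"
# {current-date} is a placeholder; I still need to fill in the real date format.
SOURCE_KEY = "Daily/Business/Banana/{current-date}/banana.zip"
OUTPUT_PREFIX = "Daily/Business/Banana/{current-date}/split/"  # made-up output prefix
LINES_PER_FILE = 10_000


def split_zip_to_s3(bucket, source_key, output_prefix, lines_per_file=LINES_PER_FILE):
    # The compressed object is only ~50 MB, so holding it in memory is fine;
    # only the decompressed 1.5 GB must never be materialized at once.
    zip_bytes = s3.get_object(Bucket=bucket, Key=source_key)["Body"].read()

    part = 0
    buffered_lines = []

    def flush():
        # Upload the buffered lines as one fragment and reset the buffer.
        nonlocal part, buffered_lines
        if not buffered_lines:
            return
        key = f"{output_prefix}part-{part:05d}.txt"
        s3.put_object(Bucket=bucket, Key=key,
                      Body="".join(buffered_lines).encode("utf-8"))
        part += 1
        buffered_lines = []

    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # zf.open() decompresses the member lazily, so the full 1.5 GB
            # never sits in memory or on /tmp at the same time.
            with io.TextIOWrapper(zf.open(name), encoding="utf-8") as member:
                for line in member:
                    buffered_lines.append(line)
                    if len(buffered_lines) >= lines_per_file:
                        flush()
    flush()  # upload any remaining lines


if __name__ == "__main__":
    split_zip_to_s3(BUCKET, SOURCE_KEY, OUTPUT_PREFIX)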
Please let me know if you need more information about the job.
The solution below is not applicable here, because the zlib documentation states that the library handles the gzip file format, while my question is about the zip file format (a zipfile-based sketch follows the snippet):
import zlib

def stream_gzip_decompress(stream):
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv
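From what I understand, the standard-library zipfile module is the zip-format counterpart: it reads the archive's central directory from a file-like object and decompresses each member as a stream. A minimal, untested sketch of an equivalent generator, assuming the whole ~50 MB compressed archive fits in memory:

import io
import zipfile

def stream_zip_decompress(zip_bytes, chunk_size=64 * 1024):
    # Yield decompressed chunks from every member of an in-memory zip archive.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:  # decompresses lazily as it is read
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    yield chunk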