Getting a data stream from a zip file sitting in an S3 bucket using the boto3 lib and AWS Lambda

I am trying to create a serverless processor for my cron job. For this task I receive an archived file in my S3 bucket from one of my clients. The file is about 50MB, but once unzipped it grows to 1.5GB or more, and there is a hard limit of 500MB on the disk space available on AWS Lambda, so I can't download the file from the S3 bucket and unzip it on my Lambda. I was able to unzip the file and stream the contents line by line from S3 using funzip in a unix script:

for x in $files ; do echo -n "$x: " ; timeout 5 aws s3 cp $monkeydir/$x - | funzip ; done

My bucket name: MonkeyBusiness
Key: /Daily/Business/Banana/{current-date}
Object: banana.zip
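For reference, a minimal sketch of how the full object key can be assembled from the current date (the %Y-%m-%d format here is an assumption; adjust it to the real key layout):

from datetime import date
import boto3

# Assumed date layout; adjust to whatever the real keys use.
current_date = date.today().strftime("%Y-%m-%d")
key = f"/Daily/Business/Banana/{current_date}/banana.zip"

s3 = boto3.resource('s3', 'us-east-1')
obj = s3.Object(bucket_name='MonkeyBusiness', key=key)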

Now I am trying to achieve the same thing using boto3: how can I stream the compressed content into memory, unzip the stream, save the contents into separate files of 10,000 lines each, and upload the resulting files back to S3? I would appreciate some guidance, as I'm pretty new to AWS and boto3.

Please let me know if you need more information about the job.

The solution below is not applicable here, because the zlib documentation clearly states that the library supports the gzip format, while my question is about the zip format.

import zlib

def stream_gzip_decompress(stream):
    # Works for gzip/zlib streams only, not for the zip container format.
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)  # offset 32 to skip the header
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv
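For completeness, if the file were gzip-compressed instead of zipped, the generator above could be fed straight from the S3 streaming body, roughly like this (the bucket, key, and region are placeholders, and again this does not work for .zip archives):

import boto3

s3_client = boto3.client('s3', region_name='us-east-1')

# Hypothetical gzip object; stream_gzip_decompress() is the generator above.
body = s3_client.get_object(Bucket='MonkeyBusiness', Key='some/path/banana.gz')['Body']

for decompressed in stream_gzip_decompress(body.iter_chunks(chunk_size=1 << 16)):
    pass  # process the decompressed bytes here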
2 answers

This is not exactly streaming, but you can read the whole object into a BytesIO buffer, open it as a zip archive with zipfile, and iterate over the file inside it without ever touching the Lambda filesystem.

import io
import zipfile
import boto3
import sys

s3 = boto3.resource('s3', 'us-east-1')


def stream_zip_file():
    obj = s3.Object(
        bucket_name='MonkeyBusiness',
        key='/Daily/Business/Banana/{current-date}/banana.zip'
    )
    # Read the whole compressed object (~50MB) into an in-memory buffer;
    # the archive is never written to the Lambda filesystem.
    buffer = io.BytesIO(obj.get()["Body"].read())
    print(buffer)
    z = zipfile.ZipFile(buffer)
    # Open the first (and assumed only) member of the archive as a file-like object.
    foo2 = z.open(z.infolist()[0])
    print(sys.getsizeof(foo2))
    line_counter = 0
    for _ in foo2:
        line_counter += 1
    print(line_counter)
    z.close()


if __name__ == '__main__':
    stream_zip_file()
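Building on this, one way to get the 10,000-line output files the question asks for is to buffer lines while iterating over the archive member and upload each batch with put(). This is only a sketch under the assumption that the archive holds a single text file; the output key layout (banana_part_N.txt under out_prefix) is made up and should be adjusted:

import io
import zipfile
import boto3

s3 = boto3.resource('s3', 'us-east-1')

LINES_PER_FILE = 10000


def split_and_upload(bucket, key, out_prefix):
    obj = s3.Object(bucket_name=bucket, key=key)
    # Only the ~50MB compressed archive is held in memory.
    buffer = io.BytesIO(obj.get()["Body"].read())
    part = 0
    lines = []
    with zipfile.ZipFile(buffer) as z:
        # Assumes the archive holds a single text file.
        with z.open(z.infolist()[0]) as member:
            for line in member:
                lines.append(line)
                if len(lines) >= LINES_PER_FILE:
                    _upload_part(bucket, out_prefix, part, lines)
                    part += 1
                    lines = []
    if lines:  # upload the final, possibly shorter, batch
        _upload_part(bucket, out_prefix, part, lines)


def _upload_part(bucket, out_prefix, part, lines):
    body = b"".join(lines)
    s3.Object(bucket, f"{out_prefix}/banana_part_{part}.txt").put(Body=body)

Because the member is read line by line out of the in-memory zip, peak memory stays around the compressed size plus one 10,000-line batch, which should fit inside Lambda's limits.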

This does not fully answer the question, since it deals with gzip rather than zip, but the general idea may still help.

The boto3 S3 client has put_object() and upload_fileobj(), which you can use to upload decompressed blocks back to S3 instead of holding everything in memory.

Something along these lines:

import gzip
import io
import boto3

s3 = boto3.client("s3")
stream = io.BytesIO(s3_data)  # s3_data: the compressed bytes fetched from S3
blocksize = 1 << 16  # 64kb
with gzip.GzipFile(fileobj=stream) as decompressor:
    while True:
        block = decompressor.read(blocksize)
        if not block:
            break
        # upload_fileobj expects a file-like object, so wrap each block
        s3.upload_fileobj(io.BytesIO(block), "bucket", "key")

This is untested, so treat it as a rough idea. In particular, each block has to be wrapped in a BytesIO object before being passed to upload_fileobj, and each call needs its own key (or a multipart upload) so the blocks do not overwrite one another.

If you don't need the result ASAP, another option is to have the lambda simply push a message to SQS and let a worker, for example an EC2 SPOT instance (which is much cheaper), pick up the job and do the heavy processing without Lambda's limits.
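A minimal sketch of that hand-off, assuming the Lambda only enqueues the job (the queue name banana-jobs and the message fields are placeholders):

import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")


def lambda_handler(event, context):
    # Hypothetical queue; replace with your real queue name or URL.
    queue_url = sqs.get_queue_url(QueueName="banana-jobs")["QueueUrl"]
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({
            "bucket": "MonkeyBusiness",
            "key": "/Daily/Business/Banana/{current-date}/banana.zip",
        }),
    )
    return {"status": "queued"}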


Source: https://habr.com/ru/post/1685098/
