How to do a multipart download of large files from S3 in Python?

I am looking for Python code that lets me do multipart downloads of large files from S3. I found this GitHub page, but it's too complicated, with all the command-line arguments, the parser, and other things that make the code hard for me to understand. I'm not looking for anything out of the ordinary; I just want basic code into which I can statically insert 2-3 file names and have it perform a multipart download of them.

Can someone provide me with such a solution or a link to it? Or maybe help me clean up the code in the link I posted above?

1 answer

This is old, but here is what I did to get this to work:

conn.download_file(
    Bucket=bucket,
    Filename=key.split("/")[-1],  # save under the object's base name
    Key=key,
    Config=boto3.s3.transfer.TransferConfig(
        max_concurrency=parallel_threads  # fetch parts in parallel threads
    )
)

Here is how I used it in a complete working example:

import boto3
import math
import os
import time


def s3_get_meta_data(conn, bucket, key):
    # HEAD request: returns metadata such as ContentLength
    # without downloading the object itself
    meta_data = conn.head_object(
        Bucket=bucket,
        Key=key
    )
    return meta_data


def s3_download(conn, bucket, key, parallel_threads):
    start = time.time()
    md = s3_get_meta_data(conn, bucket, key)
    chunk = get_chunks(md["ContentLength"], parallel_threads)
    print("Making %s parallel s3 calls with a chunk size of %s each..." % (
        parallel_threads, convert_size(chunk))
    )
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    conn.download_file(
        Bucket=bucket,
        # save the file next to this script, under the object's base name
        Filename=os.path.join(cur_dir, key.split("/")[-1]),
        Key=key,
        Config=boto3.s3.transfer.TransferConfig(
            max_concurrency=parallel_threads
        )
    )
    end = time.time() - start
    print("Finished downloading %s in %s seconds" % (key, round(end, 2)))


def convert_size(size_bytes):
    # render a byte count as a human-readable string, e.g. 1536 -> "1.5 KB"
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])


def get_chunks(size_bytes, desired_sections):
    # approximate size of each parallel section;
    # only used for the progress message above
    return math.ceil(size_bytes / desired_sections)


session = boto3.Session(profile_name="my_profile")
conn = session.client("s3", region_name="us-west-2")

s3_download(
    conn,
    "my-bucket-name",
    "my/key/path.zip",
    5
)
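
The question asks for 2-3 statically inserted file names; with the helper above that is just a loop. A minimal sketch, assuming the same conn and placeholder bucket/key names:

for key in ["my/key/path1.zip", "my/key/path2.zip", "my/key/path3.zip"]:
    s3_download(conn, "my-bucket-name", key, 5)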

Additional options can be passed via the Config parameter; read about TransferConfig in the AWS documentation:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.TransferConfig
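
For reference, here is a sketch of the other TransferConfig options that govern multipart behavior. The parameter names are genuine boto3 options; the values shown are only illustrative (they mirror boto3's documented defaults), not tuned recommendations:

import boto3.s3.transfer

config = boto3.s3.transfer.TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8 MB
    multipart_chunksize=8 * 1024 * 1024,  # size of each downloaded part
    max_concurrency=10,                   # parallel worker threads
    use_threads=True                      # must be True for concurrency
)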


Source: https://habr.com/ru/post/1606861/

