My S3 keys utility is essentially an optimized version of @Hephaestus's answer:

    import boto3

    s3_paginator = boto3.client('s3').get_paginator('list_objects_v2')


    def keys(bucket_name, prefix='/', delimiter='/', start_after=''):
        prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
        start_after = (start_after or prefix) if prefix.endswith(delimiter) else start_after
        for page in s3_paginator.paginate(Bucket=bucket_name, Prefix=prefix, StartAfter=start_after):
            for content in page.get('Contents', ()):
                yield content['Key']

In my tests (boto3 1.9.84), this is significantly faster than the equivalent (but simpler) code:

    import boto3


    def keys(bucket_name, prefix='/', delimiter='/'):
        prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
        bucket = boto3.resource('s3').Bucket(bucket_name)
        return (_.key for _ in bucket.objects.filter(Prefix=prefix))

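A comparison like this can be timed with a small harness. The following is only a rough sketch, not the benchmark I actually ran: `time_listing` and `fake_keys` are hypothetical names, and `fake_keys` is a stand-in generator so the snippet runs without AWS credentials. To compare the real functions, pass each version of `keys` in its place.

```python
import time


def time_listing(list_keys, *args, **kwargs):
    """Consume a key generator fully; return (key_count, elapsed_seconds)."""
    started = time.perf_counter()
    count = sum(1 for _ in list_keys(*args, **kwargs))
    elapsed = time.perf_counter() - started
    return count, elapsed


# Hypothetical stand-in for keys(); it yields made-up key names locally.
def fake_keys(bucket_name, prefix=''):
    for i in range(1000):
        yield f'{prefix}{i}'


count, seconds = time_listing(fake_keys, 'example-bucket', prefix='logs/')
print(count, seconds)
```

Timings against a real bucket depend heavily on network latency and object count, so several runs per function are worth averaging.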
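Since S3 guarantees results in UTF-8 binary sort order, a start_after optimization has been added to the first function. When the prefix ends with the delimiter, start_after defaults to the prefix, so the listing begins after the prefix key itself.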
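To see why the ordering matters, here is a local simulation. It is a minimal sketch that assumes only the documented behaviour (keys come back sorted by their UTF-8 bytes, and StartAfter excludes everything up to and including the given key). `simulate_listing` and the sample key names are made up for illustration, and no S3 call is made.

```python
def simulate_listing(all_keys, prefix='', start_after=''):
    """Mimic a listing: sort by UTF-8 bytes, filter by prefix, skip keys <= start_after."""
    ordered = sorted(all_keys, key=lambda k: k.encode('utf-8'))
    marker = start_after.encode('utf-8')
    return [k for k in ordered
            if k.startswith(prefix) and k.encode('utf-8') > marker]


# Made-up keys, including a 'logs/' placeholder object for the "folder" itself.
stored = ['logs/2019/b.txt', 'logs/', 'logs-old/x.txt', 'logs/2019/a.txt']

without_marker = simulate_listing(stored, prefix='logs/')
with_marker = simulate_listing(stored, prefix='logs/', start_after='logs/')

print(without_marker)  # ['logs/', 'logs/2019/a.txt', 'logs/2019/b.txt']
print(with_marker)     # ['logs/2019/a.txt', 'logs/2019/b.txt']
```

Because the placeholder key sorts before every key beneath it, starting after the prefix drops it without a client-side filter.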
Sean Summers, Jan 03 '19 at 0:19