Bucket contents list with boto3

How can I see what's inside a bucket in S3 with boto3? (i.e., do an "ls")?

Doing the following:

```python
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')
```

returns:

```
s3.Bucket(name='some/path/')
```

How can I see its contents?

+128
python amazon-s3 boto3 boto
May 14 '15 at 11:22
10 answers

One way to view the content:

```python
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
```
+166
May 15 '15 at

This is similar to an "ls", but it does not take the prefix/folder convention into account; it lists all objects in the bucket. Filtering out the prefixes that are part of the key names is left to the reader.
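If an "ls"-style view at a single level is what you want, that filtering can be done client-side; a minimal sketch, where `ls_level` is a hypothetical helper (S3 can also do this server-side via the `Delimiter` request parameter, which returns `CommonPrefixes`):

```python
# Hypothetical helper: emulate a one-level "ls" over a flat list of S3 keys
# by truncating each key at the first delimiter past the prefix.
def ls_level(keys, prefix='', delimiter='/'):
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        # Keep the delimiter so "folders" are distinguishable from files.
        entries.add(head + sep)
    return sorted(entries)

keys = ['logs/2021/app.log', 'logs/2022/app.log', 'readme.txt']
print(ls_level(keys))           # ['logs/', 'readme.txt']
print(ls_level(keys, 'logs/'))  # ['2021/', '2022/']
```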

In Python 2:

```python
from boto.s3.connection import S3Connection

conn = S3Connection()  # assumes boto.cfg setup
bucket = conn.get_bucket('bucket_name')
for obj in bucket.get_all_keys():
    print(obj.key)
```

In Python 3:

```python
from boto3 import client

conn = client('s3')  # again assumes boto.cfg setup, assume AWS S3
for key in conn.list_objects(Bucket='bucket_name')['Contents']:
    print(key['Key'])
```
+78
May 15 '15 at 14:45

I assume that you have configured authentication separately.

```python
import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('bucket_name')
for file in my_bucket.objects.all():
    print(file.key)
```
+26
Apr 05 '17 at 4:04

If you want to pass the ACCESS and SECRET keys explicitly (which you should not do, because it is insecure):

```python
from boto3.session import Session

ACCESS_KEY = 'your_access_key'
SECRET_KEY = 'your_secret_key'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('your_bucket')
for s3_file in your_bucket.objects.all():
    print(s3_file.key)
```
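A safer variant of the same pattern keeps the keys out of the source entirely: boto3 reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment on its own when no explicit keys are passed. A minimal sketch of that lookup (`credentials_from_env` and the fake environment are illustrative, not part of boto3):

```python
import os

def credentials_from_env(env=None):
    """Return (access_key, secret_key) from an environment mapping, or None.

    boto3 performs a similar lookup itself when no explicit keys are passed,
    so hard-coding keys in source is never necessary.
    """
    env = os.environ if env is None else env
    try:
        return env['AWS_ACCESS_KEY_ID'], env['AWS_SECRET_ACCESS_KEY']
    except KeyError:
        return None

# Illustrative only: a fake environment standing in for os.environ.
fake_env = {'AWS_ACCESS_KEY_ID': 'AKIA...', 'AWS_SECRET_ACCESS_KEY': 'abc123'}
print(credentials_from_env(fake_env))  # ('AKIA...', 'abc123')
```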
+22
Apr 7 '17 at 13:16

To handle large key listings (i.e. when the listing exceeds 1000 items), I used the following code to accumulate key values (i.e. filenames) across multiple listings (thanks to Amelio above for the first lines). Code is for Python 3:

```python
from boto3 import client

# (Wrapped in a function here so the early `return` is valid.)
def get_file_list(bucket_name="my_bucket", prefix="my_key/sub_key/lots_o_files"):
    s3_conn = client('s3')  # again assumes boto.cfg setup, assume AWS S3
    s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/")

    if 'Contents' not in s3_result:
        # print(s3_result)
        return []

    file_list = []
    for key in s3_result['Contents']:
        file_list.append(key['Key'])
    print(f"List count = {len(file_list)}")

    while s3_result['IsTruncated']:
        continuation_key = s3_result['NextContinuationToken']
        s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/",
                                            ContinuationToken=continuation_key)
        for key in s3_result['Contents']:
            file_list.append(key['Key'])
        print(f"List count = {len(file_list)}")
    return file_list
```
+14
Nov 22 '18 at 1:17

My S3 keys utility is essentially an optimized version of @Hephaestus's answer:

```python
import boto3

s3_paginator = boto3.client('s3').get_paginator('list_objects_v2')

def keys(bucket_name, prefix='/', delimiter='/', start_after=''):
    prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
    start_after = (start_after or prefix) if prefix.endswith(delimiter) else start_after
    for page in s3_paginator.paginate(Bucket=bucket_name, Prefix=prefix, StartAfter=start_after):
        for content in page.get('Contents', ()):
            yield content['Key']
```

In my tests (boto3 1.9.84), this is significantly faster than the equivalent (but simpler) code:

```python
import boto3

def keys(bucket_name, prefix='/', delimiter='/'):
    prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
    bucket = boto3.resource('s3').Bucket(bucket_name)
    return (_.key for _ in bucket.objects.filter(Prefix=prefix))
```

Since S3 guarantees that results are returned in UTF-8 binary sort order, the start_after optimization has been added to the first function.
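That guarantee can be illustrated locally (a sketch with made-up keys, not an S3 call): in bytewise sort order, every key sharing a prefix forms one contiguous run, so a scan that starts just before the prefix and stops at the first non-matching key cannot miss anything, which is what makes `StartAfter` safe as a skip-ahead:

```python
# Keys sorted in UTF-8 binary order keep all keys sharing a prefix contiguous.
keys = sorted(['a/1', 'a/2', 'b/1', 'a!', 'ab'], key=lambda k: k.encode('utf-8'))
print(keys)  # ['a!', 'a/1', 'a/2', 'ab', 'b/1']

prefix = 'a/'
start = next(i for i, k in enumerate(keys) if k >= prefix)  # first key not before the prefix
matching = []
for k in keys[start:]:
    if not k.startswith(prefix):
        break  # past the contiguous run; nothing later can match
    matching.append(k)
print(matching)  # ['a/1', 'a/2']
```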

+7
Jan 03 '19 at 0:19

A more economical way, rather than iterating through a for loop, is to just print the original object containing all the files inside your S3 bucket:

```python
from boto3.session import Session

session = Session(aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket_name')
files_in_s3 = bucket.objects.all()
# you can print this iterable with print(list(files_in_s3))
```
+5
Jun 24 '17 at 7:14

ObjectSummary:

There are two identifiers that are attached to ObjectSummary:

  • bucket_name
  • key

boto3 S3: ObjectSummary

For more information on object keys, see the AWS S3 documentation:

Object keys:

When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of the objects in it appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer a logical hierarchy using key name prefixes and delimiters, as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys:

Development/Projects1.xls

Finance/statement1.pdf

Individuals/taxdocument.pdf

s3-dg.pdf
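The "folders" the console shows for those keys can be derived with plain string handling; a minimal sketch (not part of the AWS docs):

```python
# Derive the top-level "folders" a console view would display from flat keys.
object_keys = [
    'Development/Projects1.xls',
    'Finance/statement1.pdf',
    'Individuals/taxdocument.pdf',
    's3-dg.pdf',
]

# Anything containing a delimiter contributes its first segment as a folder.
folders = sorted({k.split('/', 1)[0] + '/' for k in object_keys if '/' in k})
files = sorted(k for k in object_keys if '/' not in k)
print(folders)  # ['Development/', 'Finance/', 'Individuals/']
print(files)    # ['s3-dg.pdf']
```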

Link:

AWS S3: Object Keys

Here is some example code that demonstrates how to get the bucket name and the object key.

Example:

```python
import boto3
from pprint import pprint

def main():

    def enumerate_s3():
        s3 = boto3.resource('s3')
        for bucket in s3.buckets.all():
            print("Name: {}".format(bucket.name))
            print("Creation Date: {}".format(bucket.creation_date))
            for object in bucket.objects.all():
                print("Object: {}".format(object))
                print("Object bucket_name: {}".format(object.bucket_name))
                print("Object key: {}".format(object.key))

    enumerate_s3()

if __name__ == '__main__':
    main()
```
+2
Oct. 16 '18 at 19:26

I just did it like this, including the authentication method:

```python
import boto3

# (Wrapped in a function here so the `return` statements are valid;
# `key` is the object key to check for.)
def key_exists(key):
    s3_client = boto3.client(
        's3',
        aws_access_key_id='access_key',
        aws_secret_access_key='access_key_secret',
        config=boto3.session.Config(signature_version='s3v4'),
        region_name='region'
    )
    response = s3_client.list_objects(Bucket='bucket_name', Prefix=key)
    if 'Contents' in response:
        # Object / key exists!
        return True
    else:
        # Object / key DOES NOT exist!
        return False
```
+1
Sep 30 '18 at 19:21

In case someone is looking for an updated answer: you should probably use the v2 list operation now. From the Boto3 docs:

Returns some or all (up to 1,000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket. Note: ListObjectsV2 is the revised List Objects API, and we recommend you use this revised API for new application development.

```python
client = boto3.client('s3')  # auth stuff the same as in the comments above
client.list_objects_v2(
    Bucket="bucket",
    Prefix="prefix",
)
```

For more information on listing buckets with many files, the expected response shape, and so on, check here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2

0
Jun 25 '19 at 15:58


