How to list objects by extension from s3 api?

Question

Is there any way to search for objects in S3 by extension, and not just a prefix?

Here is what I have now:

ListObjectsResponse r = s3Client.ListObjects(new Amazon.S3.Model.ListObjectsRequest()
{
    BucketName = BucketName,
    Marker = marker,
    Prefix = folder, 
    MaxKeys = 1000
});

So, I need to list all *.xls files in my bucket.

+3

amazon-s3

st78 Jan 17 '11 at 0:31

4 answers

You really don't need a separate database to do this for you.

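S3 lets you list the objects in a bucket by prefix. The problem is that the ".xls" extension sits at the end of the key, so a prefix search does not help. However, when you upload a file, you can name the object so that the prefix contains the extension (for example: XLS-myfile.xls). You can then call the S3 API list operation with a prefix of "XLS".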
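A minimal sketch of that naming scheme in Python, assuming boto3 on the listing side. The helper names `extension_prefixed_key` and `list_keys_with_prefix` are hypothetical, and the client is passed in as an argument so the listing logic does not depend on how it was created.

```python
import os


def extension_prefixed_key(filename):
    """Hypothetical helper: build a key such as 'XLS-myfile.xls' from 'myfile.xls'."""
    _, ext = os.path.splitext(filename)
    if not ext:
        # No extension to encode, so keep the name unchanged.
        return filename
    return "{}-{}".format(ext[1:].upper(), filename)


def list_keys_with_prefix(s3_client, bucket, prefix):
    """Hypothetical helper: collect every key under `prefix`, page by page."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from a page that holds no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

With a real client this would be called as `list_keys_with_prefix(boto3.client('s3'), 'my-bucket', 'XLS-')`, where the bucket name is a placeholder. The scheme only helps for objects uploaded under the new names; existing keys would have to be copied to renamed keys first.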

+3

alfredaday 19 . '13 20:15

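I was working in Python with boto3, and this is the solution I came up with.

List all the files, then filter them in code down to the ones with the "suffix" / "extension" that you want.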

import boto3

s3_client = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'my-prefix/foo/bar'
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []

for response in response_iterator:
    # 'Contents' is absent when a page has no matching objects
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.endswith('.json'):
            file_names.append(key)

print(file_names)
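The filtering step can also be pulled out into a small function that works on any iterable of keys. This is only a sketch: the name `keys_with_extension` is hypothetical, and matching case-insensitively is an assumption that goes beyond the exact `.json` comparison used above.

```python
def keys_with_extension(keys, extensions):
    """Yield the keys that end in any of the given extensions, ignoring case."""
    # str.endswith accepts a tuple, so several extensions need only one call.
    suffixes = tuple("." + ext.lower().lstrip(".") for ext in extensions)
    for key in keys:
        if key.lower().endswith(suffixes):
            yield key
```

Because the filter runs on the client, every key under the prefix is still listed and transferred; only the final result is narrowed.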

+3

nackjicholson 29 . '16 0:11

I iterate over the files after fetching their information. The end result is a list of dicts.

import os

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket_name')

# Get information about all files in the bucket
files = bucket.objects.all()

# Create an empty list for the final information
files_information = []

# Your list of known extensions; file names are compared against this list
extensions = ['png', 'jpg', 'txt', 'docx']

# Iterate through 'files', convert each entry to a dict, and add an extension key.
for file in files:
    # splitext handles extensions of any length (key[-3:] could never match 'docx')
    extension = os.path.splitext(file.key)[1].lstrip('.')
    if extension in extensions:
        files_information.append({'file_name': file.key, 'extension': extension})
    else:
        files_information.append({'file_name': file.key, 'extension': 'unknown'})


print(files_information)

+1

Tushar niras Apr 08 '17 at 5:29

Geoff appleford · Accepted Answer · 2011-01-18T19:26:14+0000

I do not think this is possible with S3.

The best solution is to "index" S3 using a database (SQL Server, MySQL, SimpleDB, etc.) and run your queries against that index.
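As a rough illustration of the indexing idea, here is a sketch that uses SQLite from Python's standard library as a stand-in for the databases named above. The table name `s3_objects`, its columns, and both function names are hypothetical; the keys would come from whatever process lists or uploads the objects.

```python
import os
import sqlite3


def build_index(conn, keys):
    """Store each object key together with its lower-cased extension."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS s3_objects ("
        "object_key TEXT PRIMARY KEY, extension TEXT)"
    )
    rows = [(key, os.path.splitext(key)[1].lstrip(".").lower()) for key in keys]
    conn.executemany(
        "INSERT OR REPLACE INTO s3_objects (object_key, extension) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def find_by_extension(conn, extension):
    """Return all indexed keys with the given extension, sorted by key."""
    cursor = conn.execute(
        "SELECT object_key FROM s3_objects WHERE extension = ? ORDER BY object_key",
        (extension.lower().lstrip("."),),
    )
    return [row[0] for row in cursor]
```

For example, after `build_index(sqlite3.connect(':memory:'), keys)`, a call to `find_by_extension(conn, 'xls')` answers the original question without touching S3. The index has to be updated on every upload and delete, otherwise it drifts out of sync with the bucket.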
