Python - list of files and folders in bucket

I'm playing with the boto library to access an Amazon S3 bucket. I am trying to list all the files and folders in a given folder in the bucket. I use this to get all files and folders:

 for key in bucket.list(): print key.name 

This gives me all the files and folders at the root, as well as everything inside the subfolders, for example:

 root/
 file1
 file2
 folder1/file3
 folder1/file4
 folder1/folder2/file5
 folder1/folder2/file6

How can I list only the contents of, say, folder1, where it would show something like:

 files:
 file3
 file4
 folders:
 folder2

I can navigate to the folder using

 for key in bucket.list(prefix='path/to/folder/') 

but in this case it also lists the files in folder2 as if they were files of folder1, because I'm resorting to string manipulation on the key paths. Every script I've tried breaks on longer paths, and on folders that contain several files and subfolders (with more files inside those). Is there a recursive way to solve this problem?

+6
3 answers

S3 has no notion of the "folders" you might be thinking of. It is a flat, one-level hierarchy where objects are stored by key.

If you need a one-level listing inside a folder, you will have to limit the listing in your own code, with something like if key.name.count('/') == 1
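As a sketch of that filtering idea in plain Python (the keys list here is hypothetical sample data standing in for the key names that bucket.list() would return):

```python
# A minimal sketch of one-level filtering; `keys` is hypothetical sample
# data standing in for the key names returned by bucket.list().
def list_one_level(keys, prefix=''):
    """Return only the keys that sit directly under `prefix`."""
    results = []
    for name in keys:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # a direct child has no further '/' in the remainder
        if '/' not in rest.rstrip('/'):
            results.append(name)
    return results

keys = ['root/file1', 'root/file2',
        'root/folder1/file3', 'root/folder1/folder2/file5']
print(list_one_level(keys, 'root/'))          # ['root/file1', 'root/file2']
print(list_one_level(keys, 'root/folder1/'))  # ['root/folder1/file3']
```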

+2

All the information in the other answers is correct, but since many people store objects with path-like keys in S3, the API provides some tools to help you deal with them.

For example, in your case, if you want to list only the "subdirectories" of root without listing all of the objects below them, you can do this:

 for key in bucket.list(prefix='root/', delimiter='/'): print(key.name) 

which should output:

 file1
 file2
 folder1/

Then you can:

 for key in bucket.list(prefix='root/folder1/', delimiter='/'): print(key.name) 

and get:

 file3
 file4
 folder2/

And so on. You can probably accomplish what you want with this approach.
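To make the prefix/delimiter semantics concrete, they can be emulated in plain Python (the key names below are hypothetical sample data; a real listing would come from bucket.list):

```python
def split_listing(keys, prefix='', delimiter='/'):
    """Emulate S3's prefix + delimiter listing: anything past the first
    delimiter is rolled up into a single common-prefix ('folder') entry."""
    files, folders = [], []
    for name in keys:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in folders:
                folders.append(folder)
        else:
            files.append(name)
    return files, folders

keys = ['root/file1', 'root/file2', 'root/folder1/file3',
        'root/folder1/file4', 'root/folder1/folder2/file5']
print(split_listing(keys, 'root/'))
# (['root/file1', 'root/file2'], ['root/folder1/'])
print(split_listing(keys, 'root/folder1/'))
# (['root/folder1/file3', 'root/folder1/file4'], ['root/folder1/folder2/'])
```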

+8

What was hard for me to understand about S3 is that it is just a key/value store, not a disk or any other file-based storage most people are familiar with. The fact that people treat keys as folders and values as files is what causes the initial confusion when working with it.

As a key/value store, keys are just identifiers, not actual paths into a directory structure. This means you do not need to create folders before referencing them; you can simply put an object in a bucket at a key like /path/to/my/object without first creating a "directory" /path/to/my.

Since S3 is a key/value store, the API for interacting with it is more object- and hash-oriented than file-oriented. This means that whether you use Amazon's native API or boto, functions like s3.bucket.Bucket.list will list all objects in the bucket, optionally under a prefix. If you specify the prefix /foo/bar, then everything with that prefix will be listed, including /foo/bar/file, /foo/bar/blargh/file, /foo/bar/1/2/3/file, etc.

So the short answer is that you will need to filter out the results you do not want from your call to s3.bucket.Bucket.list, because functions like s3.bucket.Bucket.list, s3.bucket.Bucket.get_all_keys, etc. are all designed to return all keys under the prefix you specify as a filter.
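Since the question asked for a recursive approach, here is one hedged sketch along these lines: fetch the full flat listing once, then recurse over it locally to render a tree one level at a time (the key names are hypothetical sample data, and the helper returns lines rather than printing so it is easy to check):

```python
def walk(keys, prefix='', indent=0):
    """Recursively render flat S3-style keys as an indented tree.
    Returns a list of lines instead of printing, for easy testing."""
    lines, seen = [], []
    for name in keys:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head = rest.split('/', 1)[0]
        # mark entries that have deeper content as folders with a trailing '/'
        entry = head + ('/' if '/' in rest else '')
        if entry and entry not in seen:
            seen.append(entry)
    for entry in seen:
        lines.append(' ' * indent + entry)
        if entry.endswith('/'):
            lines.extend(walk(keys, prefix + entry, indent + 2))
    return lines

keys = ['root/file1', 'root/folder1/file3', 'root/folder1/folder2/file5']
print('\n'.join(walk(keys)))
# root/
#   file1
#   folder1/
#     file3
#     folder2/
#       file5
```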

+3

Source: https://habr.com/ru/post/982450/
