Amazon S3 file partitioning best practices

I hope this is a simple question. Apologies if it has already been answered, but my search turned up nothing.

On S3, is it better to organize images into smaller subdirectories, or just keep them all in one directory? In a typical file system, you would spread the images across directories to improve performance, since a flat structure with thousands of images in the same directory usually doesn't perform well. Does this apply to Amazon S3 too?

I could put all user images in a users folder, all message images in a messages folder, etc. Or I could put user images in folders such as users/{userId}, so as not to have thousands of images in a single users folder.

+9
4 answers

Update 2018-10

You no longer need to consider performance when designing a partitioning scheme for your use case. See my InfoQ summary, "Amazon S3 Increases Request Rate Performance and Drops Randomized Prefix Requirement", for more information:

Amazon Web Services (AWS) recently announced a significant increase in S3 request rate performance and the ability to parallelize requests to scale to the desired throughput. Notably, this performance increase also "removes any previous guidance to randomize object prefixes" and enables the use of "logical or sequential naming patterns in S3 object naming without any performance implications."
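As a minimal sketch of what a "logical" naming pattern could look like for the layout described in the question: the helper names `user_image_key` and `message_image_key`, and the `users/` and `messages/` prefixes, are assumptions made up for illustration, not anything S3 prescribes. S3 keys are plain strings, so the "folders" are only a naming convention.

```python
def user_image_key(user_id: int, filename: str) -> str:
    """Build a key such as 'users/42/avatar.png' (hypothetical layout)."""
    return f"users/{user_id}/{filename}"


def message_image_key(message_id: int, filename: str) -> str:
    """Build a key such as 'messages/7/photo.jpg' (hypothetical layout)."""
    return f"messages/{message_id}/{filename}"


if __name__ == "__main__":
    # Sequential IDs in the key are fine under the 2018 guidance quoted above.
    print(user_image_key(42, "avatar.png"))
    print(message_image_key(7, "photo.jpg"))
```

Keeping the key construction in one place like this also makes it easier to change the scheme later.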

Update 2013-09

The linked information, although still fairly accurate, has been superseded by a newer document, Request Rate and Performance Considerations.


Original answer

This is an issue with Amazon S3 as well, although only once your storage requirements become significant. For a detailed answer, see Amazon S3 Performance Tips & Tricks, which covers strategies for partitioning the object key space.

+7

It is worth considering a scheme for splitting up your files ... if for no other reason than being able to filter them when you want to browse around manually.

But do not spend too much time on it if you are sure that all you need is ordinary access to your files ... You can always switch to a new scheme later.
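To illustrate the filtering point, here is a rough in-memory sketch of how a prefix plus a delimiter lets you "look around" a flat key space. The function `list_keys` and the sample keys are hypothetical. It only mimics the idea locally and does not call S3. With boto3, the real equivalent would be passing `Prefix` and `Delimiter` to `list_objects_v2`.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Mimic an S3-style prefix listing over an in-memory list of keys.

    Returns (matching_keys, common_prefixes): the keys directly under
    `prefix`, and the distinct "subfolder" prefixes one level below it.
    """
    matches, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key lives deeper down; report only its next-level "folder".
            common.add(prefix + head + delimiter)
        else:
            matches.append(key)
    return matches, sorted(common)


# Sample keys, invented for this example.
keys = [
    "users/1/avatar.png",
    "users/1/banner.png",
    "users/2/avatar.png",
    "messages/9/photo.jpg",
]

print(list_keys(keys, prefix="users/1/"))
print(list_keys(keys, prefix="users/"))
```

With everything in one flat prefix you could still filter by file name, but a per-user prefix gives you a natural place to start browsing.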

0

Apologies for answering so late; I hope this is still helpful.

S3 key names determine which partition an object (file) is stored in. To improve performance, you can add a hex hash prefix to the key name.

GET-intensive workloads: use CloudFront

Mixed workloads (GET, PUT & DELETE): use a hex hash prefix for S3 object key names so that objects are spread across partitions instead of piling up in the same one.
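A hex hash prefix could be sketched roughly as follows. The function name `hashed_key`, the 4-character prefix length, and the choice of SHA-256 are all assumptions for illustration. Any stable hash would serve the same purpose. Note that the 2018 announcement quoted elsewhere in this thread says this kind of randomization is no longer needed for performance.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex digest of the key, e.g. 'ab12/users/42/avatar.png'.

    The digest depends only on the key, so the same key always maps to the
    same prefixed name and can be recomputed when reading the object back.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


if __name__ == "__main__":
    for name in ("users/1/avatar.png", "users/2/avatar.png"):
        # Similar logical names get unrelated-looking prefixes.
        print(hashed_key(name))
```

The trade-off is that keys no longer sort or list by their logical name, which makes browsing by prefix harder.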

0

The previous answers are now outdated; see https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/: "This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications."

0

Source: https://habr.com/ru/post/910410/

