What is the most efficient way to store 500,000 images?

I am coding a basic gallery for a site with approximately 40,000 online users at any given time. Users will be able to create galleries and upload images.

My question is: should I make a separate folder for each gallery and put its images there, or create one folder for all the images and record each image's gallery in the database? Or should I create a directory for each user, and then another directory inside it for each gallery?

How do you do this?

P.S. I need it to be as lightweight as possible.

+4
3 answers

I would save them by ID and split them across folders (depending on the file system, some do not cope well with a large number of files in one folder). They are also easier to find if you need to look at something manually.

Give each file an identifier, then use the first 3 digits of the file name to divide the files into folders. (You can start the auto-increment counter at 100000 or zero-pad the ID, so there are always at least 3 levels.)

/photos/1/0/3/103456.jpg
/photos/9/4/1/941000.jpg
/photos/0/0/0/000001.jpg
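A minimal Python sketch of this ID-based layout, in case it helps. The function name `sharded_path` and its parameters (`root`, `levels`, `width`, `ext`) are made up for illustration and are not from the answer; it assumes numeric IDs that are zero-padded to six digits.

```python
from pathlib import PurePosixPath


def sharded_path(photo_id: int, root: str = "/photos",
                 levels: int = 3, width: int = 6, ext: str = "jpg") -> str:
    """Build a storage path whose folders are the leading digits of the ID.

    All names and defaults here are illustrative assumptions.
    """
    if photo_id < 0:
        raise ValueError("photo_id must be non-negative")
    # Zero-pad so that even small IDs have enough digits for every level.
    name = str(photo_id).zfill(max(width, levels))
    # One folder per leading digit: "103456" -> "1", "0", "3".
    folders = list(name[:levels])
    return str(PurePosixPath(root, *folders, f"{name}.{ext}"))


print(sharded_path(103456))  # /photos/1/0/3/103456.jpg
print(sharded_path(1))       # /photos/0/0/0/000001.jpg
```

With three single-digit levels there are at most 1,000 leaf folders, so 500,000 images would average about 500 files per folder.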

You can keep the photo's association with the user, gallery, etc. in the database.
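One possible way to keep that association in a database, sketched with Python's built-in `sqlite3` module. The table and column names (`galleries`, `photos`, `gallery_id`, `path`) and the helper functions are assumptions for illustration only; a production site would likely use its existing database server instead.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical gallery/photo schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE galleries (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            name    TEXT    NOT NULL
        );
        CREATE TABLE photos (
            id         INTEGER PRIMARY KEY,
            gallery_id INTEGER NOT NULL REFERENCES galleries(id),
            path       TEXT    NOT NULL
        );
        CREATE INDEX photos_by_gallery ON photos(gallery_id);
    """)
    return conn


def add_gallery(conn: sqlite3.Connection, user_id: int, name: str) -> int:
    cur = conn.execute(
        "INSERT INTO galleries (user_id, name) VALUES (?, ?)", (user_id, name))
    return cur.lastrowid


def add_photo(conn: sqlite3.Connection, gallery_id: int, path: str) -> int:
    cur = conn.execute(
        "INSERT INTO photos (gallery_id, path) VALUES (?, ?)", (gallery_id, path))
    return cur.lastrowid


def photos_in_gallery(conn: sqlite3.Connection, gallery_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT path FROM photos WHERE gallery_id = ? ORDER BY id", (gallery_id,))
    return [row[0] for row in rows]
```

The point of the design is that the folder layout only has to spread files evenly, while "which gallery does this image belong to" is answered by an indexed query rather than by the directory structure.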

Or, if you want to see how the big boys do it:

A needle in a haystack: efficient storage of billions of photos

+8

Web servers generally do not cope well with more than several thousand images in one folder (I recently had to deal with 70,000 images in a single folder causing very slow reads and sorts, so trust me on this). So definitely not one folder if you expect to have thousands of images. I would suggest that the best solution would be to use Amazon S3 together with their CDN, CloudFront, but if that is unrealistic, there are still a few things you can do on your own server.

Create a separate folder for each gallery, as you suggest, only if you know some bounds on how big a gallery can get, and you have an idea of how many galleries will be created. (This is what I would suggest for your specific problem right now.)

Put the image name through a hash function, then use the first 1-3 characters of the hash to name the folders for the images. The hash ensures that images are divided roughly equally between folders, and you can decide how many folders you need.
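A rough Python sketch of the hash-prefix idea. The function name `hashed_path`, the `root` and `prefix_len` parameters, and the choice of SHA-256 are assumptions for illustration; the answer does not name a particular hash function. It also assumes plain file names without slashes.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(filename: str, root: str = "/photos", prefix_len: int = 2) -> str:
    """Place a file in a folder named after the first hex characters of its hash.

    With hex digests, prefix_len characters give 16 ** prefix_len folders
    (16, 256, or 4096 for 1, 2, or 3 characters).
    """
    if not 1 <= prefix_len <= 3:
        raise ValueError("prefix_len must be between 1 and 3")
    # Any well-mixing hash works here; SHA-256 is an arbitrary choice.
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return str(PurePosixPath(root, digest[:prefix_len], filename))


print(hashed_path("holiday-beach.jpg"))
```

Since the folder is derived from the name alone, the same file name always maps to the same folder, so the path can be recomputed instead of stored. For 500,000 images, a two-character prefix gives roughly 2,000 files per folder.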

In any case, having the gallery and image ID in the actual path is likely to be useful going forward, both in the code and whenever a person has to track down errors on the server. I would probably name the folders after the gallery ID and just make sure that no gallery has more than a few thousand images.

+3

I store mine as follows:

 images/userid/photoid 

This way, I can quickly isolate a user's images if I need to check something later. This seems more organized than dumping them all in one central directory.

0

Source: https://habr.com/ru/post/1396956/

