How to store millions of images, each about 2 KB in size

We are creating an ASP.NET MVC site that will have to store over 1 million images, each about 2 KB-5 KB in size. From previous research, it looks like a file server is probably better than a database (feel free to comment otherwise).

Is there anything special that needs to be considered when storing this many files? Will Windows have any problem finding a photo quickly when there are that many files in one folder? Do I need to create a segmented directory structure, for example splitting them up by file name? It would be nice if the solution scaled to at least 10 million images for possible future expansion.

+4
5 answers

4 KB is the default cluster size for NTFS. You can tune this setting to match your typical image size. http://support.microsoft.com/kb/314878

I would build a tree of subdirectories, both to make it possible to move the data from one file system to another (see "How many files can I put in a directory?") and to avoid some problems: http://www.frank4dd.com/howto/various/maxfiles-per-dir.htm

You could also group related images into archives, so they can be loaded with a single file open. Those archives can be compressed if the bottleneck is I/O, or left uncompressed if it is CPU.
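
As an illustration of that idea (not part of the original answer), here is a minimal sketch using System.IO.Compression; the class and method names are hypothetical:

```csharp
// Sketch only: pack a batch of related images into one archive so they can be
// served with a single file open. No compression trades CPU for I/O, as noted above.
using System.IO;
using System.IO.Compression;

static class ImageArchiver
{
    public static void Pack(string[] imagePaths, string archivePath, bool compress)
    {
        using var zip = ZipFile.Open(archivePath, ZipArchiveMode.Create);
        foreach (var path in imagePaths)
        {
            zip.CreateEntryFromFile(
                path,
                Path.GetFileName(path),
                compress ? CompressionLevel.Optimal : CompressionLevel.NoCompression);
        }
    }
}
```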

A database is easier to maintain, but slower ... so it's up to you!

+5

See also this Server Fault question for a discussion of the directory structure.

+3

The problem is not that the file system cannot store that many files in one directory, but that opening such a directory in Windows Explorer takes a very long time. So if you will ever need to browse the folder manually, you should segment it, for example with a directory for every first 2-3 letters/digits of the file name, or even a deeper structure.

If you split them into 1,000 folders of 1,000 files each, that is more than enough, and the code to do it is quite simple.
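
A minimal sketch of that layout (my own illustration; the root path and helper names are hypothetical), mapping a numeric image id onto folders of roughly 1,000 files each:

```csharp
// Sketch only: derive a folder from the image id so no folder grows past ~1,000 files.
using System.IO;

static class ImageStore
{
    private const string Root = @"D:\images";   // hypothetical root folder

    public static string GetPath(int imageId)
    {
        // id 0..999 -> folder "000", id 1000..1999 -> folder "001", and so on.
        string folder = (imageId / 1000).ToString("D3");
        return Path.Combine(Root, folder, imageId + ".jpg");
    }

    public static void Save(int imageId, byte[] imageBytes)
    {
        string path = GetPath(imageId);
        Directory.CreateDirectory(Path.GetDirectoryName(path)!);  // no-op if it already exists
        File.WriteAllBytes(path, imageBytes);
    }
}
```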

+2

Assuming NTFS, there is a limit of about 4 billion files per volume (2^32 - 1). That limit is shared by all folders on the volume (including operating system files, etc.).

A large number of files in one folder should not be a problem; NTFS uses a B+ tree for fast retrieval. Microsoft recommends disabling short-file-name generation (the feature that lets you access mypictureofyou.html as mypic~1.htm).

I do not know whether there is a performance advantage to segmenting them into multiple directories; I suspect there is not, since NTFS was designed to perform well with large directories.

If you do decide to segment them into multiple directories, use a hash of the file name to derive the directory name (rather than, say, the first letter of the file name), so that each subdirectory ends up with roughly the same number of files.
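
For illustration only (not from the answer), a sketch of deriving the directory from a hash of the file name; the two-level layout and names are assumptions:

```csharp
// Sketch only: hash the file name to pick the directory, so files spread evenly
// instead of clustering under common first letters.
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class HashedPath
{
    public static string For(string rootDir, string fileName)
    {
        // MD5 is used here only for an even spread, not for security.
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileName));
        // Two hex levels: 256 x 256 = 65,536 buckets, ~150 files each at 10 million images.
        string level1 = hash[0].ToString("x2");
        string level2 = hash[1].ToString("x2");
        return Path.Combine(rootDir, level1, level2, fileName);
    }
}
```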

+1

I would not rule out using a content delivery network. They are designed to solve exactly this problem. I have had great success with Amazon S3. Since you are on a Microsoft-based stack, Azure might be a good fit.
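
As a rough sketch, assuming Azure Blob Storage and the current Azure.Storage.Blobs SDK (the answer only says "Azure", and this SDK did not exist when the question was asked); the container name is hypothetical:

```csharp
// Sketch only: upload one image to a blob container, assuming the Azure.Storage.Blobs package.
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

static class BlobUploader
{
    public static async Task UploadImageAsync(string connectionString, string localPath)
    {
        var container = new BlobContainerClient(connectionString, "images");
        await container.CreateIfNotExistsAsync();                 // idempotent
        BlobClient blob = container.GetBlobClient(Path.GetFileName(localPath));
        await using FileStream stream = File.OpenRead(localPath);
        await blob.UploadAsync(stream, overwrite: true);
    }
}
```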

Is there any requirement that prevents you from using a third-party solution?

+1

Source: https://habr.com/ru/post/1305858/
