The maximum number of files/directories in Linux?

I am developing a LAMP online store that will allow administrators to upload multiple images for each item.

My concern is that there will immediately be 20,000 items, which means approximately 60,000 images.

Questions:

  1. What is the maximum number of files and/or directories in Linux?

  2. How to deal with this situation (best practice)?

My idea was to create a directory for each item, based on its unique identifier, but then I would still have 20,000 directories in the main upload directory, and it would grow indefinitely, since old items will not be deleted.

Thanks for any help.

+47
linux directory folders directory-structure
Nov 23 '11 at 8:01
6 answers

ext[234] file systems have a fixed maximum number of inodes; every file or directory requires one inode. You can see the current count and the limits with df -i . For example, on a 15 GB ext3 file system created with the default settings:

    Filesystem      Inodes  IUsed   IFree IUse% Mounted on
    /dev/xvda      1933312 134815 1798497    7% /

There is no particular limit on directories beyond this; keep in mind, though, that every file or directory requires at least one file system block (typically 4 KB), even if it is a directory with only a single item in it.

As you can see, 80,000 inodes are unlikely to be a problem. And with the dir_index option (which can be enabled with tune2fs ), lookups in large directories are not a big deal. However, note that many administrative tools (such as ls or rm ) can have a hard time dealing with directories that contain too many files. It is therefore recommended to split your files up so that you do not have more than a few hundred to a thousand items in any given directory. An easy way to do this is to hash whatever identifier you use and use the first few hexadecimal digits as intermediate directories.

For example, let's say you have item ID 12345 and it hashes to 'DEADBEEF02842.......' . You could store its files under /storage/root/d/e/12345 . You have now cut the number of files in each directory to 1/256th of what it would otherwise be.
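
A minimal PHP sketch of this scheme (the function name, base path, and two-level depth are illustrative assumptions, not part of the answer):

    <?php
    // Derive two intermediate directories from a hash of the item id, then
    // store the item's files under them; 12345 is the item's unique identifier.
    function itemStorageDir(int $itemId, string $baseDir = '/storage/root'): string
    {
        $hash = md5((string) $itemId);   // 32 hex characters
        return "$baseDir/{$hash[0]}/{$hash[1]}/$itemId";
    }

    $dir = itemStorageDir(12345);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);         // create the intermediate directories as needed
    }

Each of the 256 possible first-two-digit combinations then holds roughly 1/256 of all items.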

+74
Nov 23 '11 at 8:14

If your server's file system has the dir_index feature turned on (see tune2fs(8) for details on checking and enabling it), then you can reasonably store upwards of 100,000 files in a directory before performance degrades. ( dir_index has been the default for new file systems in most distributions for several years now, so it would only be an old file system that does not have the feature on by default.)

That said, adding another directory level to reduce the number of files in a directory by a factor of 16 or 256 would drastically improve the chances of things like ls * working without exceeding the kernel's maximum argv size.

Typically, this is done as follows:

    /a/a1111
    /a/a1112
    ...
    /b/b1111
    ...
    /c/c6565
    ...

i.e., by prepending a letter or digit to the path, based on some function you can compute from the name. (The first two characters of the md5sum or sha1sum of the file name are one common approach, but if you have unique identifiers for the objects, then 'a' + id % 16 is a simple enough mechanism for deciding which directory to use.)
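
A hedged PHP sketch of the 'a' + id % 16 idea (PHP needs chr()/ord() for the character arithmetic that C does implicitly; the function name is made up):

    <?php
    // Map an object id onto one of 16 directories named 'a' through 'p'.
    function bucketDir(int $id): string
    {
        return chr(ord('a') + ($id % 16));
    }

    echo bucketDir(1111) . "\n";   // 1111 % 16 = 7 => 'h'
    echo bucketDir(6565) . "\n";   // 6565 % 16 = 5 => 'f'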

+8
Nov 23 '11

60,000 is nothing, and neither is 20,000. But you should group those 20,000 somehow in order to speed up access to them. Maybe in groups of 100 or 1,000, by taking the item's number and dividing it by 100, 500, 1,000, whatever.

For example, I have a project in which the files have numbers. I group them by thousands, so the layout looks like this:

    id/1/1332
    id/3/3256
    id/12/12334
    id/350/350934
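
A small PHP sketch of this grouping-by-thousands layout (the function name and the assumption that ids are plain integers are mine):

    <?php
    // Place file number N under directory floor(N / 1000), matching the listing above.
    function groupedPath(int $fileNumber, string $base = 'id'): string
    {
        $group = intdiv($fileNumber, 1000);
        return "$base/$group/$fileNumber";
    }

    echo groupedPath(1332) . "\n";     // id/1/1332
    echo groupedPath(350934) . "\n";   // id/350/350934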



In fact, there may be a hard limit: some systems have 32-bit inodes, so you are limited to 2^32 files per file system.

+6
Nov 23 '11

In addition to the general answers (basically "don't worry", "tune your file system", and "organize your directory with subdirectories containing a few thousand files each"):

If the individual images are small (for example, less than a few kilobytes), then instead of putting them in a folder you could also put them in a database (for example, in MySQL as a BLOB ) or perhaps inside a GDBM indexed file. Then each small item would not consume an inode (on many file systems, each inode effectively costs at least a few kilobytes). You could also do this up to a certain threshold (for example, put images larger than 4 KB in individual files and smaller ones in a database or GDBM file). Of course, be sure to back up your data (and define a backup strategy).
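
A rough PHP/PDO sketch of the size-threshold idea (the table name, column names, connection details, and the 4 KB cutoff are assumptions for illustration):

    <?php
    // Small images go into a MySQL BLOB column; larger ones are written to disk.
    $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'password');

    function storeImage(PDO $pdo, int $itemId, string $imageData): void
    {
        if (strlen($imageData) <= 4096) {
            // Small image: keep it in the database, so it consumes no inode.
            $stmt = $pdo->prepare('INSERT INTO item_images (item_id, data) VALUES (?, ?)');
            $stmt->bindValue(1, $itemId, PDO::PARAM_INT);
            $stmt->bindValue(2, $imageData, PDO::PARAM_LOB);
            $stmt->execute();
        } else {
            // Large image: write it to a sharded directory as in the other answers.
            $dir = '/storage/root/' . ($itemId % 256);
            if (!is_dir($dir)) {
                mkdir($dir, 0755, true);
            }
            file_put_contents("$dir/$itemId.jpg", $imageData);
        }
    }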

+4
Nov 23 '11 at 9:59

It is the year 2014; I come back in time to add this answer. Lots of big/small files? You can use Amazon S3 or other Ceph-based alternatives, such as DreamObjects, where there are no directory limits to worry about.

I hope this helps someone choose among all the alternatives.

+1
Mar 26 '14 at 1:51
    md5($id) ==> 0123456789ABCDEF
    $file_path = items/012/345/678/9AB/CDE/F.jpg
    1 node = 4096 subnodes (fast)
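
One possible PHP reading of this scheme (splitting the hash into three-character chunks; the depth of three directory levels and the use of the item id as the file name are my assumptions, not the answer's):

    <?php
    // Use the leading 3-character chunks of md5($id) as nested directories.
    function hashedImagePath(int $id, int $depth = 3): string
    {
        $hash = strtoupper(md5((string) $id));
        $dirs = array_slice(str_split($hash, 3), 0, $depth);
        return 'items/' . implode('/', $dirs) . "/$id.jpg";
    }

    echo hashedImagePath(12345) . "\n";   // items/827/CCB/0EE/12345.jpg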
-3
Apr 28 '14 at 3:28


