Storage of a large number of images

I'm thinking of developing my own PHP-based gallery to store a large number of photos, possibly tens of thousands.

In the database I will point to the image URL, but here's the problem: I know it is not advisable to have them all sitting in the same directory on the server, since that would slow access to a crawl. How would you store them? Some kind of tree based on the name of the JPEG/PNG file?

What rules for splitting images would you recommend to me?

(It will mainly be used on cheap shared hosting, so managing the server is not possible.)

+45
filesystems image tree
Jan 15 '09 at 10:57
12 answers

We had a similar problem in the past. And found a good solution:

  • Give each image a unique identifier (a GUID).
  • Create a database record for each image containing the name, location, GUID and the possible locations of sub-images (thumbnails, reduced sizes, etc.).
  • Use the first one or two characters of the GUID to determine the top-level folder.
  • If a folder ends up with too many files, split it again. Update the references and you are ready to go.
  • If the number of files and hits grows too large, you can spread the folders over different file servers.

We found that using GUIDs gives a more or less uniform distribution. And it worked like a charm.
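A minimal PHP sketch of the GUID approach, assuming PHP 7+ (for random_bytes()); makeGuid(), guidPath() and the base directory are hypothetical names, not from the answer. It builds a random version-4 UUID and uses its first two hex characters as the top-level folder, which spreads files over 256 folders roughly evenly.

```php
<?php
// Hypothetical helper: build a random version-4 UUID string from random_bytes().
function makeGuid(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    // 32 hex chars in chunks of 4, laid out as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: the first $chars characters of the GUID name the
// top-level folder, e.g. "/images/3f/3f2a....jpg" when $chars = 2.
function guidPath(string $baseDir, string $guid, string $ext, int $chars = 2): string
{
    $folder = strtolower(substr($guid, 0, $chars));
    return rtrim($baseDir, '/') . '/' . $folder . '/' . $guid . '.' . $ext;
}

echo guidPath('/var/www/images', makeGuid(), 'jpg'), PHP_EOL;
```

If a top-level folder grows too large, raising $chars to 3 splits each folder 16 ways, but existing files must then be moved and their records updated, which is the "update the references" step above.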

Links that can help generate a unique identifier:

+44
Jan 15 '09 at 11:18

A few years ago I worked on an electronic document management system, and we did a lot of what Gamecat and wic suggested.

That is, assign a unique identifier to each image and use it to derive the relative path to the image file. We used MOD in the same way wic suggested, but we allowed 1024 folders/files at each level, with 3 levels, so we could support 1G (1024 × 1024 × 1024, roughly a billion) files.
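One way to get three levels of at most 1024 entries each is sketched below in PHP; the exact formula the answer used is not given, so this is an assumption, and idToRelativePath() is a hypothetical name. Integer division picks the folders and the remainder picks the file name.

```php
<?php
// Sketch: map a numeric image ID to a three-level relative path with at most
// 1024 entries per level (1024^3, about 1.07 billion files in total).
// No extension is added; the MIME type is assumed to live in the database row.
function idToRelativePath(int $id): string
{
    $fanOut = 1024;
    if ($id < 0 || $id >= $fanOut ** 3) {
        throw new InvalidArgumentException("ID out of range: $id");
    }
    $level1 = intdiv($id, $fanOut * $fanOut); // top-level folder
    $level2 = intdiv($id, $fanOut) % $fanOut; // second-level folder
    $file   = $id % $fanOut;                  // file name inside that folder
    return sprintf('%04d/%04d/%04d', $level1, $level2, $file);
}

echo idToRelativePath(1234), PHP_EOL; // 0000/0001/0210
```

Consecutive IDs fill one folder at a time, so each folder is completely full before the next one is started.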

However, we removed the extension from the files. Database records contained a MIME type, so no extension was required.

I would not recommend storing the full URL in the database record, only the image ID. If you store a URL, you cannot move or restructure the repository without converting your database. A relative URL would be acceptable, since you can at least move the image repository, but you get more flexibility if you store only the ID and derive the URL from it.

In addition, I would not recommend allowing direct links to your image files from the Internet. Instead, point the URL at a server program (for example, a Java servlet) and pass the image ID in the query string (http://url.com/GetImage?imageID=1234).

The servlet can use this ID to look up the record in the database, determine the MIME type, obtain the actual location, check security restrictions, do logging, etc.
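Since the question is about PHP, here is a rough PHP equivalent of that lookup step. The record fields (path, mime, public) and resolveImage() are hypothetical stand-ins for a database row and query; the commented lines show how a GetImage.php script might use it.

```php
<?php
// Sketch of the lookup a GetImage-style script would do before streaming a file.
// $records stands in for the database: ID => ['path', 'mime', 'public'].
function resolveImage(int $imageId, array $records): ?array
{
    if (!isset($records[$imageId])) {
        return null;                 // unknown ID
    }
    $row = $records[$imageId];
    if (empty($row['public'])) {
        return null;                 // security check failed
    }
    return ['path' => $row['path'], 'mime' => $row['mime']];
}

// Hypothetical usage inside GetImage.php:
// $id  = filter_input(INPUT_GET, 'imageID', FILTER_VALIDATE_INT);
// $img = ($id === false || $id === null) ? null : resolveImage($id, $records);
// if ($img === null) { http_response_code(404); exit; }
// header('Content-Type: ' . $img['mime']);
// readfile($img['path']);
```

Because clients only ever see the ID, the files can be moved or re-sharded later by updating the path column alone.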

+10
Jan 15 '09 at 13:07

I usually just use the numeric database ID (auto_increment) and then use the modulo operator (%) to figure out where to put the file. Simple and scalable. For example, the path to the image with ID 12345 can be built like this:

 12345 % 100 = 45
 12345 % 1000 = 345

Which gives:

 /home/joe/images/345/45/12345.png 

Or something like that.

If you use Linux with the ext3 file system, you should be aware that there are limits on the number of directories and files you can have in a directory. The limit is 32000 for subdirectories, so you should always aim to keep the number of directories low.
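The modulo scheme above as a small PHP function; moduloImagePath() and its default arguments are illustrative assumptions.

```php
<?php
// Sketch of the modulo scheme: id % 1000 picks the first folder,
// id % 100 the second, and the ID itself is the file name.
function moduloImagePath(int $id, string $baseDir = '/home/joe/images', string $ext = 'png'): string
{
    return sprintf('%s/%d/%d/%d.%s', rtrim($baseDir, '/'), $id % 1000, $id % 100, $id, $ext);
}

echo moduloImagePath(12345), PHP_EOL; // /home/joe/images/345/45/12345.png
```

Note that id % 100 is simply the last two digits of id % 1000, so each first-level folder only ever contains one subfolder and the second level adds no extra spreading. Using intdiv($id, 1000) % 1000 for the second level instead would give real fan-out.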

+8
Jan 15 '09 at 12:38

I know it is not advisable to have them all sitting in the same directory on the server, since that would slow access to a crawl.

This is an assumption.

I have developed systems where we had millions of files stored in a single directory, and it worked perfectly. It is also the easiest system to program. Most server file systems support this without a problem (although you will need to check the one you are using).

http://www.databasesandlife.com/flat-directories/

+7
Aug 18 2018-10-18T00:

When saving files associated with auto_increment IDs, I use something like the following, which creates three directory levels of 1000 folders each, with 100 files in each third-level directory. It supports ~100 billion files.

If $id = 99532455444, the following returns /995/324/554/44

 function getFileDirectory($id) {
     // Three levels of 1000 folders, 100 files per third-level folder.
     // intdiv() (PHP 7+) keeps the arithmetic in integers instead of applying %
     // to fractional floats; IDs above 2147483647 need 64-bit PHP.
     $level1 = intdiv($id, 100000000) % 1000;
     $level2 = intdiv($id, 100000) % 1000;
     $level3 = intdiv($id, 100) % 1000;
     $file   = $id % 100;
     return '/' . sprintf("%03d", $level1) . '/' . sprintf("%03d", $level2) . '/' . sprintf("%03d", $level3) . '/' . $file;
 }

+5
Jul 28 '10 at 19:50

Check out the XFS file system. It supports an unlimited number of files, and Linux supports it. http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html

+2
Dec 10 '09 at 12:26

You could have a DateTime column in the table and then store the images in folders named after the year and month (or even the year, month and day) in which they were added to the table.

Example

  • 2009
    • 01
      • 01
      • 02
      • 03
      • …
      • 31

This way you never get more than three folders deep.
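A short PHP sketch of deriving the year/month/day folder from the row's DateTime value; datedFolder() and the base directory are hypothetical.

```php
<?php
// Sketch: derive a year/month/day folder from the image's "added" timestamp.
function datedFolder(DateTimeInterface $added, string $baseDir): string
{
    return rtrim($baseDir, '/') . '/' . $added->format('Y/m/d');
}

echo datedFolder(new DateTimeImmutable('2009-01-31 11:41:00'), '/images'), PHP_EOL; // /images/2009/01/31
```

Folder sizes then follow upload volume, so a single busy day still lands in one folder; that is the trade-off against the ID-based schemes.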

+1
Jan 15 '09 at 11:41

I am currently facing this problem, and what Isaac wrote interested me. My function is a little different.

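 function _getFilePath($id) {
     // Zero-pad to six digits, e.g. 43524 -> "043524"
     $id = sprintf("%06d", $id);
     $level = array();
     // Take the digits two at a time: positions 4-5, then 2-3, then 0-1
     for ($lvl = 3; $lvl >= 1; $lvl--) {
         $level[$lvl] = substr($id, ($lvl * 2) - 2, 2);
     }
     // $level was filled in the order 3, 2, 1; reverse it to get 1/2/3
     return implode('/', array_reverse($level)) . '.jpg';
 }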

My images only number in the thousands, so I only need IDs up to 999999; the function splits 999999 into 99/99/99.jpg, and 43524 into 04/35/24.jpg.

+1
Mar 05 2018-11-11T00: 00Z

Use the file system hierarchy: identifying your images with something like 001/002/003/004.jpg would be very helpful. How you split them is a different story; it may be random, content-based, creation-date-based, etc. It really depends on your application.

0
Jan 15 '09 at 11:02

You could check out the strategy the Apple iPod uses to store multimedia content: folders all at the same depth level, and file names of the same width. I believe the Apple engineers spent a lot of time testing their solution, so adopting it could give you some immediate benefit.

0
Jan 15 '09 at 11:05

If the images you are processing are digital photographs, you can use EXIF data to sort them, for example by the date they were taken.
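A minimal PHP sketch of turning the EXIF "date taken" value into a year/month folder. It assumes the value has already been read, for example with the exif extension's exif_read_data(); exifDateFolder() and the "undated" fallback are hypothetical.

```php
<?php
// Sketch: turn an EXIF DateTimeOriginal value ("YYYY:MM:DD HH:MM:SS") into a
// year/month folder. Reading it might look like:
// $exif = @exif_read_data($file);
// $raw  = is_array($exif) ? ($exif['DateTimeOriginal'] ?? null) : null;
function exifDateFolder(?string $raw, string $fallback = 'undated'): string
{
    if ($raw === null) {
        return $fallback;            // no EXIF date available
    }
    $taken = DateTimeImmutable::createFromFormat('Y:m:d H:i:s', $raw);
    // createFromFormat() returns false for values it cannot parse
    return $taken === false ? $fallback : $taken->format('Y/m');
}

echo exifDateFolder('2009:01:15 10:57:00'), PHP_EOL; // 2009/01
```

Scanned images and edited files often lack the tag, so some fallback folder (or the upload date) is needed.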

0
Jan 15 '09 at 11:30

You can store the images in the database as blobs (varbinary for MS SQL Server). That way you do not need to worry about the storage or directory structure. The only drawback is that you cannot easily browse the files, but that would be hard in a balanced directory tree anyway.

0
Jan 15 '09 at 11:35