Why randomize your cloud storage / CDN file names?

Question

When you look at a profile picture on a social networking site such as Twitter, the image file is stored under a seemingly random name, for example:

http://a1.twimg.com/profile_images/1082228637/a-smile_twitter_100.jpg

or even with a date somewhere in the path, like 20110912. The only immediate benefit I can think of is preventing a bot from walking through your storage and downloading every file sequentially. Are there any other benefits? What is the best way to randomize?

I use Amazon S3, so I will have one subdomain serving all of my static content. My plan was to store an integer id in my database and then simply concatenate the base URL with the id to form the location.

+6

cloud amazon-s3 cdn cloud-storage

Adam Oct 9 '11 at 16:43

3 answers

Changing the URL is a safe way to invalidate outdated assets: once a file is published under a new name, cached copies at the old URL are simply no longer referenced.

It is also necessary if you want to let users store private images. A path derived from the user's account name or id can be guessed, which makes the privacy settings useless once the assets are stored on a CDN.

+4

rbq Oct 10 '11 at 21:19

Mainly, it prevents name conflicts. For example, more than one person may upload "IMG_0001.JPG". You also avoid limits on the number of files in a single directory, and you can spread images across multiple servers. No site like Twitter or Facebook can store all of its photos on a single server, however large that server is.
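As a rough illustration of the points above, here is a minimal Python sketch that gives each upload a random name, nests it under short prefix directories, and picks a server from the name. The function names, the two-level prefix layout, and the `example.com` hosts are all assumptions made up for this example, not something Twitter or any CDN is known to use.

```python
import secrets
import zlib
from pathlib import PurePosixPath

# Hypothetical pool of image hosts, for illustration only.
SERVERS = ["img0.example.com", "img1.example.com", "img2.example.com"]


def storage_key(original_name: str) -> str:
    """Build a collision-resistant key that ignores the uploaded file name."""
    ext = PurePosixPath(original_name).suffix.lower()  # keep only ".jpg" etc.
    token = secrets.token_hex(16)                      # 128 random bits
    # Two prefix levels keep any single directory from growing too large.
    return f"{token[:2]}/{token[2:4]}/{token}{ext}"


def server_for(key: str) -> str:
    """Pick a host deterministically from the key (simple modulo sharding)."""
    return SERVERS[zlib.crc32(key.encode("utf-8")) % len(SERVERS)]


if __name__ == "__main__":
    for _ in range(2):
        key = storage_key("IMG_0001.JPG")
        print(f"http://{server_for(key)}/{key}")
```

Two users uploading "IMG_0001.JPG" get different keys, and the same key always maps to the same host. Plain modulo sharding reshuffles files when the server list changes, so a real deployment would likely want something more stable.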

+2

ceejayoz Oct 9 '11 at 16:50

Eric Hammond · Accepted Answer · 2011-10-09T17:36:31+0000

One of the reasons I cryptographically scramble identifiers in public URLs is that business growth is not always public information.

If the current identifier can be discovered simply by creating a new user account or uploading an image, then an outsider can calculate the growth rate (or an upper bound on it) by doing this on a regular basis and seeing how many identifiers were used in the intervening time.
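To make the arithmetic concrete, here is a small Python sketch of what such an outsider could compute from sequential ids. The function name and the sample numbers are invented for illustration; they are not real figures from any site.

```python
from datetime import date


def estimate_daily_growth(samples):
    """Estimate new ids per day from (date, newest_id_seen) observations.

    `samples` must be in chronological order; only the first and last
    observations are used.
    """
    (first_day, first_id), (last_day, last_id) = samples[0], samples[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need observations on at least two different days")
    return (last_id - first_id) / days


if __name__ == "__main__":
    # Made-up observations: the id handed out to a fresh upload on each date.
    observed = [
        (date(2011, 9, 9), 10_000),
        (date(2011, 10, 9), 13_000),
    ]
    print(estimate_daily_growth(observed))  # 100.0 ids per day
```

With scrambled identifiers the observed values carry no ordering, so this subtraction tells the observer nothing.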

Whether growth is stagnant or exploding exponentially, I want to be able to control the release of that information rather than let competitors or business analysts work it out for themselves.

Offline examples of this are invoice numbers and check numbers. If you are regularly billed or paid by a company, you can see how many invoices or checks it writes in that period of time.

Here's a CPAN (Perl) module I maintain that scrambles 32-bit identifiers using two-way encryption based on Skipjack:

http://metacpan.org/pod/Crypt::Skip32

This is a direct translation of the Skip32 algorithm, written in C by Greg Rose:

http://www.qualcomm.com.au/PublicationsDocs/skip32.c

This approach maps each 32-bit identifier to a corresponding (effectively random) 32-bit number, which can be reversed back to the original identifier. You do not need to store anything extra in your database.

I then convert the scrambled identifier to 8 hexadecimal digits for display in URLs.
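The scramble-then-hex idea described above can be sketched in Python as follows. Note that this is not Skip32: it is a stand-in that uses a small Feistel network with an HMAC-SHA256 round function, chosen only because it is short and reversible. The key, the round count, and all function names are assumptions for this example, and the construction has not been reviewed as a cipher, so treat it as a demonstration of the mapping rather than something to deploy.

```python
import hashlib
import hmac

KEY = b"example-secret-key"  # hypothetical key; a real one must stay private
ROUNDS = 4                   # assumed round count for this illustration


def _round(half: int, index: int) -> int:
    """Keyed round function: maps a 16-bit half to a 16-bit value."""
    message = bytes([index]) + half.to_bytes(2, "big")
    digest = hmac.new(KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def scramble(identifier: int) -> int:
    """Map a 32-bit id to another 32-bit number, reversibly."""
    if not 0 <= identifier < 2**32:
        raise ValueError("identifier must fit in 32 bits")
    left, right = identifier >> 16, identifier & 0xFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, i)
    return (left << 16) | right


def unscramble(scrambled: int) -> int:
    """Invert scramble() by running the rounds backwards."""
    if not 0 <= scrambled < 2**32:
        raise ValueError("value must fit in 32 bits")
    left, right = scrambled >> 16, scrambled & 0xFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, i), left
    return (left << 16) | right


def to_url_token(identifier: int) -> str:
    """Scramble an id and render it as 8 hexadecimal digits."""
    return format(scramble(identifier), "08x")


def from_url_token(token: str) -> int:
    """Recover the original id from an 8-digit hex token."""
    return unscramble(int(token, 16))


if __name__ == "__main__":
    for database_id in (1, 2, 3):
        token = to_url_token(database_id)
        print(database_id, token, from_url_token(token))
```

Because the mapping is a permutation of the 32-bit range, distinct ids always produce distinct tokens, and the database only ever needs to hold the plain integer id.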

Once your ids approach 4.29 billion (the 2^32 limit of a 32-bit number), you will need to plan on expanding the URL structure to support more, but I like to keep URLs short for as long as possible.
