I have an application that stores file-based data under an NTFS directory path that is keyed off the SHA-1 hash of the data. This has some really nice attributes (deduplication, immunity to changes in other metadata, etc.), but I'm curious how experienced people have built hash-based directory storage structures. My main concern is the number of files/folders that can realistically be stored at a given folder depth.
Does anyone know what limits I would run into? If I were to dump everything into folders at the root of the storage path, I feel I would severely limit the store's ability to grow. Although this won't be a problem any time soon, I would rather have a structure that avoids the issue than try to restructure a massive store later.
If I took the approach of splitting the hash to create a deeper tree, is there any guidance on how finely I would need to split it? Would something like this be enough?
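// sha1 holds the 40-character hex form of the SHA-1 digest.
StringBuilder foo = new StringBuilder(60);
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 0, 4);    // level 1: first 4 hex characters
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 4, 16);   // level 2: next 16 hex characters
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 20, 20);  // final component: remaining 20 hex characters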
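To make the split sizes easier to experiment with, the same idea can be written as a small helper that takes the level lengths as parameters. This is only a sketch: `HashPath` and `Build` are names I made up for illustration, and it assumes the hash is already available as a 40-character hex string.

```csharp
using System;
using System.IO;
using System.Text;

static class HashPath
{
    // Hypothetical helper: splits a 40-character hex SHA-1 string into
    // directory levels of the given lengths. Whatever is left over after
    // the last level becomes the final path component.
    public static string Build(string sha1Hex, params int[] levelLengths)
    {
        if (sha1Hex == null || sha1Hex.Length != 40)
            throw new ArgumentException("Expected a 40-character hex string.", nameof(sha1Hex));

        var path = new StringBuilder(sha1Hex.Length + levelLengths.Length + 1);
        int offset = 0;
        foreach (int length in levelLengths)
        {
            // Each level must be non-empty and must leave at least one
            // character for the final component.
            if (length <= 0 || offset + length >= sha1Hex.Length)
                throw new ArgumentException("Level lengths must be positive and leave a remainder.", nameof(levelLengths));

            path.Append(Path.DirectorySeparatorChar);
            path.Append(sha1Hex, offset, length);
            offset += length;
        }

        // Final component: the rest of the hash.
        path.Append(Path.DirectorySeparatorChar);
        path.Append(sha1Hex, offset, sha1Hex.Length - offset);
        return path.ToString();
    }
}
```

With this, `HashPath.Build(sha1, 4, 16)` gives the 4/16/20 layout from my snippet, while `HashPath.Build(sha1, 2, 2)` would give two much narrower levels followed by the remaining 36 characters.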
Knowing that SHA-1 has a fairly decent distribution, I have to assume that large clusters will eventually appear, even though on average the files will be evenly distributed. It is those clusters I'm worried about.
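As a rough sanity check on how big those clusters might get, here is a back-of-the-envelope estimate. It assumes the hashes are uniformly distributed, so the number of entries in any one directory is approximately Poisson with standard deviation equal to the square root of the mean. `BucketEstimate` and its methods are hypothetical names, not part of any library.

```csharp
using System;

static class BucketEstimate
{
    // Number of distinct directory names a prefix of hexChars hex digits can take.
    public static double BucketCount(int hexChars)
    {
        return Math.Pow(16, hexChars);
    }

    // Expected entries per directory, assuming uniformly distributed hashes.
    public static double MeanPerBucket(long fileCount, int hexChars)
    {
        return fileCount / BucketCount(hexChars);
    }

    // Poisson approximation: the standard deviation is the square root of the mean.
    public static double StdDevPerBucket(long fileCount, int hexChars)
    {
        return Math.Sqrt(MeanPerBucket(fileCount, hexChars));
    }
}
```

For example, with one million files and a 4-character first level there are 65,536 possible directories, so the expected load is about 15 entries each with a standard deviation of roughly 4. A 16-character second level has 16^16 possible names, so under the same assumption each of those directories would hold essentially a single file.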
Also, does it matter how the store is accessed: through Windows Explorer, or programmatically through C#/System.IO?