SHA-1-based directory structure and NTFS restrictions?

I have an application that stores file-based data under an NTFS directory path keyed by the SHA-1 hash of the data. This has some really nice attributes (deduplication, immunity to changes in other metadata, etc.), but I'm curious how people with experience have built hash-based directory storage structures. My main concern is the number of files/folders that can realistically be stored at a given folder depth.

Does anyone know what limits I might run into? If I were to dump them all into folders at the root of the storage path, I feel I would severely limit the storage's ability to grow. Although this won't be a problem soon, I would rather have a structure that avoids it than try to restructure a massive store later.

If I split the hash to create a deeper tree, is there any guidance on how finely I would need to split it? Would something like this be enough?

StringBuilder foo = new StringBuilder(60);
// ...root, etc.
// SHA-1 always has a length of 40, chunk it up to distribute into smaller groups
// "\0000\0000000000000000\00000000000000000000"
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 0, 4);
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 4, 16);
foo.Append(Path.DirectorySeparatorChar);
foo.Append(sha1, 20, 20);

Knowing that SHA-1 has a fairly decent distribution, I have to assume that large clusters would eventually form, but that on average the hashes would be evenly distributed. It is those clusters I'm worried about.
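As a rough sketch of the splitting idea above, the helper below hashes a byte array with SHA-1 and builds a nested path from configurable prefix widths. The names `HashPath`, `Sha1Hex`, and `Build` are hypothetical, chosen only for illustration; the widths are whatever split is being evaluated (for example 4 and 16, leaving the last 20 characters as the final segment).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class HashPath
{
    // Hex-encode the SHA-1 digest of the data (always 40 characters).
    public static string Sha1Hex(byte[] data)
    {
        using (SHA1 sha = SHA1.Create())
        {
            byte[] digest = sha.ComputeHash(data);
            StringBuilder hex = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }

    // Split the hash into directory levels of the given widths;
    // whatever remains becomes the final path segment.
    public static string Build(string root, string hash, params int[] widths)
    {
        StringBuilder path = new StringBuilder(root);
        int offset = 0;
        foreach (int width in widths)
        {
            path.Append(Path.DirectorySeparatorChar);
            path.Append(hash, offset, width);
            offset += width;
        }
        if (offset < hash.Length)
        {
            path.Append(Path.DirectorySeparatorChar);
            path.Append(hash, offset, hash.Length - offset);
        }
        return path.ToString();
    }
}
```

Calling `HashPath.Build(root, hash, 4, 16)` reproduces the 4/16/20 split from the snippet above. Changing the widths is then a one-line experiment rather than a rewrite.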

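Are there performance penalties for directory structures that are too wide? I know Windows Explorer struggles with them, but what about programmatic access via C#/System.IO?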


.

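To see how NTFS and Windows cope, I ran a test that simulates SHA-1 hashes for 1,000,000 "files".

Rather than splitting on the first 4 characters (65536 possible folders), the test splits into two levels of 3 characters each (4096 possible folders per level), with the remaining 34 characters as the final folder name.

Test code: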

const string Root = @"C:\_Sha1Buckets";
using (TextWriter writer = File.CreateText(@"C:\_Sha1Buckets.txt"))
{
    // simulate a very even distribution like SHA-1 would produce
    RandomNumberGenerator rand = RandomNumberGenerator.Create();
    byte[] sha1 = new byte[20];
    Stopwatch watch = Stopwatch.StartNew();

    for (int i=0; i<1000000; i++)
    {
        // populate bytes with a fake SHA-1
        rand.GetBytes(sha1);

        // format bytes into hex string (FormatBytes is a helper not shown here)
        string hash = FormatBytes(sha1);

        // C:\_Sha1Buckets
        StringBuilder builder = new StringBuilder(Root, 60);

        // \012\345\6789abcdef0123456789abcdef01234567\
        builder.Append(Path.DirectorySeparatorChar);
        builder.Append(hash, 0, 3);
        builder.Append(Path.DirectorySeparatorChar);
        builder.Append(hash, 3, 3);
        builder.Append(Path.DirectorySeparatorChar);
        builder.Append(hash, 6, 34);
        builder.Append(Path.DirectorySeparatorChar);

        Directory.CreateDirectory(builder.ToString());
        if (i % 5000 == 0)
        {
            // write out timings every five thousand files to see if changes
            writer.WriteLine("{0}: {1}", i, watch.Elapsed);
            Console.WriteLine("{0}: {1}", i, watch.Elapsed);
            watch.Reset();
            watch.Start();
        }
    }

    watch.Reset();
    Console.WriteLine("Press any key to delete the directory structure...");
    Console.ReadLine();
    watch.Start();
    Directory.Delete(Root, true);
    writer.WriteLine("Delete took {0}", watch.Elapsed);
    Console.WriteLine("Delete took {0}", watch.Elapsed);
}

(15-20 5000), . 30 !

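With 1,000,000 hashes, the distribution works out to:

  • 1st level: 4096 folders.
  • 2nd level: about 250 folders in each 1st-level folder.
  • 3rd level: about 1 folder in each 2nd-level folder.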
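The per-level figures above can be checked with a little arithmetic. The sketch below assumes perfectly uniform hashes and uses the standard occupancy estimate m * (1 - (1 - 1/m)^n) for the expected number of distinct buckets hit by n items across m buckets. The class and method names are hypothetical.

```csharp
using System;

static class BucketMath
{
    // Expected number of distinct buckets hit when `items` uniformly random
    // items fall into `buckets` equally likely buckets.
    public static double ExpectedOccupied(double buckets, double items)
    {
        return buckets * (1.0 - Math.Pow(1.0 - 1.0 / buckets, items));
    }

    public static void Main()
    {
        const double items = 1000000;
        double namesPerLevel = Math.Pow(16, 3); // 3 hex characters -> 4096 names

        // Level 1: how many of the 4096 possible folders get created.
        double level1 = ExpectedOccupied(namesPerLevel, items);

        // Level 2: each level-1 folder receives items / level1 hashes.
        double itemsPerLevel1 = items / level1;
        double level2PerLevel1 = ExpectedOccupied(namesPerLevel, itemsPerLevel1);

        // Level 3: leaf folders per level-2 folder.
        double level3PerLevel2 = itemsPerLevel1 / level2PerLevel1;

        Console.WriteLine("Level 1 folders: {0:F0}", level1);
        Console.WriteLine("Level 2 folders per level-1 folder: {0:F0}", level2PerLevel1);
        Console.WriteLine("Level 3 folders per level-2 folder: {0:F2}", level3PerLevel2);
    }
}
```

Under these assumptions, essentially all 4096 first-level folders are used, each holds a couple of hundred second-level folders, and each of those holds roughly one leaf. That is in line with the list above.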

Windows , , . , , , . 4096. , , . 1 - .

- ?


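Some observations:

  • Compare a split at 4 characters with one at 10: 4 hexadecimal characters allow up to 65536 entries in a directory, while 10 characters allow up to 16^10 entries (...)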
  • , : ? . , , , ...

For example, 20 levels with at most 256 entries each:

xx/xx/xx/xx/xx/...

Or 10 levels with at most 65536 entries each:

xxxx/xxxx/xxxx/xxxx/xxxx/...

- , , , . , 256 ( 65536) .
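The two fixed-width layouts above (xx/xx/... and xxxx/xxxx/...) can be sketched as a single chunking function. `HashLayout.Chunk` is a hypothetical name. The separator is a plain `/` to match the patterns shown, and `Path.DirectorySeparatorChar` could be substituted.

```csharp
using System;
using System.Collections.Generic;

static class HashLayout
{
    // Split a hex hash into segments of `width` characters joined by '/'.
    // Width 2 gives 20 levels of at most 256 entries for a 40-character hash;
    // width 4 gives 10 levels of at most 65536 entries.
    public static string Chunk(string hash, int width)
    {
        if (width <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(width));
        }

        List<string> parts = new List<string>();
        for (int i = 0; i < hash.Length; i += width)
        {
            // The last segment may be shorter if width does not divide the length.
            parts.Add(hash.Substring(i, Math.Min(width, hash.Length - i)));
        }
        return string.Join("/", parts);
    }
}
```

For a 40-character SHA-1 string, `Chunk(hash, 2)` yields 20 segments and `Chunk(hash, 4)` yields 10. So the width directly trades tree depth against the maximum number of entries per directory.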


. , - SHA-1.

SHA-1, MD5, - , .

In any case, NTFS uses B-tree directory structures, so you really can put everything in one folder. Windows Explorer won't like it, though.


Source: https://habr.com/ru/post/1725635/

