How to speed up MD5 checksum generation in vb.net?

I work with very large files stored on P2 (Panasonic) cards. Part of our process is to first generate a checksum of the file we are about to copy, then copy the file, and then run the checksum on the copy to confirm that it copied OK. The problem is that the files are large (70 GB+) and take a long time to checksum. This matters because we will eventually be dealing with thousands of these files.

I would like to find a faster way to generate a checksum than System.Security.Cryptography.MD5CryptoServiceProvider. I don't care if that means using a specialized hardware card, provided that it works and is practical to use. I would prefer a method that reports how far the process has progressed, so that I can display progress as I do now.

The application is written in VB.NET. I would prefer to use a component or library referenced from my application, but I am willing to call an external application if the improvement in checksum speed is large enough.

Needless to say, the checksum must be consistent and correct. :-)

Thanks in advance for your time and efforts,

Richard

2 answers

I see one possible way to speed up this process: calculate the MD5 of the source file while you are performing the copy, not before it. This reduces the number of times the entire file has to be read from 3 (hash of source, copy, hash of destination) to 2 (copy, hash of destination).

The disadvantage is that you have to write your own copy code (rather than just relying on System.IO.File.Copy), and there is a non-zero chance that this will turn out to be slower in the end than the three-step process.
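
A minimal sketch of such a copy loop, assuming the standard System.Security.Cryptography.MD5 class and its TransformBlock / TransformFinalBlock methods. The names CopyWithMd5 and reportProgress, and the 4 MiB buffer size, are illustrative choices and not something the question or this answer prescribes:

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashingCopy
    ' Copies sourcePath to destPath and returns the MD5 (lowercase hex) of the
    ' bytes read from the source, hashing each chunk as it passes through.
    ' reportProgress is a hypothetical callback: (bytes copied so far, total bytes).
    Public Function CopyWithMd5(sourcePath As String,
                                destPath As String,
                                Optional reportProgress As Action(Of Long, Long) = Nothing) As String
        Const BufferSize As Integer = 4 * 1024 * 1024 ' assumption: tune for your hardware
        Dim chunk(BufferSize - 1) As Byte

        Using hasher As MD5 = MD5.Create(),
              src As New FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                    FileShare.Read, BufferSize, FileOptions.SequentialScan),
              dst As New FileStream(destPath, FileMode.Create, FileAccess.Write,
                                    FileShare.None, BufferSize)
            Dim total As Long = src.Length
            Dim copied As Long = 0
            Dim count As Integer = src.Read(chunk, 0, chunk.Length)
            While count > 0
                ' Feed the chunk to the hash, then write the same bytes out.
                hasher.TransformBlock(chunk, 0, count, Nothing, 0)
                dst.Write(chunk, 0, count)
                copied += count
                If reportProgress IsNot Nothing Then reportProgress(copied, total)
                count = src.Read(chunk, 0, chunk.Length)
            End While
            ' Finish the hash with an empty final block.
            hasher.TransformFinalBlock(chunk, 0, 0)
            Return BitConverter.ToString(hasher.Hash).Replace("-", "").ToLowerInvariant()
        End Using
    End Function
End Module
```

The returned value is the hash of the source as it was read during the copy. The destination still has to be read back and hashed separately to confirm the copy, and the same loop shape (minus the write) gives a natural place to report progress for that pass as well.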

Other than that, I don't think much can be done here, since the whole process is I/O-bound by design. You spend most of your time reading and writing the file, and even at 100 MB/s (a respectable I/O speed for a typical SATA drive) you will at best manage about 5.8 GB/min (100 MB/s × 60 s ≈ 6,000 MB), so a single pass over a 70 GB file takes about 12 minutes.

With a modern processor, the overhead of computing MD5 (or any other hash) is small compared with the I/O time, so speeding it up won't improve your overall throughput. Crypto accelerators in particular won't help here: unless the driver implementation is very efficient, they will add more overhead, from the context switches needed to feed data to the external card, than they save.
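
One way to check this on your own machine is to time MD5 on data that is already in memory and compare the result with the transfer rate you actually get from the card. The sketch below is only a rough measurement; the function name MeasureMd5MegabytesPerSecond and the 512 MB sample size are assumptions made for illustration:

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Security.Cryptography

Module HashThroughput
    ' Hashes totalMegabytes of in-memory data and returns the rate in MB/s.
    ' No disk access is involved, so this isolates the CPU cost of MD5.
    Public Function MeasureMd5MegabytesPerSecond(totalMegabytes As Integer) As Double
        Dim chunk(1024 * 1024 - 1) As Byte ' 1 MiB of zeros, reused for every block
        Using hasher As MD5 = MD5.Create()
            Dim watch As Stopwatch = Stopwatch.StartNew()
            For i As Integer = 1 To totalMegabytes
                hasher.TransformBlock(chunk, 0, chunk.Length, Nothing, 0)
            Next
            hasher.TransformFinalBlock(chunk, 0, 0)
            watch.Stop()
            Return totalMegabytes / watch.Elapsed.TotalSeconds
        End Using
    End Function

    Sub Main()
        Console.WriteLine("MD5 in memory: {0:F0} MB/s", MeasureMd5MegabytesPerSecond(512))
    End Sub
End Module
```

If the figure printed is well above the rate at which the files can be read, the hash is not the bottleneck and a faster hash implementation will not shorten the job.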

What you want to improve is I/O speed. The .NET Framework is already quite efficient here (good-sized buffers, overlapped I/O, etc.), but it is possible that an optimized native Windows application will do better. My tip: search for a few native MD5 calculators and see how they compare to your current .NET implementation. If the difference in hashing speed is more than 10%, it may be worth switching to an external application.


The correct answer is to avoid using MD5. MD5 is a cryptographic hash function, designed to provide certain cryptographic properties. It is far more complex and slower than is needed simply to detect accidental corruption. There are many faster checksums whose design can be understood from the literature on error detection and correction. Common examples are the CRC checksums, of which CRC32 is the most widespread, but you can also compute 64-bit, 128-bit, or even wider CRCs relatively easily, and much faster than an MD5 hash.
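
As a rough illustration, here is a table-driven CRC-32 in VB.NET, using the common reflected polynomial 0xEDB88320 (the variant used by zip and PNG). The module and function names are hypothetical, and this is a plain sketch rather than a tuned implementation:

```vbnet
Imports System
Imports System.IO

Module Crc32Sketch
    Private ReadOnly Table As UInteger() = BuildTable()

    ' Precomputes the 256-entry lookup table for polynomial 0xEDB88320.
    Private Function BuildTable() As UInteger()
        Dim result(255) As UInteger
        For i As Integer = 0 To 255
            Dim value As UInteger = CUInt(i)
            For bit As Integer = 1 To 8
                If (value And 1UI) <> 0UI Then
                    value = (value >> 1) Xor &HEDB88320UI
                Else
                    value = value >> 1
                End If
            Next
            result(i) = value
        Next
        Return result
    End Function

    ' Folds the first count bytes of data into a running CRC register.
    ' Start with &HFFFFFFFFUI and Xor the final result with &HFFFFFFFFUI.
    Public Function Update(crc As UInteger, data As Byte(), count As Integer) As UInteger
        For i As Integer = 0 To count - 1
            crc = Table(CInt((crc Xor data(i)) And &HFFUI)) Xor (crc >> 8)
        Next
        Return crc
    End Function

    ' Streams a file through the CRC in 1 MiB chunks and returns the CRC-32.
    Public Function OfFile(filePath As String) As UInteger
        Dim chunk(1024 * 1024 - 1) As Byte
        Dim crc As UInteger = &HFFFFFFFFUI
        Using src As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Dim count As Integer = src.Read(chunk, 0, chunk.Length)
            While count > 0
                crc = Update(crc, chunk, count)
                count = src.Read(chunk, 0, chunk.Length)
            End While
        End Using
        Return crc Xor &HFFFFFFFFUI
    End Function
End Module
```

A quick sanity check for any CRC-32 implementation of this variant is the ASCII string 123456789, whose standard check value is CBF43926. Note that a CRC only guards against accidental corruption, which is the use case described in the question; it offers no protection against deliberate tampering.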


Source: https://habr.com/ru/post/1304335/

