Calculating the size of a directory - how can I do it faster?

Using C#, I calculate the total size of a directory. The logic is: get the files inside the folder, sum their sizes, check for subdirectories, and then recurse into them.
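Roughly, the code looks like this (a simplified sketch; the method name is just for illustration):

    using System.IO;

    // Naive recursive approach: sum file sizes, then recurse into subdirectories.
    static long GetDirectorySize(string path)
    {
        long size = 0;
        foreach (string file in Directory.GetFiles(path))
            size += new FileInfo(file).Length;
        foreach (string subDir in Directory.GetDirectories(path))
            size += GetDirectorySize(subDir);
        return size;
    }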

I also tried another way: using FSO ( obj.GetFolder(path).Size ). There is not much time difference between these two approaches.

Now the problem is that the folder contains tens of thousands of files, and finding its size takes at least 2 minutes. If I run the program again, it completes very quickly (5 seconds), so I think Windows is caching the file information.

Is there a way to reduce the time the first run of the program takes?

+19
c# windows winapi winforms filesystemobject
Jun 05 '10 at 6:34
7 answers

I fiddled with it a bit, trying to parallelize it, and surprisingly it sped up here on my machine (up to three times on a quad core). I don't know if that holds in all cases, but give it a try...

.NET 4.0 code (or use 3.5 with the Task Parallel Library):

    // Sums file sizes in sourceDir; subdirectories are processed in parallel.
    private static long DirSize(string sourceDir, bool recurse)
    {
        long size = 0;
        string[] fileEntries = Directory.GetFiles(sourceDir);

        foreach (string fileName in fileEntries)
        {
            Interlocked.Add(ref size, (new FileInfo(fileName)).Length);
        }

        if (recurse)
        {
            string[] subdirEntries = Directory.GetDirectories(sourceDir);

            Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
            {
                // Skip reparse points (junctions/symlinks) to avoid infinite recursion.
                if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
                {
                    subtotal += DirSize(subdirEntries[i], true);
                }
                return subtotal;
            },
            (x) => Interlocked.Add(ref size, x));
        }

        return size;
    }
+34
Jun 05 '10 at 17:02

Hard drives are interesting beasts - sequential access (e.g. reading a large contiguous file) is super zippy, figure 80 MB/s. Random access, however, is very slow. That is what you are running into - recursing into the folders doesn't read much data (in terms of quantity), but it requires lots of random reads. The reason you see zippy performance the second time around is that the MFT is still in RAM (you are right about the caching).

The best mechanism I've seen for this is to scan the MFT yourself. The idea is that you read and parse the MFT in one linear pass, building the information you need as you go. The end result will be something much closer to 15 seconds on a very full HD.

Some good reading: NTFSInfo.exe - http://technet.microsoft.com/en-us/sysinternals/bb897424.aspx and Windows Internals - http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-PRO-Developer/dp/0735625301

FWIW: this method is quite complicated, because there really isn't a great way to do this in Windows (or any OS I'm aware of) - the problem is that figuring out which folders/files are needed requires a lot of head movement on the disk. It would be very hard for Microsoft to build a general solution to the problem you describe.

+10
Jun 21 '10 at 2:05 a.m.

The short answer is no. The way Windows could make directory size calculation faster would be to update the directory size, and all parent directory sizes, on every file write. However, that would make file writes slower. Since files are written far more often than directory sizes are read, this is a reasonable trade-off.

I'm not sure exactly what problem is being solved, but if it is file system monitoring, it is worth checking out: http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx

+7
Jun 05 '10

I don't think it will change much, but it may go a little faster if you use the FindFirstFile and FindNextFile API functions for this.

I don't think there is any really quick way of doing it. For comparison, you could try doing dir /a /x /s > dirlist.txt and listing the directory in Windows Explorer to see how fast they are, but I think they will be similar to FindFirstFile .

PInvoke.net has an example of using the API.
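For reference, a minimal sketch of that approach (the class and method names are just for illustration, and the declarations follow the usual pinvoke.net style; the sizes come straight out of WIN32_FIND_DATA, so no extra FileInfo lookup per file is needed):

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    static class NativeDirSize
    {
        [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
        struct WIN32_FIND_DATA
        {
            public FileAttributes dwFileAttributes;
            public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
            public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
            public uint nFileSizeHigh;
            public uint nFileSizeLow;
            public uint dwReserved0;
            public uint dwReserved1;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
        }

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
        static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

        [DllImport("kernel32.dll")]
        static extern bool FindClose(IntPtr hFindFile);

        static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

        // Recursively sums file sizes using FindFirstFile/FindNextFile.
        public static long DirSize(string path)
        {
            long size = 0;
            WIN32_FIND_DATA data;
            IntPtr handle = FindFirstFile(Path.Combine(path, "*"), out data);
            if (handle == INVALID_HANDLE_VALUE) return 0;
            try
            {
                do
                {
                    if (data.cFileName == "." || data.cFileName == "..") continue;
                    if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                    {
                        // Skip reparse points (junctions/symlinks) to avoid cycles.
                        if ((data.dwFileAttributes & FileAttributes.ReparsePoint) == 0)
                            size += DirSize(Path.Combine(path, data.cFileName));
                    }
                    else
                    {
                        // Combine the two 32-bit halves into a 64-bit size.
                        size += ((long)data.nFileSizeHigh << 32) | data.nFileSizeLow;
                    }
                } while (FindNextFile(handle, out data));
            }
            finally { FindClose(handle); }
            return size;
        }
    }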

+1
Jun 05 '10

When scanning a folder with tens of thousands of files, performance will suffer with any method you use.

  • Using the Windows functions FindFirstFile... and FindNextFile... provides the fastest access.

  • Due to marshalling overhead, however, even if you use those API functions, performance will not increase. The framework already wraps these API functions, so there is no point doing it yourself.

  • How you process the results of whichever file-access method you use determines the performance of your application. For example, even if you use the Windows API functions, updating a list control is where performance will be lost.

  • You cannot compare your execution speed with Windows Explorer. From my experiments, I believe that in many cases Windows Explorer reads directly from the file allocation table.

  • The fastest file system access I know of is the DIR command. You cannot beat its performance. It definitely reads directly from the file allocation table (possibly using the BIOS).

  • Yes, the operating system caches file access.

Suggestions

  • I wonder whether BackupRead would help in your case?

  • What if you shell out to DIR, capture its output, and then parse it? (You are not really parsing, because each DIR line is fixed-width, so it is just a matter of calling substring.)

  • What if you run DIR /B > NUL on a background thread while your own program runs? While DIR is running, you will benefit from the cached file access. (A rough sketch of this follows the list.)
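A minimal sketch of that last suggestion (the /s switch and the folder path are assumptions about your setup, not part of the original idea):

    using System.Diagnostics;

    // Warm the file system cache by running DIR in the background,
    // then run the real size calculation; it benefits from the metadata
    // that DIR has pulled into the cache.
    var psi = new ProcessStartInfo("cmd.exe", "/c dir /s /b \"C:\\YourFolder\" > nul")
    {
        CreateNoWindow = true,
        UseShellExecute = false
    };
    Process warmUp = Process.Start(psi);
    // ... start your own DirSize(...) scan here ...
    warmUp.WaitForExit();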

+1
Jun 16 '10

With tens of thousands of files, you are not going to win with a frontal assault. You need to be a bit more creative with the solution. With that many files, you will probably even find that in the time it took to calculate the size, the files have changed and your figure is already wrong.

So, you need to move the load somewhere else. For me, the answer would be to use System.IO.FileSystemWatcher and write code that monitors the directory and updates an index.

It should take only a short time to write a Windows service that can be configured to monitor a set of directories and write the results to a shared output file. You can have the service recalculate the file sizes on startup, and then just track changes whenever a Create/Delete/Changed event is fired by System.IO.FileSystemWatcher . The benefit of monitoring the directory is that you are only interested in small changes, which means your figures have a higher chance of being correct (remember, all data is stale!)

Then the only thing to watch out for is that you will have multiple resources trying to access the resulting output file, so just make sure you take that into account.
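A rough sketch of the monitoring part (the class name and the per-file dictionary are illustrative; a real service would add error handling and cope with watcher buffer overflows):

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;

    // Keeps a running directory-size total, updated from FileSystemWatcher events.
    class DirectorySizeIndex
    {
        private readonly ConcurrentDictionary<string, long> _sizes =
            new ConcurrentDictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        private readonly FileSystemWatcher _watcher;

        public DirectorySizeIndex(string root)
        {
            // Initial full scan (the slow part, done once at startup).
            foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
                _sizes[file] = new FileInfo(file).Length;

            _watcher = new FileSystemWatcher(root)
            {
                IncludeSubdirectories = true,
                NotifyFilter = NotifyFilters.FileName | NotifyFilters.Size
            };
            _watcher.Created += (s, e) => Update(e.FullPath);
            _watcher.Changed += (s, e) => Update(e.FullPath);
            _watcher.Deleted += (s, e) => { long removed; _sizes.TryRemove(e.FullPath, out removed); };
            _watcher.Renamed += (s, e) => { long old; if (_sizes.TryRemove(e.OldFullPath, out old)) Update(e.FullPath); };
            _watcher.EnableRaisingEvents = true;
        }

        private void Update(string path)
        {
            if (File.Exists(path))   // ignore events for directories
                _sizes[path] = new FileInfo(path).Length;
        }

        // Cheap to call at any time; no disk access needed.
        public long TotalSize { get { return _sizes.Values.Sum(); } }
    }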

0
Jun 17 '10 at 11:06

I abandoned the .NET implementation (for performance reasons) and used the native function GetFileAttributesEx(...).

Try the following:

    [StructLayout(LayoutKind.Sequential)]
    public struct WIN32_FILE_ATTRIBUTE_DATA
    {
        public uint fileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME creationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME lastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME lastWriteTime;
        public uint fileSizeHigh;
        public uint fileSizeLow;
    }

    public enum GET_FILEEX_INFO_LEVELS
    {
        GetFileExInfoStandard,
        GetFileExMaxInfoLevel
    }

    public class NativeMethods
    {
        [DllImport("KERNEL32.dll", CharSet = CharSet.Auto)]
        public static extern bool GetFileAttributesEx(string path, GET_FILEEX_INFO_LEVELS level, out WIN32_FILE_ATTRIBUTE_DATA data);
    }

Now just do the following:

    WIN32_FILE_ATTRIBUTE_DATA data;
    if (NativeMethods.GetFileAttributesEx("[your path]", GET_FILEEX_INFO_LEVELS.GetFileExInfoStandard, out data))
    {
        // Combine the high and low 32-bit halves into a 64-bit size
        // (cast before shifting, and OR the halves together rather than AND).
        long size = ((long)data.fileSizeHigh << 32) | data.fileSizeLow;
    }
0
Jun 21 '10 at 10:05


