How to reduce CPU usage when scanning folders / subfolders / files?

I developed an application that scans more or less everywhere for a file or a list of files. When scanning small folders, say 10,000 files and subfolders, there is no problem. But when I scan, for example, my entire folder with over 100,000 items, it is very hard on my processor: it uses about 40% of my CPU.

Is there a way to optimize this code so that it uses less CPU?

procedure GetAllSubFolders(sPath: String);
var
  Path: String;
  Rec: TSearchRec;
begin
  try
    Path := IncludeTrailingBackslash(sPath);
    if FindFirst(Path + '*.*', faAnyFile, Rec) = 0 then
      try
        repeat
          Application.ProcessMessages;
          if (Rec.Name <> '.') and (Rec.Name <> '..') then
          begin
            if (ExtractFileExt(Path + Rec.Name) <> '') and
               (ExtractFileExt(Path + Rec.Name).ToLower <> '.lnk') and
               (DirectoryExists(Path + Rec.Name + '\') = False) then
            begin
              if Pos(Path + Rec.Name, main.Memo1.Lines.Text) = 0 then
              begin
                main.ListBox1.Items.Add(Path + Rec.Name);
                main.Memo1.Lines.Add(Path + Rec.Name);
              end;
            end;
            GetAllSubFolders(Path + Rec.Name);
          end;
        until FindNext(Rec) <> 0;
      finally
        FindClose(Rec);
      end;
  except
    on e: Exception do
      ShowMessage(e.Message);
  end;
end;

My application searches for all the files in the selected folder and its subfolders, collects them, and copies them to another location that you specify.

The Application.ProcessMessages call is there to make sure the application does not appear to hang, so that the user does not close it, because searching 100,000 files, for example, can take about an hour...

I'm concerned about the processor usage; memory is not affected.

Note: the memo is there to make sure that the same files are not selected twice.

2 answers

I see the following performance issues:

  • Calling Application.ProcessMessages is relatively expensive. You are polling for messages rather than blocking while waiting for them (i.e. GetMessage). Besides the performance cost, the use of Application.ProcessMessages usually indicates a poor design for various reasons, and in general you should avoid calling it.
  • A non-virtual list box performs badly with a large number of items.
  • Using a memo control (a GUI control) to store a list of strings is extremely expensive.
  • Every time you add to these GUI controls, they update and repaint, which is very expensive.
  • Evaluating Memo1.Lines.Text is extraordinarily expensive.
  • The use of Pos is likewise very expensive.
  • Using DirectoryExists is expensive and spurious. The attributes returned in the search record already contain that information.

I would make the following changes:

  • Move the search code into a thread to avoid the need for ProcessMessages. You will need some way to pass the information back to the main thread for display in the GUI.
  • Use a virtual list view to display the files.
  • Keep the list of files you want to check for duplicates in a dictionary, which gives you O(1) lookup. Take care over case-insensitivity of file names, an issue you have probably overlooked so far. This replaces the memo.
  • Check whether an item is a directory using Rec.Attr, that is, Rec.Attr and faDirectory <> 0. A rough sketch of these last two points follows this list.
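As a rough illustration of the dictionary and Rec.Attr points, here is a minimal sketch; CollectFiles and KnownFiles are made-up names, the threading and GUI wiring are left out, and lower-casing the key is just one simple way to get a case-insensitive check:

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

procedure CollectFiles(const sPath: string; KnownFiles: TDictionary<string, Boolean>;
  Results: TStringList);
var
  Path, FullName, Key: string;
  Rec: TSearchRec;
begin
  Path := IncludeTrailingPathDelimiter(sPath);
  if FindFirst(Path + '*', faAnyFile, Rec) = 0 then
  try
    repeat
      if (Rec.Name <> '.') and (Rec.Name <> '..') then
      begin
        FullName := Path + Rec.Name;
        if (Rec.Attr and faDirectory) <> 0 then
          // Use the attribute from the search record instead of calling DirectoryExists
          CollectFiles(FullName, KnownFiles, Results)
        else
        begin
          // Lower-cased key makes the duplicate check case-insensitive
          Key := AnsiLowerCase(FullName);
          if not KnownFiles.ContainsKey(Key) then
          begin
            KnownFiles.Add(Key, True);  // O(1) lookup instead of Pos over the memo text
            Results.Add(FullName);
          end;
        end;
      end;
    until FindNext(Rec) <> 0;
  finally
    FindClose(Rec);
  end;
end;

The caller would create the TDictionary and TStringList once, pass them into the recursion, and only hand the finished results over to the GUI (from the main thread) after the scan has completed.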

I agree with the other answer, which says that you would be best off doing this work in a background thread, and I don't want to encourage you to persist with doing it in your main thread.

However, if you go to the command line and do the following:

 dir c:\*.* /s > dump.txt & notepad dump.txt 

You may be surprised how quickly Notepad appears.

So you can speed up your GetAllSubFolders quite a bit even if you keep it in your main thread, e.g. by bracketing the code that calls it with main.Memo1.Lines.BeginUpdate and main.Memo1.Lines.EndUpdate, and likewise main.ListBox1.Items.BeginUpdate and EndUpdate; see the snippet below. That stops these controls from updating and repainting while they are being filled (which, together with the "if Pos(...)" business I comment on below, is actually where your code spends most of its time). And, in case you haven't gathered already, Application.ProcessMessages is (mostly) evil.
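As a minimal sketch of that bracketing, assuming the unchanged GetAllSubFolders from the question and a placeholder SelectedPath variable:

main.Memo1.Lines.BeginUpdate;
main.ListBox1.Items.BeginUpdate;
try
  GetAllSubFolders(SelectedPath);   // the routine from the question, unchanged
finally
  main.ListBox1.Items.EndUpdate;    // EndUpdate in a finally block so the controls
  main.Memo1.Lines.EndUpdate;       // repaint even if the scan raises an exception
end;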

I did some timings on my D: drive, which is a 500 GB SSD with 263562 files in 35949 directories.

  • The code in your q: 6777 seconds
  • Writing the output to Notepad as described above: 15 seconds
  • The code below, run in the main thread: 9.7 seconds

The reason I have included the code below in this answer is that you would find it far easier to run in a thread, because it collects its results in a TStringList whose contents you can then assign to your memo once the list is complete.
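For example, a minimal sketch of running it in a thread, assuming the GetAllSubFolders function from the code below (the button handler name and the 'D:\' path are just placeholders):

procedure TMain.Button2Click(Sender: TObject);
begin
  TThread.CreateAnonymousThread(
    procedure
    var
      TL: TStringList;
    begin
      // The scan runs off the main thread, so no ProcessMessages is needed
      TL := GetAllSubFolders('D:\');
      try
        TThread.Synchronize(nil,
          procedure
          begin
            // Hand the finished list back to the GUI in one assignment
            Memo1.Lines.Text := TL.Text;
          end);
      finally
        TL.Free;
      end;
    end).Start;
end;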

A few comments on the code in your q, which I imagine you may have picked up from somewhere:

  • It recurses pointlessly, even when the current entry in Rec is a plain file. The code below only recurses when the current Rec entry is a directory.

  • It seems to be trying to avoid duplicates with the "if Pos(...)" business, which should not be necessary (except perhaps when there is a symbolic link somewhere, e.g. one created with the MkLink command, that points elsewhere on the drive) and which is done very inefficiently, i.e. by searching for the file name in the contents of the memo, which gets longer and longer as more files are found. In the code below, the string list is set to ignore duplicates and has its Sorted property set to True, which lets it check for duplicates much more quickly, because it can do a binary search through its contents rather than a sequential one.

  • It evaluates Path + Rec.Name six times for every item found, which is mildly inefficient at run time and bloats the source code. That is only a minor point compared with the first two, though.

code:

function GetAllSubFolders(sPath: String): TStringList;

  procedure GetAllSubFoldersInner(sPath: String);
  var
    Path, AFileName, Ext: String;
    Rec: TSearchRec;
    Done: Boolean;
  begin
    Path := IncludeTrailingBackslash(sPath);
    if FindFirst(Path + '*.*', faAnyFile, Rec) = 0 then
    begin
      Done := False;
      while not Done do
      begin
        if (Rec.Name <> '.') and (Rec.Name <> '..') then
        begin
          AFileName := Path + Rec.Name;
          Ext := ExtractFileExt(AFileName).ToLower;
          if not ((Rec.Attr and faDirectory) = faDirectory) then
          begin
            Result.Add(AFileName)
          end
          else
          begin
            GetAllSubFoldersInner(AFileName);
          end;
        end;
        Done := FindNext(Rec) <> 0;
      end;
      FindClose(Rec);
    end;
  end;

begin
  Result := TStringList.Create;
  Result.BeginUpdate;
  Result.Sorted := True;
  Result.Duplicates := dupIgnore;  // don't add duplicate filenames to the list
  GetAllSubFoldersInner(sPath);
  Result.EndUpdate;
end;

procedure TMain.Button1Click(Sender: TObject);
var
  T1, T2: Integer;
  TL: TStringList;
begin
  T1 := GetTickCount;
  TL := GetAllSubfolders('D:\');
  try
    Memo1.Lines.BeginUpdate;
    try
      Memo1.Lines.Text := TL.Text;
    finally
      Memo1.Lines.EndUpdate;
    end;
    T2 := GetTickCount;
    Caption := Format('GetAll: %d, Load: %d, Files: %d', [T2 - T1, GetTickCount - T2, TL.Count]);
  finally
    TL.Free;
  end;
end;

Source: https://habr.com/ru/post/973802/

