Efficiently retrieving and filtering files

This earlier SO question talks about how to get all the files in a directory tree that match one of several extensions.

E.g., retrieve all the files under C:\ and all of its subdirectories that match *.log, *.txt, or *.dat.

The accepted answer was as follows:

var files = Directory.GetFiles("C:\\path", "*.*", SearchOption.AllDirectories)
            .Where(s => s.EndsWith(".mp3") || s.EndsWith(".jpg"));

This strikes me as quite inefficient. If you search a directory tree containing thousands of files (it uses SearchOption.AllDirectories), the name of every single file in the tree is loaded into memory, and only then are the mismatches filtered out. (It reminds me of the "paging" offered by ASP.NET datagrids.)

Unfortunately, the standard System.IO.DirectoryInfo.GetFiles method accepts only one filter at a time.

Is it just my lack of LINQ knowledge, or is it really inefficient in the way I describe?

Secondly, is there a more efficient way to do this, with or without LINQ (and without resorting to multiple GetFiles calls)?

+3
4 answers

I had the same problem and found a solution in Matthew Podwysocki's excellent post at codebetter.com.

He wrote his own GetFiles implementation that accepts a predicate, so the filter is applied while the tree is being walked. It is also built on yield statements, so file names are produced one at a time and the memory held per file stays at a minimum.

With his code you can write something like this:

var allowedExtensions = new HashSet<string> { ".jpg", ".mp3" };

var files = GetFiles(
    "C:\\path", 
    SearchOption.AllDirectories, 
    fn => allowedExtensions.Contains(Path.GetExtension(fn))
);
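To make the idea concrete, here is a minimal sketch of what a predicate-based, yield-driven GetFiles could look like. It is not the code from the linked post: the FileFinder class name and the explicit-stack traversal are assumptions made for illustration, and only the call shape (path, SearchOption, predicate) is taken from the usage above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not the implementation from the linked post.
public static class FileFinder
{
    // Lazily yields the files under 'path' whose full name satisfies 'predicate'.
    public static IEnumerable<string> GetFiles(
        string path, SearchOption searchOption, Func<string, bool> predicate)
    {
        // An explicit stack replaces recursion, so nested iterators do not pile up.
        var pending = new Stack<string>();
        pending.Push(path);

        while (pending.Count > 0)
        {
            string current = pending.Pop();

            // Only the names from one directory are held in memory at a time.
            foreach (string file in Directory.GetFiles(current))
            {
                if (predicate(file))
                    yield return file;
            }

            if (searchOption == SearchOption.AllDirectories)
            {
                foreach (string dir in Directory.GetDirectories(current))
                    pending.Push(dir);
            }
        }
    }
}
```

Because the method is an iterator, nothing touches the disk until the result is enumerated, and a caller that stops early (for example with First()) never visits the rest of the tree.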


+2


What about using C# yield?

EDIT: here is a simple test that compares the two approaches.

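using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program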
{
    static string PATH = "F:\\users\\llopez\\media\\photos";

    static Func<string, bool> WHERE = s => s.EndsWith(".CR2") || s.EndsWith(".html");

    static void Main(string[] args)
    {
        using (new Profiler())
        {
            var accepted = Directory.GetFiles(PATH, "*.*", SearchOption.AllDirectories)
                .Where(WHERE);

            foreach (string f in accepted) { }
        }

        using (new Profiler())
        {
            var files = traverse(PATH, WHERE);

            foreach (string f in files) { }
        }

        Console.ReadLine();
    }

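    // Lazily walks the tree: yields the matching files in 'path', then recurses into each subdirectory.
    static IEnumerable<string> traverse(string path, Func<string, bool> filter)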
    {
        foreach (string f in Directory.GetFiles(path).Where(filter))
        {
            yield return f;
        }

        foreach (string d in Directory.GetDirectories(path))
        {
            foreach (string f in traverse(d, filter))
            {
                yield return f;
            }
        }
    }
}

class Profiler : IDisposable
{
    private Stopwatch stopwatch;

    public Profiler()
    {
        this.stopwatch = new Stopwatch();
        this.stopwatch.Start();
    }

    public void Dispose()
    {
        stopwatch.Stop();
        Console.WriteLine("Running time: {0}ms", this.stopwatch.ElapsedMilliseconds);
        Console.WriteLine("GC.GetTotalMemory(false): {0}", GC.GetTotalMemory(false));
    }
}

In this run, GC.GetTotalMemory shows only a small difference in memory between the two approaches (on the order of 100K).

Running time: 605ms
GC.GetTotalMemory(false): 3444684
Running time: 577ms
GC.GetTotalMemory(false): 3293368
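For comparison, on .NET Framework 4 or later the built-in Directory.EnumerateFiles returns names lazily rather than building the whole array first, so a hand-written traversal may not be needed. The sketch below assumes that API is available; the LazyFileFilter class and FindByExtension method are hypothetical names, and the case-insensitive set is a design choice that differs from the case-sensitive EndsWith filter used in the test above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration.
public static class LazyFileFilter
{
    // Lazily yields files under 'root' whose extension is in 'extensions' (e.g. ".jpg").
    public static IEnumerable<string> FindByExtension(string root, IEnumerable<string> extensions)
    {
        // Ignore case so that ".CR2" and ".cr2" are treated the same.
        var allowed = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);

        // EnumerateFiles streams names as the tree is walked; Where filters them one by one.
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                        .Where(f => allowed.Contains(Path.GetExtension(f)));
    }
}
```

One caveat: with SearchOption.AllDirectories the enumeration throws if it reaches a directory it is not allowed to read, so a manual traversal like the one above is still useful when such directories have to be skipped.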
+1


Source: https://habr.com/ru/post/1703093/

