First, let me apologize for the size of this problem, but I'm really trying to think functionally, and this is one of the more difficult problems I have had to work with.

I would like some suggestions on how to handle it in a functional manner, specifically in F#. I am writing a program that walks a list of directories, uses one list of regular expression patterns to filter the files found in those directories, and uses a second list of regular expression patterns to find matches in the text of the remaining files. For each piece of text that matches a content pattern, it should return the file name, row index, column index, pattern, and matched value.

In addition, exceptions need to be recorded, and there are three possible failure scenarios: the directory cannot be opened, the file cannot be opened, or reading content from the file fails. The final requirement is that the volume of files "scanned" for matches can be very large, so the whole thing needs to be lazy. I am less concerned with a "pure" functional solution than with a "good" solution that reads well and performs well. One last challenge is to make it interoperate with C#, because I would like to use the WinForms tools to attach this algorithm to a UI.

Here is my first attempt; hopefully it clarifies the problem:

    open System.Text.RegularExpressions
    open System.IO

    type Reader<'t, 'a> = 't -> 'a

Thanks for any input.
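To make the requirements above more concrete, the result and the three failure scenarios could be modelled as named types rather than as a bare tuple. This is only a sketch: `ScanMatch`, `ScanError`, `describe`, and all field names are hypothetical and do not appear in my code. F# records are exposed to C# as classes with read-only properties, which may be more convenient on the WinForms side than the `Item1` ... `Item5` members of a tuple.

```fsharp
/// One regex hit inside a file (hypothetical type; names are illustrative).
type ScanMatch =
    { FileName : string
      Row      : int      // zero-based line index
      Column   : int      // zero-based character offset within the line
      Pattern  : string
      Value    : string }

/// The three failure scenarios described in the question (hypothetical type).
type ScanError =
    | DirectoryOpenFailed of directory : string * cause : exn
    | FileOpenFailed      of file : string * cause : exn
    | LineReadFailed      of file : string * row : int * cause : exn

/// Render an error as a single log line.
let describe error =
    match error with
    | DirectoryOpenFailed (dir, e) -> sprintf "cannot open directory %s: %s" dir e.Message
    | FileOpenFailed (file, e)     -> sprintf "cannot open file %s: %s" file e.Message
    | LineReadFailed (file, row, e) -> sprintf "cannot read %s at line %d: %s" file row e.Message
```

With types like these, the exception handlers could take a single `ScanError -> unit` callback instead of three separate functions, although that is a design choice rather than a requirement.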
Update: here is a revised solution based on the feedback received:

    open System.Text.RegularExpressions
    open System.IO

    type ScannerConfiguration =
        { FileNamePatterns : seq<string>
          ContentPatterns : seq<string>
          FileExceptionHandler : exn -> string -> unit
          LineExceptionHandler : exn -> string -> int -> unit
          DirectoryExceptionHandler : exn -> string -> unit }

    let scanner (specifiedDirectories : seq<string>) (configuration : ScannerConfiguration) =
        seq {
            // Compile each pattern once and cache the resulting regexes.
            let ToCachedRegexList =
                Seq.map (fun (pattern : string) -> new Regex(pattern)) >> Seq.cache
            let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList
            let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList

            // Lazily yield (line, index) pairs; a read failure is reported and ends the sequence.
            let getLines exHandler reader =
                Seq.unfold (fun ((reader : StreamReader), index) ->
                    if not reader.EndOfStream then
                        try
                            let line = reader.ReadLine()
                            Some((line, index), (reader, index + 1))
                        with e ->
                            exHandler e index
                            None
                    else None) (reader, 0)

            for specifiedDirectory in specifiedDirectories do
                let files =
                    try
                        Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
                    with e ->
                        configuration.DirectoryExceptionHandler e specifiedDirectory
                        [||]
                for file in files do
                    if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                        let lines =
                            try
                                // Open the file here so that an open failure is caught by this handler.
                                let reader = (new FileInfo(file)).OpenText()
                                // Dispose the reader only after the lines have been enumerated;
                                // a plain `use` inside the try would close it before the lazy
                                // sequence is read.
                                seq {
                                    use reader = reader
                                    yield! reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                                }
                            with e ->
                                configuration.FileExceptionHandler e file
                                Seq.empty
                        for line in lines do
                            let content, index = line
                            for contentregex in contentRegexes do
                                for mmatch in contentregex.Matches(content) do
                                    yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value)
        }

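One simplification I am considering is to drop the hand-written line reader in favour of `File.ReadLines`, which reads lines on demand instead of loading the whole file, combined with `Seq.indexed` for the row numbers. The following is a minimal sketch under that assumption; `matchesInFile` is a hypothetical helper, and unlike the scanner above it does no error reporting at all.

```fsharp
open System.IO
open System.Text.RegularExpressions

/// Lazily yields (file, row, column, pattern, value) for every match in one file.
/// Hypothetical helper: exceptions are NOT handled here and simply propagate.
let matchesInFile (regexes : seq<Regex>) (file : string) =
    seq {
        // File.ReadLines streams the file line by line; Seq.indexed pairs
        // each line with its zero-based row number.
        for (row, line) in File.ReadLines(file) |> Seq.indexed do
            for regex in regexes do
                // Seq.cast makes the element type explicit as Match.
                for m in regex.Matches(line) |> Seq.cast<Match> do
                    yield (file, row, m.Index, regex.ToString(), m.Value)
    }
```

Nothing is read until the resulting sequence is enumerated, so something like `matchesInFile regexes path |> Seq.truncate 10` should stop reading after the first ten matches. The trade-off is that a failure in the middle of a file surfaces as an exception during enumeration rather than through a per-line handler.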
Again, any input is welcome.