How to write a functional file scanner

First, let me apologize for the scale of this problem, but I'm really trying to think functionally, and this is one of the more difficult problems I have had to work on.

I would like some suggestions on how to handle this problem in a functional manner, specifically in F#. I am writing a program that scans a list of directories, uses a list of regular expression patterns to filter the files found in those directories, and then uses a second list of regular expression patterns to find matches in the text of the remaining files. For each piece of text that matches a content pattern, it should return the file name, line index, column index, pattern, and matched value. In addition, exceptions need to be recorded, and there are 3 possible exception scenarios: the directory cannot be opened, the file cannot be opened, or reading content from the file fails. The final requirement is that the volume of files "scanned" for matches can be very large, so the whole thing should be lazy. I am less worried about a "clean" functional solution than about a "good" solution that reads well and performs well. One of the last tasks is to make it interoperate with C#, because I would like to use the WinForms tools to attach this algorithm to a UI. Here is my first attempt, which will hopefully clarify the problem:

    open System.Text.RegularExpressions
    open System.IO

    type Reader<'t, 'a> = 't -> 'a //=M['a], result varies

    let returnM x _ = x

    let map f m = fun t -> t |> m |> f

    let apply f m = fun t -> t |> m |> (t |> f)

    let bind f m = fun t -> t |> (t |> m |> f)

    let Scanner dirs =
        returnM dirs
        |> apply (fun dirExHandler ->
            Seq.collect (fun directory ->
                try
                    Directory.GetFiles(directory, "*", SearchOption.AllDirectories)
                with | e ->
                    dirExHandler e directory
                    Array.empty))
        |> map (fun filenames ->
            returnM filenames
            |> apply (fun (filenamepatterns, lineExHandler, fileExHandler) ->
                Seq.filter (fun filename ->
                    filenamepatterns |> Seq.exists (fun pattern ->
                        let regex = new Regex(pattern)
                        regex.IsMatch(filename)))
                >> Seq.map (fun filename ->
                    let fileinfo = new FileInfo(filename)
                    try
                        use reader = fileinfo.OpenText()
                        Seq.unfold (fun ((reader : StreamReader), index) ->
                            if not reader.EndOfStream then
                                try
                                    let line = reader.ReadLine()
                                    Some((line, index), (reader, index + 1))
                                with | e ->
                                    lineExHandler e filename index
                                    None
                            else
                                None) (reader, 0)
                        |> (fun lines -> (filename, lines))
                    with | e ->
                        fileExHandler e filename
                        (filename, Seq.empty))
                >> (fun files ->
                    returnM files
                    |> apply (fun contentpatterns ->
                        Seq.collect (fun file ->
                            let filename, lines = file
                            lines
                            |> Seq.collect (fun line ->
                                let content, index = line
                                contentpatterns
                                |> Seq.collect (fun pattern ->
                                    let regex = new Regex(pattern)
                                    regex.Matches(content)
                                    |> (Seq.cast<Match>
                                        >> Seq.map (fun contentmatch ->
                                            (filename,
                                             index,
                                             contentmatch.Index,
                                             pattern,
                                             contentmatch.Value))))))))))

Thanks for any input.

Update: here is a revised solution based on the feedback received:

    open System.Text.RegularExpressions
    open System.IO

    type ScannerConfiguration = {
        FileNamePatterns : seq<string>
        ContentPatterns : seq<string>
        FileExceptionHandler : exn -> string -> unit
        LineExceptionHandler : exn -> string -> int -> unit
        DirectoryExceptionHandler : exn -> string -> unit }

    let scanner specifiedDirectories (configuration : ScannerConfiguration) = seq {
        let ToCachedRegexList = Seq.map (fun pattern -> new Regex(pattern)) >> Seq.cache

        let contentRegexes = configuration.ContentPatterns |> ToCachedRegexList
        let filenameRegexes = configuration.FileNamePatterns |> ToCachedRegexList

        let getLines exHandler reader =
            Seq.unfold (fun ((reader : StreamReader), index) ->
                if not reader.EndOfStream then
                    try
                        let line = reader.ReadLine()
                        Some((line, index), (reader, index + 1))
                    with | e -> exHandler e index; None
                else
                    None) (reader, 0)

        for specifiedDirectory in specifiedDirectories do
            let files =
                try Directory.GetFiles(specifiedDirectory, "*", SearchOption.AllDirectories)
                with e -> configuration.DirectoryExceptionHandler e specifiedDirectory; [||]
            for file in files do
                if filenameRegexes |> Seq.exists (fun (regex : Regex) -> regex.IsMatch(file)) then
                    let lines =
                        let fileinfo = new FileInfo(file)
                        try
                            use reader = fileinfo.OpenText()
                            reader |> getLines (fun e index -> configuration.LineExceptionHandler e file index)
                        with | e -> configuration.FileExceptionHandler e file; Seq.empty
                    for line in lines do
                        let content, index = line
                        for contentregex in contentRegexes do
                            for mmatch in content |> contentregex.Matches do
                                yield (file, index, mmatch.Index, contentregex.ToString(), mmatch.Value)
    }

Again, any input is welcome.

1 answer

I think the best approach is to start with the simplest solution and then extend it. I find your current approach rather difficult to read, for two reasons:

  • The code uses many combinators and function compositions in patterns that are not very common in F#. Some of the processing can be written more easily using sequence expressions.

  • The code is written as one function, but it is quite complex and would be more readable if it were divided into several functions.
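To illustrate the first point, here is a small sketch that is not taken from the question's code. The helper names `matchesPipeline` and `matchesSeq` are made up for this example. Both functions search one line of text with several patterns; the first uses combinators, the second a sequence expression, and both stay lazy:

```fsharp
open System.Text.RegularExpressions

// Hypothetical helper names, used only for this illustration.

// Combinator style: nested lambdas glued together with Seq.collect.
let matchesPipeline (patterns: seq<string>) (line: string) =
    patterns
    |> Seq.collect (fun pattern ->
        Regex.Matches(line, pattern)
        |> Seq.cast<Match>
        |> Seq.map (fun m -> pattern, m.Index, m.Value))

// Sequence expression: the same lazy result, written as nested loops.
let matchesSeq (patterns: seq<string>) (line: string) =
    seq {
        for pattern in patterns do
            for m in Regex.Matches(line, pattern) |> Seq.cast<Match> do
                yield pattern, m.Index, m.Value
    }
```

The two versions should produce the same elements in the same order; the second simply reads more like the description of the problem.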

I would probably start by splitting the code into a function that checks a single file (say, fileMatches) and a function that walks over the files and calls fileMatches. The main iteration can be written nicely using F# sequence expressions:

    open System.IO

    // Checks whether a file name matches a filename pattern
    // and a content matches a content pattern
    let fileMatches fileNamePatterns contentPatterns
                    (fileExHandler, lineExHandler) file =
      // TODO: This can be implemented using
      // File.ReadLines which returns a sequence
      failwith "TODO: not implemented" // placeholder so that the stub compiles

    // Iterates over all the files and calls 'fileMatches'
    let scanner specifiedDirectories fileNamePatterns contentPatterns
                (dirExHandler, fileExHandler, lineExHandler) = seq {
      // Iterate over all the specified directories
      for specifiedDir in specifiedDirectories do
        // Find all files in the directories (and handle exceptions)
        let files =
          try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
          with e -> dirExHandler e specifiedDir; [||]
        // Iterate over all files and report those that match
        for file in files do
          if fileMatches fileNamePatterns contentPatterns
                         (fileExHandler, lineExHandler) file then
            // Matches! Return this file as part of the result.
            yield file }

The function is still fairly complicated because you need to pass a lot of parameters. Wrapping the parameters in a simple type or record might be a good idea:

    type ScannerArguments =
      { FileNamePatterns:string
        ContentPatterns:string
        FileExceptionHandler:exn -> string -> unit
        LineExceptionHandler:exn -> string -> unit
        DirectoryExceptionHandler:exn -> string -> unit }

Then you can define both fileMatches and scanner as functions that take only two parameters, which makes the code more readable. Something like:

    // Iterates over all the files and calls 'fileMatches'
    let scanner specifiedDirectories (args:ScannerArguments) = seq {
      for specifiedDir in specifiedDirectories do
        let files =
          try Directory.GetFiles(specifiedDir, "*", SearchOption.AllDirectories)
          with e -> args.DirectoryExceptionHandler e specifiedDir; [||]
        for file in files do
          // No need to propagate all arguments explicitly to other functions
          if fileMatches args file then yield file }
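
The fileMatches function left as a TODO earlier could be filled in roughly as follows. This is only a sketch under stated assumptions, not the answer's own implementation: the record below types the two pattern fields as `seq<string>` (the record above uses `string`), the name and content checks use `Regex.IsMatch`, and any exception raised while opening or reading the file is sent to `FileExceptionHandler` instead of distinguishing line-level failures.

```fsharp
open System.IO
open System.Text.RegularExpressions

// Assumed variant of the argument record: the pattern fields are sequences here.
type ScannerArguments =
    { FileNamePatterns: seq<string>
      ContentPatterns: seq<string>
      FileExceptionHandler: exn -> string -> unit
      LineExceptionHandler: exn -> string -> unit
      DirectoryExceptionHandler: exn -> string -> unit }

// True when the file name matches some file-name pattern and at least
// one line of the file matches some content pattern.
let fileMatches (args: ScannerArguments) (file: string) =
    let nameMatches =
        args.FileNamePatterns
        |> Seq.exists (fun pattern -> Regex.IsMatch(file, pattern))
    if not nameMatches then
        false
    else
        try
            // File.ReadLines reads lazily; Seq.exists stops at the first hit
            // and disposes the underlying reader when it is done.
            File.ReadLines(file)
            |> Seq.exists (fun line ->
                args.ContentPatterns
                |> Seq.exists (fun pattern -> Regex.IsMatch(line, pattern)))
        with e ->
            // Simplification: open and read failures are reported together.
            args.FileExceptionHandler e file
            false
```

Because the whole read happens inside the `try` block, the file is closed before the function returns, which avoids handing out a lazy sequence over a reader that has already been disposed.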

Source: https://habr.com/ru/post/905633/

