Functional approach for parsing hierarchical CSV

Question


I am trying to write a piece of code but cannot make it work. The simplest example I can come up with is parsing a CSV file. Suppose we have a CSV file in which the data is organized in some sort of hierarchy, like this:

    Section1;
    ;Section1.1
    ;Section1.2
    ;Section1.3
    Section2;
    ;Section2.1
    ;Section2.2
    ;Section2.3
    ;Section2.4

and so on.

I have done this:

    let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
    let lines = input.Split('\n')
    let data = lines |> Array.map (fun l -> l.Split(';'))
    let sections =
        data
        |> Array.mapi (fun i l -> (i, l.[0]))
        |> Array.filter (fun (i, s) -> s <> "")

and I got

 val sections : (int * string) [] = [|(0, "a"); (4, "b"); (10, "c")|]

Now I would like to create a list of row index ranges for each section, something like this:

 [|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]

where the first number is the index of the first row of the section's subsections and the second is the index of the last. How can I do it? I was thinking about using the fold function, but could not come up with anything.


functional-programming f#

Max Feb 25 '10 at 23:02


3 answers

In general, when you work only with arrays, you force yourself into mutable, imperative-style code. I created a generic splitArrayBy function to group the different sections. If you are planning to write your own parser, I suggest using lists and other higher-level constructs.

    module Question

    open System

    let splitArrayBy f (array:_[]) =
        [|
            let i = ref 0
            let start = ref 0
            let last = ref [||]
            while !i < array.Length do
                if f array.[!i] then
                    yield !last, array.[!start .. !i - 1]
                    last := array.[!i]
                    start := !i + 1
                i := !i + 1
            if !start <> !i then
                yield !last, array.[!start .. !i - 1]
        |]

    let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
    let lines = input.Split('\n')
    let data = lines |> Array.map (fun l -> l.Split(';'))
    let result = data |> splitArrayBy (fun s -> s.[0] <> "")

    Array.iter (printfn "%A") result

This prints the following:

    ([||], [||])
    ([|"a"; ""|], [|[|""; "a1"|]; [|""; "a2"|]; [|""; "a3"|]|])
    ([|"b"; ""|], [|[|""; "b1"|]; [|""; "b2"|]; [|""; "b3"|]; [|""; "b4"|]; [|""; "b5"|]|])
    ([|"c"; ""|], [|[|""; "c1"|]|])

Below is a small modification of the above that produces the output from your example.

    let splitArrayBy f (array:_[][]) =
        [|
            let i = ref 0
            let start = ref 0
            let last = ref ""
            while !i < array.Length do
                if f array.[!i] then
                    if !i <> 0 then
                        yield !start, !i - 1, !last
                    last := array.[!i].[0]
                    start := !i + 1
                i := !i + 1
            if !start <> !i then
                yield !start, !i - 1, !last
        |]

    let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"
    let lines = input.Split('\n')
    let data = lines |> Array.map (fun l -> l.Split(';'))
    let result = data |> splitArrayBy (fun s -> s.[0] <> "")

    (printfn "%A") result

Output:

 [|(1, 3, "a"); (5, 9, "b"); (11, 11, "c")|]


gradbot Feb 26 '10 at 0:02


A JSON structure would be perfect for you; parsers and converters are already available.

Read about it here: http://msdn.microsoft.com/en-us/library/bb299886.aspx

Edit: for some reason I read this as J#; maybe it still applies to F#.


Sean.c Feb 25 '10 at 23:09


Tomas Petricek · Accepted Answer · 2010-02-26T00:10:22+0000

As far as I know, there is no easy way to do this, but it is certainly a good way to practice functional programming skills. If you used some hierarchical representation of the data (for example, XML or JSON), the situation would be much simpler because you would not have to convert the data structure from linear (for example, list / array) to hierarchical (in this case, a list of lists).

In any case, a good way to approach the problem is to understand that you need to perform a more general data operation - you need to group adjacent elements of the array, starting a new group when you find a row with a value in the first column.

I'll start by adding the line number to each row of the array and then converting it to a list (which is usually easier to work with in F#):

    let data = lines |> Array.mapi (fun i l -> i, l.Split(';')) |> List.ofSeq

Now we can write a reusable function that groups adjacent list items and starts a new group every time the specified predicate f returns true:

    let adjacentGroups f list =
        // Utility function that accumulates the elements of the current
        // group in 'current' and stores all groups in 'all'. The parameter
        // 'list' is the remainder of the list to be processed
        let rec adjacentGroupsUtil current all list =
            match list with
            // Finished processing - return all groups
            | [] -> List.rev (current::all)
            // Start a new group, add current to the list
            | x::xs when f(x) ->
                adjacentGroupsUtil [x] (current::all) xs
            // Add element to the current group
            | x::xs ->
                adjacentGroupsUtil (x::current) all xs

        // Call utility function, drop all empty groups and
        // reverse elements of each group (because they are
        // collected in a reversed order)
        adjacentGroupsUtil [] [] list
        |> List.filter (fun l -> l <> [])
        |> List.map List.rev

Now implementing your particular algorithm is relatively simple. First, we need to group the elements, starting a new group every time the first column has a value:

 let groups = data |> adjacentGroups (fun (ln, cells) -> cells.[0] <> "")

In the second step, we process each group: we take its first element (to get the name of the group) and then find the minimum and maximum line numbers among the remaining elements:

    groups |> List.map (fun ((_, firstCols)::lines) ->
        let lineNums = lines |> List.map fst
        firstCols.[0], List.min lineNums, List.max lineNums )

Please note that pattern matching in a lambda function will give a warning, but we can safely ignore this, because the group will always be non-empty.
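If you would rather not rely on ignoring the warning, the lambda can be written with an explicit match. The following is only a sketch under a few assumptions that are not in the original answer: the `groups` value is a hand-written sample of what the grouping step returns (including a hypothetical section "d" that has no sub-rows), sections without sub-rows are skipped rather than causing List.min to fail on an empty list, and the tuple is ordered (start, end, name) as in the question.

```fsharp
// Hand-written sample of the grouped rows (an assumption for illustration):
// each group starts with its header row, followed by its sub-rows.
let groups : (int * string[]) list list =
    [ [ (0, [|"a"; ""|]); (1, [|""; "a1"|]); (2, [|""; "a2"|]); (3, [|""; "a3"|]) ]
      [ (4, [|"d"; ""|]) ]                        // hypothetical section with no sub-rows
      [ (10, [|"c"; ""|]); (11, [|""; "c1"|]) ] ]

let ranges =
    groups
    |> List.choose (fun group ->
        match group with
        // Header row followed by at least one sub-row
        | (_, header) :: (_ :: _ as rows) ->
            let lineNums = rows |> List.map fst
            Some (List.min lineNums, List.max lineNums, header.[0])
        // Empty group or header without sub-rows: skip it
        | _ -> None)

printfn "%A" ranges
```

The match is total, so the compiler has nothing to warn about, and a header that is immediately followed by another header simply produces no range.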

Summary: This answer shows that if you want to write elegant code, you can implement your own reusable higher-order function (e.g. adjacentGroups), since not everything is available in the core F# libraries. If you use functional lists, you can implement it with recursion (for arrays, you would use imperative programming, as in gradbot's answer). Once you have a good set of reusable functions, most problems are easy :-).
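Since the question mentions fold, here is a rough sketch of how the same ranges might be computed in a single pass with Array.fold instead of explicit recursion. The helper name `step` is made up for this example, and the choice to represent a section that has no sub-rows yet as a range whose end is smaller than its start is an assumption of the sketch, not something from the answers above.

```fsharp
let input = "a;\n;a1\n;a2\n;a3\nb;\n;b1\n;b2\n;b3\n;b4\n;b5\nc;\n;c1"

// Accumulator: list of (start, end, name), most recent section first.
let step acc (i, name) =
    match acc with
    // Header row: open a new, still empty range just after it
    | _ when name <> "" -> (i + 1, i, name) :: acc
    // Sub-row: extend the range of the current section to this row
    | (start, _, current) :: rest -> (start, i, current) :: rest
    // Sub-row before any header: ignore it
    | [] -> acc

let ranges =
    input.Split('\n')
    |> Array.mapi (fun i line -> i, line.Split(';').[0])
    |> Array.fold step []
    |> List.rev
    |> List.toArray

printfn "%A" ranges
```

The accumulator is built in reverse and flipped once at the end, which is the same trick adjacentGroups uses to keep each step cheap.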
