Convert hierarchical data structure to flat in Haskell

Question

Convert hierarchical data structure to flat in Haskell

I am extracting some data from a text document organized as follows:

- "day 1" - "Person 1" - "Bill 1" - "Person 2" - "Bill 2"

I can read this into a list of tuples that looks like this:

 [(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]

If the first element of each tuple indicates the level of the header, and the second element indicates the information associated with each header.

My question is: how can I get a list of items that look like this:

 [["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]

those. one list for the deepest nested element containing all the information from the headers above it. The closest I got is:

 f [] = [] f (x:xs) = row:f rest where leaves = takeWhile (\i -> fst i > fst x) xs rest = dropWhile (\i -> fst i > fst x) xs row = concat $ map (\i -> (snd x):[snd i]) leaves

What gives me this:

 [[["day 1"],["Intro 1"],["day 1"],["Bill 1"],["day 1"],["Intro 2"],["day 1"],["Bill 2"]]]

I want the solution to work for any number of levels.

Ps I am new to Haskell. I have a point that I could / should use a tree to store data, but I cannot wrap it around. I also could not come up with a better name.

+4

list haskell tree

ajerneck Oct 21 '12 at 16:22

source share

2 answers

Trees

You were right that you should probably use a tree to store data. I will copy how Data.Tree does Data.Tree :

 data Tree a = Node a (Forest a) deriving (Show) type Forest a = [Tree a]

Building a tree

Now we want to take your weakly typed list of tuples and convert it to (slightly) stronger Tree from String s. Each time you need to convert a weakly typed value and check it before converting to a stronger type, you use Parser :

 type YourData = [(Int, [String])] type Parser a = YourData -> Maybe (a, YourData)

A type synonym for YourData is the weak type that you are processing. A variable of type a is the value that you extract from the analysis. Our Parser type returns Maybe , because Parser may fail. To understand why the following input does not match a valid Tree , since it lacks level 1 of the tree:

 [(0, ["val1"]), (2, ["val2"])]

If Parser succeeds, it also returns an unused input so that subsequent parsing steps can use it.

Now, oddly enough, the above type of Parser exactly matches the famous monad transformer stack:

 StateT s Maybe a

This can be seen if you deploy the base implementation of StateT :

 StateT s Maybe a ~ s -> Maybe (a, s)

This means that we can simply determine:

 import Control.Monad.Trans.State.Strict type Parser a = StateT [(Int, [String])] Maybe a

If we do this, we will get a copy of Monad , Applicative and Alternative for our Parser type for free. This makes it easy to define parsers!

First we need to define a primitive parser that consumes a single tree node:

 parseElement :: Int -> Parser String parseElement level = StateT $ \list -> case list of [] -> Nothing (level', strs):rest -> case strs of [str] -> if (level' == level) then Just (str, rest) else Nothing _ -> Nothing

This is the only non-trivial piece of code we need to write that, since it is complete, handles all of the following corner cases:

the list is empty
Your node has several meanings in it
The number in the tuple does not match the expected depth

In the next part, everything becomes very elegant. Then we can define two mutually recursive parsers: one for parsing a Tree and one for parsing a Forest :

 import Control.Applicative parseTree :: Int -> Parser (Tree String) parseTree level = Node <$> parseElement level <*> parseForest (level + 1) parseForest :: Int -> Parser (Forest String) parseForest level = many (parseTree level)

The first parser uses the Applicative style, as StateT provided us with an Applicative instance for free. However, instead, I could use an instance of StateT Monad to give code that is more readable for an imperative programmer:

 parseTree :: Int -> Parser (Tree String) parseTree level = do str <- parseElement level forest <- parseForest (level + 1) return $ Node str forest

But what about the many function? What does it do? Let's look at its type:

 many :: (Alternative f) => fa -> f [a]

What is required is that it returns a value and implements Applicative , and instead calls it again to return a list of values instead. When we defined our Parser type in State terms, we got an Alternative instance for free, so we can use the many function to convert what one Tree (i.e. parseTree ) parseTree into something that a Forest (i.e. parseForest ).

To use our Parser , we simply rename the existing StateT function to make its purpose understandable:

runParser :: Parser a → [(Int, [String])] → Perhaps runParser = evalStateT

Then we just run it!

 >>> runParser (parseForest 0) [(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])] Just [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]

This is just magic! Let's see what happens if we give it an invalid input:

 >>> runParser (parseForest 0) [(0, ["val1"]), (2, ["val2"])] Just [Node "val1" []]

He succeeds on the part of the entrance! We can actually indicate that it should consume the entire input by defining a parser that matches the end of the input:

 eof :: Parser () eof = StateT $ \list -> case list of [] -> Just ((), []) _ -> Nothing

Now try:

 >>> runParser (parseForest 0 >> eof) [(0, ["val1"]), (2, ["val2"])] Nothing

Perfect!

Tree smoothing

To answer your second question, we solve the problem again using mutually recursive functions:

 flattenForest :: Forest a -> [[a]] flattenForest forest = concatMap flattenTree forest flattenTree :: Tree a -> [[a]] flattenTree (Node a forest) = case forest of [] -> [[a]] _ -> map (a:) (flattenForest forest)

Give it a try!

 >>> flattenForest [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]] [["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]

Now, technically, I didn't have to use mutually recursive functions. I could do one recursive function. I just followed the Tree type definition from Data.Tree .

Conclusion

That way, theoretically, I could cut the code even further by skipping the intermediate Tree type and just parsing the smoothing result directly, but I decided that you could use the Tree view for other purposes.

The key selects the following items from it:

Explore Haskell abstractions to simplify your code.
Always record common functions
Learn to use recursion effectively.

If you do this, you will write reliable and elegant code that exactly matches this problem.

application

Here is the final code that includes everything I said:

 import Control.Applicative import Control.Monad.Trans.State.Strict import Data.Tree type YourType = [(Int, [String])] type Parser a = StateT [(Int, [String])] Maybe a runParser :: Parser a -> [(Int, [String])] -> Maybe a runParser = evalStateT parseElement :: Int -> Parser String parseElement level = StateT $ \list -> case list of [] -> Nothing (level', strs):rest -> case strs of [str] -> if (level' == level) then Just (str, rest) else Nothing _ -> Nothing parseTree :: Int -> Parser (Tree String) parseTree level = Node <$> parseElement level <*> parseForest (level + 1) parseForest :: Int -> Parser (Forest String) parseForest level = many (parseTree level) eof :: Parser () eof = StateT $ \list -> case list of [] -> Just ((), []) _ -> Nothing flattenForest :: Forest a -> [[a]] flattenForest forest = concatMap flattenTree forest flattenTree :: Tree a -> [[a]] flattenTree (Node a forest) = case forest of [] -> [[a]] _ -> map (a:) (flattenForest forest)

+5

Gabriel gonzalez Oct 21 '12 at 19:51

source share

Karolis Juodelė · Accepted Answer · 2012-10-21T17:15:11+0000

I think I solved it.

 group :: [(Integer, [String])] -> [[String]] group ((n, str):ls) = let (children, rest) = span (\(m, _) -> m > n) ls subgroups = map (str ++) $ group children in if null children then [str] ++ group rest else subgroups ++ group rest group [] = []

I have not tested this very much.

The idea is to notice a recursive pattern. This function takes the first item (N, S) of the list and then collects all the entries at higher levels until another item at level N goes to the children list. If there are no children, we are at the upper level, and S forms the result. If they are, S is added to all of them.

As for why your algorithm is not working, the problem is mainly in row . Please note that you are not returning recursively.

You can use trees.

 data Tree a = Node a [Tree a] deriving Show listToTree :: [(Integer, [String])] -> [Tree [String]] listToTree ((n, str):ls) = let (children, rest) = span (\(m, _) -> m > n) ls subtrees = listToTree children in Node str subtrees : listToTree rest listToTree [] = [] treeToList :: [Tree [String]] -> [[String]] treeToList (Node s ns:ts) = children ++ treeToList ts where children = if null ns then [s] else map (s++) (treeToList ns) treeToList [] = []

The algorithm is essentially the same. The first half goes to the first function, the second to the second.

Convert hierarchical data structure to flat in Haskell

Trees

Building a tree

Tree smoothing

Conclusion

application

More articles: