Haskell Analyzer AST Data Type Assignment

Question

Haskell Analyzer AST Data Type Assignment

I searched around interwebs for several days, trying to get an answer to my questions, and I finally admit defeat. I have been given a grammar:

Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Int ::= Dig | Dig Int Var ::= a | b | ... z | A | B | C | ... | Z Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

And I was told to parse, evaluate and print the expressions using this grammar
where the operators * + - have their normal meaning
The task is to write a function parse :: String -> AST

which takes a string as input and returns an abstract syntax tree when the input is in the correct format (which I can use).

I was told that I might need a suitable data type and that a data type might be needed for output from some other classes.

Next, I should consider writing a function tokens::String -> [String]
split the input string into a list of tokens
The analysis should be done using ast::[String] -> (AST,[String])
where the input is a list of tokens, and it outputs the AST, and to parse the subexpressions I should just use the ast function recursively.

I also have to make the printExpr method to print the result, so printE: AST -> String
printE(parse "* 5 5") gives either "5*5" or "(5*5)"
as well as a function for evaluating the expression evali :: AST -> Int

I just wanted to point in the right direction where I can start. I have little knowledge about Haskell and FP in general and trying to solve this problem, I made some string processing functions from Java that made me realize that I was fine. So, a small pointer in the right direction and perhaps an explanation of how AST should look like this: The third day in a row and the code still does not work, I really appreciate any attempt to help me find a solution! Thanks in advance! Edit

Perhaps I was unclear: I wonder how I should go from reading and tokenizing the input string to create an AST.

+4

parsing haskell abstract-syntax-tree

Mark renton Oct 3 '12 at 15:49

source share

3 answers

To start with Haskell itself, I would recommend Learn You a Haskell for Great Good! A very well written and entertaining guide. Real World Haskell is another recommended starting point.

Edit: A more fundamental introduction to parsing - Functional parsers . I wanted to replace failure with a success list from Philip Wadler. Unfortunately, it does not seem to be available on the Internet.

To start with parsing in Haskell, I think you should first read the monadic parsing in Haskell , and then maybe this wikibook example , but also the parsec manual .

Your grammar

 Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Int ::= Dig | Dig Int Var ::= a | b | ... z | A | B | C | ... | Z Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

offers several abstract data types:

 data Dig = Dig_0 | Dig_1 | Dig_2 | Dig_3 | Dig_4 | Dig_5 | Dig_6 | Dig_7 | Dig_8 | Dig_9 data Integ = I_Dig Dig | I_DigInt Dig Integ data Var = Var_a | Var_b | ... Var_z | Var_A | Var_B | Var_C | ... | Var_Z data Expr = Expr_I Integ | Expr_Neg Expr | Expr_Plus Expr Expr | Expr_Times Expr Expr Var | Expr_Var Var | Expr_let Var Expr Expr

This is essentially a recursively defined syntax tree, no need to create another. Sorry for the clumsy things Dig_ and Integ_ - they should start with capital letters.

(Personally, I would like to convert Integ to Int right away, so I would make newtype Integ = Integ Int and maybe make newtype Var = Var Char , but that may not suit you.)

As soon as you finish with the basic ones - dig and var , and neg_ , plus_ , in_ , etc. Go with an adaptive interface to create them, for example, the expr parser for expr will be something like

 expr = Expr_I <$> integ <|> Expr_Neg <$> neg_ *> expr <|> Expr_Plus <$> plus_ *> expr <*> expr <|> Expr_Times <$> times_ *> expr <*> expr <|> Expr_Var <$> var <|> Expr_let <$> let_ *> var <*> equals_ *> expr <*> in_ *> expr

So, almost all the time, your Haskell code is clean and very much like the grammar you gave.

+3

AndrewC Oct 3 '12 at 16:23

source share

Well, it looks like you are trying to build many, many things, and you are not quite sure where exactly all this is happening. I would suggest that the correct definition for AST and then trying to implement evali would be a good start.

The thunderbolt that you indicated is interesting ... It seems you want to enter * 5 5 , but output 5*5 , which is a strange choice. Is it really supposed to be a unary minus, not a binary? Cute, * Expr Expr Var looks, maybe you would like to introduce * Expr Expr | Var * Expr Expr | Var ...

In any case, making some assumptions about what you wanted to say, your AST will look something like this:

 data AST = Leaf Int | Sum AST AST | Minus AST | Var String | Let String AST AST

Now try to do printE . He takes an AST and gives us a string. According to the above definition, an AST should be one of five possible things. You just need to figure out what to print for everyone!

  printE :: AST -> String printE (Leaf x ) = show x printE (Sum xy) = printE x ++ " + " ++ printE y printE (Minus x ) = "-" ++ printE x ...

show turns a Int into a String . ++ concatenates two lines together. I will let you work through the rest of the function. (It's hard to say if you want it to print brackets in order to correctly show the order of the subexpressions ... Since your grammar doesn't mention brackets, I think not.)

Now what about evali ? Well, it will be a similar deal. If AST is Leaf x , then x is Int , and you just return it. If you have, say, Minus x , then x not an integer, it is an AST, so you need to turn it into an integer with evali . The function looks something like this:

 evali :: AST -> Int evali (Leaf x ) = x evali (Sum xy) = (evali x) + (evali y) evali (Minus x ) = 0 - (evali x) ...

It works great so far. But wait! It looks like you should use Let to define new variables and access them later with Var . Well, in that case you need to store these variables somewhere. And that will make the task more difficult.

My recommendation would be to use Data.Map to store a list of variable names and their corresponding values. You will need to add a variable map to the type signature. You can do it as follows:

 evali :: AST -> Int evali ast = evaluate Data.Map.empty ast evaluate :: Map String Int -> AST -> Int evaluate m ast = case ast of ...same as before... Let var ast1 ast2 -> evaluate (Data.Map.insert var (evaluate m ast1)) ast2 Var var -> m ! var

So, evali now just calls evaluate with an empty variable. When evaluate sees Let , it adds a variable to the map. And when he sees Var , he looks at the name on the map.

Regarding parsing a string in AST, this is another answer ...

+1

Mathematicalicalrchid Oct 3 '12 at 16:28

source share

AndrewC · Accepted Answer · 2012-10-04T02:27:34+0000

Parsing tokens into an abstract syntax tree

Ok take the grammar

 Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 Int ::= Dig | Dig Int Var ::= a | b | ... z | A | B | C | ... | Z Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr

This is a nice easy grammar, because from the first sign you can say what kind of suppression it will be. (If there was something more complex, for example + , going between numbers, or - used to subtract as and also negation, you will need a trick with a list of successes explained in Functional Parsers .)

Suppose you have an example input:

 rawinput = "- 6 + 45 let x = - 5 in * xx"

What I understand from the grammar is "(- 6 (+ 45 (let x=-5 in (* xx))))" , and I’ll assume that you have designated it as

 tokenised_input' = ["-","6","+","4","5","let","x","=","-","5","in","*","x","x"]

which matches the grammar, but you may have received

 tokenised_input = ["-","6","+","45","let","x","=","-","5","in","*","x","x"]

which best suits your AST sample. I find it good practice to name your AST after a bit of your grammar, so I'm going to go and replace

 data AST = Leaf Int | Sum AST AST | Min AST | ...

with

 data Expr = E_Int Int | E_Neg Expr | E_Sum Expr Expr | E_Prod Expr Expr | E_Var Char | E_Let {letvar::Char,letequal:: Expr,letin::Expr} deriving Show

I called the E_Let bits to make it clearer what they represent.

Writing a parsing function

You can use isDigit by adding import Data.Char (isDigit) to help:

 expr :: [String] -> (Expr,[String]) expr [] = error "unexpected end of input" expr (s:ss) | all isDigit s = (E_Int (read s),ss) | s == "-" = let (e,ss') = expr ss in (E_Neg e,ss') | s == "+" = (E_Sum e e',ss'') where (e,ss') = expr ss (e',ss'') = expr ss' -- more cases

Hop! There are too many obstacles hiding the meaning, and we will write the same code for E_Prod and much worse for E_Let . Let it figure it out!

The standard way to deal with this is to write some combinators; instead of the tedious thread of input [String] through our definition, define ways to mess with the output of the parsers (map) and combine several parsers into one (elevator).

Clear code 1: map

First we need to define pmap , our own equivalent of the map function, so that we can make pmap E_Neg (expr1 ss) instead of let (e,ss') = expr1 ss in (E_Neg e,ss')

 pmap :: (a -> b) -> ([String] -> (a,[String])) -> ([String] -> (b,[String]))

nonono, I can’t even read it! We need a synonym like:

 type Parser a = [String] -> (a,[String]) pmap :: (a -> b) -> Parser a -> Parser b pmap fp = \ss -> let (a,ss') = p ss in (fa,ss')

But actually it would be better if I did

 data Parser a = Par [String] -> (a,[String])

so i can do

 instance Functor Parser where fmap f (Par p) = Par (pmap fp)

I will leave it for you to understand if you want.

Clear code 2: combining two parsers

We also need to deal with a situation where we have two parsers to run, and we want to combine their results using a function. This is called function cancellation for parsers.

 liftP2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c liftP2 f p1 p2 = \ss0 -> let (a,ss1) = p1 ss0 (b,ss2) = p2 ss1 in (fab,ss2)

or maybe even three parsers:

 liftP3 :: (a -> b -> c -> d) -> Parser a -> Parser b -> Parser c -> Parser d

I will let you think about how to do this. In the let statement, you'll need liftP5 to parse sections of the let statement, raising a function that ignores "=" and "in" . You can do

 equals_ :: Parser () equals_ [] = error "equals_: expected = but got end of input" equals_ ("=":ss) = ((),ss) equals_ (s:ss) = error $ "equals_: expected = but got "++s

and a couple more to help with this.

Actually, pmap can also be called liftP1 , but the map is the traditional name for this kind of thing.

Rewritten with good combinators

Now we are ready to clear expr :

 expr :: [String] -> (Expr,[String]) expr [] = error "unexpected end of input" expr (s:ss) | all isDigit s = (E_Int (read s),ss) | s == "-" = pmap E_Neg expr ss | s == "+" = liftP2 E_Sum expr expr ss -- more cases

Everything will be fine. Indeed, everything is in order. But liftP5 will be a little longer and messy.

Taking cleaning further is an ultra-pleasant applicative way

Applicative functors are the way to go. Remember that I suggested refactoring like

 data Parser a = Par [String] -> (a,[String])

so you can make it an instance of Functor ? Maybe you don’t want to, because all you got is the new fmap name for the perfectly working pmap and you have to deal with all of these Par constructors cluttering your code. Perhaps this will make you rethink, though; we can import Control.Applicative , then using the data declaration we can define <*> , which is then and uses <$> instead of pmap , with *> value <*>-but-forget-the-result-of-the-left-hand-side so you write

 expr (s:ss) | s == "let" = E_Let <$> var *> equals_ <*> expr <*> in_ *> expr

Which is very similar to a grammar definition, so it's easy to write code that works for the first time. This is how I like to write parsers. Actually, this is how I like to write terrible things. You would only need to define fmap , <*> and pure , all simple and long repeating liftP3 , liftP4 , etc.

Read about applicative functors. They are wonderful.

Tips for creating an applicator Parser: pure does not change the list. <*> is similar to liftP2 , but the function does not come from outside, it is derived from p1 .

Haskell Analyzer AST Data Type Assignment

Parsing tokens into an abstract syntax tree

Writing a parsing function

Clear code 1: map

Clear code 2: combining two parsers

Rewritten with good combinators

Taking cleaning further is an ultra-pleasant applicative way

More articles: