Parsing a list of tokens in an expression tree

Question

Parsing a list of tokens in an expression tree

I want to parse expressions like those found in a typical Haskell source. I get an input stream that is already labeled and annotated with fixity and priority. Many operators are unknown at compile time and may be arbitrary. The result should be a tree representing the expression. Here is a bit of what I tried:

-- A single token of the input stream data Token a = Prefix a | Infix a Int Fixity -- The Int parameter represents the precedence | LBrace | RBrace deriving (Show,Eq) data Fixity = InfixL | InfixR | InfixC deriving (Show,Eq) data Expression a = Atom a | Apply aa deriving Show -- Wrapped into either, if expression is malformed. exprToTree :: [Token a] -> Either String (Expression a) exprToTree = undefined

For simplicity, I do not belong to special lambdas, they are just atoms.

But now I'm completely lost. How can I convert a stream of atoms to a tree? Can someone please indicate me an algorithm or help me find it.

+4

algorithm haskell expression-trees

fuz Mar 27 '11 at 15:17

source share

2 answers

After searching the Internet for another topic, I found this nice piece of code to do exactly what I want. Take a look:

 data Op = Op String Prec Fixity deriving Eq data Fixity = Leftfix | Rightfix | Nonfix deriving Eq data Exp = Var Var | OpApp Exp Op Exp deriving Eq type Prec = Int type Var = String data Tok = TVar Var | TOp Op parse :: [Tok] -> Exp parse (TVar x : rest) = fst (parse1 (Var x) (-1) Nonfix rest) parse1 :: Exp -> Int -> Fixity -> [Tok] -> (Exp, [Tok]) parse1 epf [] = (e, []) parse1 epf inp@ (TOp op@ (Op _ p' f') : TVar x : rest) | p' == p && (f /= f' || f == Nonfix) = error "ambiguous infix expression" | p' < p || p' == p && (f == Leftfix || f' == Nonfix) = (e, inp) | otherwise = let (r,rest') = parse1 (Var x) p' f' rest in parse1 (OpApp e op r) pf rest' -- Printing instance Show Exp where showsPrec _ (Var x) = showString x showsPrec p e@ (OpApp l (Op op _ _) r) = showParen (p > 0) $ showsPrec 9 l . showString op . showsPrec 9 r -- Testing plus = TOp (Op "+" 6 Leftfix) times = TOp (Op "*" 7 Leftfix) divide = TOp (Op "/" 7 Leftfix) gt = TOp (Op ">" 4 Nonfix) ex = TOp (Op "^" 8 Rightfix) lookupop '+' = plus lookupop '*' = times lookupop '/' = divide lookupop '>' = gt lookupop '^' = ex fromstr [x] = [TVar [x]] fromstr (x:y:z) = TVar [x] : lookupop y : fromstr z test1 = fromstr "a+b+c" test2 = fromstr "a+b+c*d" test3 = fromstr "a/b/c" test4 = fromstr "a/b+c" test5 = fromstr "a/b*c" test6 = fromstr "1^2^3+4" test7 = fromstr "a/1^2^3" test8 = fromstr "a*b/c"

(I took it from this page: http://hackage.haskell.org/trac/haskell-prime/attachment/wiki/FixityResolution/resolve.hs )

0

fuz Mar 29 '11 at 18:54

source share

stephen tetley · Accepted Answer · 2011-03-27T15:49:32+0000

In a nutshell, although you have a list of tokens, you still need a parser.

Parsec may handle alternative token streams, but you may have to refer to the manual - the PDF file available on Daan Leijen's "legacy" homepage is http://legacy.cs.uu.nl/daan/download/parsec/parsec. pdf . You can scroll your own parser without using the combinator library, but you will repurpose the Parsec part. As far as I remember, UU_parsing expects to work with a separate scanner, so its another option.

Although it does not handle parsing, you can find Lennart Augsson's “Lambda calculus, cooked in four ways”, useful for other things - http://www.augustsson.net/Darcs/Lambda/top.pdf

Change - here is a partially developed plan of how you can do this with Parsec, for more information you will need to refer to section 2.11 of the manual.

Suppose you have this data type for specific "internal" tokens:

 data InternalTok = Ident String | BinOpPlus | BinOpMinus | UnaryOpNegate | IntLiteral Int deriving (Show)

You will then get these types for the Parsec token and do the parsing:

 type MyToken = Token InternalTok type MyParser a = GenParser MyToken () a

Define the helper function according to the Parsec manual - this is the show and pos handle, so the individual definitions are shorter than cf. mytoken on page 19.

 mytoken :: (MyToken -> Maybe a) -> MyParser a mytoken test = token showToken posToken testToken where showToken tok = show tok posToken tok = no_pos testToken tok = test tok

At the moment, your token does not track the initial position, therefore:

 no_pos :: SourcePos no_pos = newPos "" 0 0 0

For each terminal, you must define a token function:

 identifier :: MyParser MyToken identifier = mytoken (\tok -> case tok of a@ (Prefix (Ident _)) -> Just a _ -> Nothing) intLiteral :: MyParser MyToken intLiteral = mytoken (\tok -> case tok of a@ (Prefix (IntLiteral _)) -> Just a _ -> Nothing) binPlus :: MyParser MyToken binPlus = mytoken (\tok -> case tok of a@ (Infix BinOpPlus _ _) -> Just a _ -> Nothing) binMinus :: MyParser MyToken binMinus = mytoken (\tok -> case tok of a@ (Infix BinOpMinus _ _) -> Just a _ -> Nothing) unaryNegate :: MyParser MyToken unaryNegate = mytoken (\tok -> case tok of a@ (Prefix UnaryNegate _ _) -> Just a _ -> Nothing)

Edit - to process custom infix operators, you will need these paired parsers:

 tokInfixL :: Int -> MyParser MyToken tokInfixL n = mytoken $ \tok -> case tok of a@ (Infix _ i InfixL) | i == n -> Just a _ -> Nothing) tokInfixR :: Int -> MyParser MyToken tokInfixR n = mytoken $ \tok -> case tok of a@ (Infix _ i InfixR) | i == n -> Just a _ -> Nothing) tokInfixC :: Int -> MyParser MyToken tokInfixC n = mytoken $ \tok -> case tok of a@ (Infix _ i InfixC) | i == n -> Just a _ -> Nothing) tokPrefix :: MyParser MyToken tokPrefix = mytoken (\tok -> case tok of a@ (Prefix _) -> Just a _ -> Nothing)

Now you can define a parser - you need to set the number of priority levels in advance, there is no way to get around this fact, since you need to encode a parser for each level.

Parsing top-level syntax just calls the highest priority parser

 pExpression :: Parser Expersion pExpression = expression10

For each level of preliminary analysis, you need a parser like this, you will have to work independently without participation. You may also need to do some work on chainl / chainr - I only wrote a parser in this style with UU_Parsing, this may be slightly different for Parsec. Note. It is used, as a rule, at the top priority level.

 expression10 :: Parser Expression expression10 = Apply <$> identifier <*> pExpression <|> Prefix <$> tokPrefix <*> pExpression <|> chainl (Infix <$> tokInfixL 10) expression9 <|> chainr (Infix <$> tokInfixR 10) expression9 expression9 :: Parser Expression expression9 = Prefix <$> tokPrefix <*> pExpression <|> chainl (Infix <$> tokInfixL 9) expression8 <|> chainr (Infix <$> tokInfixR 9) expression8 ...

You need to expand your syntax to handle IntLiterals and identifiers that are at level 0 in priority:

 expression0 :: Parser Expression expression0 = IntLit <$> intLiteral <|> Ident <$> identifier <|> ...

Change - for unlimited priority - perhaps if you only have the application and Atom, something like this might work. Note that you will have to modify the tokInfixL and tokInfixR partisans so that they no longer correspond to level-dependencies, and you may have to experiment with the order of alternatives.

 expression :: Parser Expression expression = Apply <$> identifier <*> expression <|> Prefix <$> tokPrefix <*> expression <|> chainl (Infix <$> tokInfixL) expression <|> chainr (Infix <$> tokInfixR) expression <|> intLiteral <|> identifier intLiteral :: Parser Expression intLiteral = Atom . convert <$> intLiteral where convert = ?? identifier :: Parser Expression identifier = Atom . convert <$> intLiteral where convert = ??

Parsing a list of tokens in an expression tree

More articles: