Left factorization in the grammar of a Parsing expression

I am trying to write a grammar for a language that allows the use of the following expressions:

  • Calling form functions f args (note: no parentheses!)
  • Adding (and more complex expressions, but itโ€™s not) form a + b

For instance:

 f 42 => f(42) 42 + b => (42 + b) f 42 + b => f(42 + b) 

The grammar is unambiguous (each expression can be analyzed in exactly one way), but I donโ€™t know how to write this grammar as a PEG, since both statements potentially start with the same token, id . This is my wrong PEG. How can I rewrite it to make it valid?

 expression ::= call / addition call ::= id addition* addition ::= unary ( ('+' unary) / ('-' unary) )* unary ::= primary / '(' ( ('+' unary) / ('-' unary) / expression) ')' primary ::= number / id number ::= [1-9]+ id ::= [az]+ 

Now that this grammar is trying to parse the input " a + b ", it parses " a " as a function call with zero arguments and chokes on " + b ".

Ive downloaded C ++ / Boost.Spirit.Qi grammar implementation in case someone wants to play with it.


(Note that unary removes the ambiguity of operations and additions: to call a function with a negative number, you need to specify parentheses as an argument, for example f (-1) .)

+4
source share
1 answer

As suggested in chat , you can start with something like:

 expression = addition | simple; addition = simple >> ( ('+' > expression) | ('-' > expression) ); simple = '(' > expression > ')' | call | unary | number; call = id >> *expression; unary = qi::char_("-+") > expression; // terminals id = qi::lexeme[+qi::char_("az")]; number = qi::double_; 

Since then, I implemented this in C ++ with an AST presentation so that you can understand how this grammar actually creates an expression tree by typing it pretty.

All source code is on github: https://gist.github.com/2152518

There are two versions (scroll down to "Alternative" to read more


Grammar:

 template <typename Iterator> struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type> { qi::rule<Iterator, std::string(), qi::space_type> id; qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple; qi::rule<Iterator, number_t(), qi::space_type> number; qi::rule<Iterator, call_t(), qi::space_type> call; qi::rule<Iterator, unary_t(), qi::space_type> unary; mini_grammar() : mini_grammar::base_type(expression) { expression = addition | simple; addition = simple [ qi::_val = qi::_1 ] >> +( (qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ] ); simple = '(' > expression > ')' | call | unary | number; call = id >> *expression; unary = qi::char_("-+") > expression; // terminals id = qi::lexeme[+qi::char_("az")]; number = qi::double_; } }; 

Corresponding AST structures are detected quickly and dirty using the very powerful Boost option:

 struct addition_t; struct call_t; struct unary_t; typedef double number_t; typedef boost::variant< number_t, boost::recursive_wrapper<call_t>, boost::recursive_wrapper<unary_t>, boost::recursive_wrapper<addition_t> > expression_t; struct addition_t { expression_t lhs; char binop; expression_t rhs; }; struct call_t { std::string id; std::vector<expression_t> args; }; struct unary_t { char unop; expression_t operand; }; BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs)); BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args)); BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand)); 

In the full code, I also overloaded the <<operator for these structures.


Full demo

 //#define BOOST_SPIRIT_DEBUG #include <iostream> #include <iterator> #include <string> #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/phoenix.hpp> #include <boost/fusion/adapted.hpp> #include <boost/optional.hpp> namespace qi = boost::spirit::qi; namespace phx= boost::phoenix; struct addition_t; struct call_t; struct unary_t; typedef double number_t; typedef boost::variant< number_t, boost::recursive_wrapper<call_t>, boost::recursive_wrapper<unary_t>, boost::recursive_wrapper<addition_t> > expression_t; struct addition_t { expression_t lhs; char binop; expression_t rhs; friend std::ostream& operator<<(std::ostream& os, const addition_t& a) { return os << "(" << a.lhs << ' ' << a.binop << ' ' << a.rhs << ")"; } }; struct call_t { std::string id; std::vector<expression_t> args; friend std::ostream& operator<<(std::ostream& os, const call_t& a) { os << a.id << "("; for (auto& e : a.args) os << e << ", "; return os << ")"; } }; struct unary_t { char unop; expression_t operand; friend std::ostream& operator<<(std::ostream& os, const unary_t& a) { return os << "(" << a.unop << ' ' << a.operand << ")"; } }; BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs)); BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args)); BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand)); void append_term(expression_t& lhs, char op, expression_t operand) { lhs = addition_t { lhs, op, operand }; } template <typename Iterator> struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type> { qi::rule<Iterator, std::string(), qi::space_type> id; qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple; qi::rule<Iterator, number_t(), qi::space_type> number; qi::rule<Iterator, call_t(), qi::space_type> call; qi::rule<Iterator, unary_t(), qi::space_type> unary; mini_grammar() : mini_grammar::base_type(expression) { expression = addition | simple; addition = simple [ qi::_val = qi::_1 ] >> +( (qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ] ); simple = '(' > expression > ')' | call | unary | number; call = id >> *expression; unary = qi::char_("-+") > expression; // terminals id = qi::lexeme[+qi::char_("az")]; number = qi::double_; BOOST_SPIRIT_DEBUG_NODE(expression); BOOST_SPIRIT_DEBUG_NODE(call); BOOST_SPIRIT_DEBUG_NODE(addition); BOOST_SPIRIT_DEBUG_NODE(simple); BOOST_SPIRIT_DEBUG_NODE(unary); BOOST_SPIRIT_DEBUG_NODE(id); BOOST_SPIRIT_DEBUG_NODE(number); } }; std::string read_input(std::istream& stream) { return std::string( std::istreambuf_iterator<char>(stream), std::istreambuf_iterator<char>()); } int main() { std::cin.unsetf(std::ios::skipws); std::string const code = read_input(std::cin); auto begin = code.begin(); auto end = code.end(); try { mini_grammar<decltype(end)> grammar; qi::space_type space; std::vector<expression_t> script; bool ok = qi::phrase_parse(begin, end, *(grammar > ';'), space, script); if (begin!=end) std::cerr << "Unparsed: '" << std::string(begin,end) << "'\n"; std::cout << std::boolalpha << "Success: " << ok << "\n"; if (ok) { for (auto& expr : script) std::cout << "AST: " << expr << '\n'; } } catch (qi::expectation_failure<decltype(end)> const& ex) { std::cout << "Failure; parsing stopped after \"" << std::string(ex.first, ex.last) << "\"\n"; } } 

Alternative:

I have an alternative version that builds addition_t iteratively instead of recursively, so to speak:

 struct term_t { char binop; expression_t rhs; }; struct addition_t { expression_t lhs; std::vector<term_t> terms; }; 

This eliminates the need to use Phoenix to create an expression:

  addition = simple >> +term; term = qi::char_("+-") > simple; 
+3
source

Source: https://habr.com/ru/post/1402099/


All Articles