Improve grammar Q Numbers for synthesizing associative binary operators of AST nodes?

I am trying to parse expressions in the form "1.0 + 2.0 + 3.0 + ..." in AST. I have the following AST node for binary operations (full, minimal code sample at the end):

struct binop_t { expression_t lhs, rhs; }; 

I want to use the macro "BOOST_FUSION_ADAPT_STRUCT" so that this structure is synthesized using boost: spirit :: qi :: rule:

 BOOST_FUSION_ADAPT_STRUCT ( client::ast::binop_t, (client::ast::expression_t, lhs) (client::ast::expression_t, rhs) ) 

In other words, the binary operation AST node (binop_t) requires two operands - left (lhs) and right (rhs) expressions to work on. I can parse expressions like "1.0+ (2.0+ (3.0 + 4.0))" into this AST node using the following qi :: grammar:

 qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal; qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop; qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr; qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr; expr = binop.alias(); binop = primary_expr > qi::lit('+') > primary_expr; primary_expr = (qi::lit('(') > expr > qi::lit(')')) | literal ; literal = qi::double_; 

However, I am struggling to figure out how to change this grammar so that it can parse such expressions without using parentheses (for example, "1 + 2 + 3 + 4 + ...").

I looked at the Boost Spirit example "calc4.cpp" and noticed that for binary operations (like add) only the AST node is used:

 struct operation { optoken operator_; operand operand_; }; 

The difference between this example and what I'm trying to do is that the example defines a grammar for synthesizing binary node operations purely in terms of a list of unary operations. A list of unary operations is synthesized in an AST node called a "program":

 struct program { operand first; std::list<operation> rest; }; 

All this is synthesized using the following grammar in the example:

  qi::rule<Iterator, ast::program(), ascii::space_type> expression; qi::rule<Iterator, ast::program(), ascii::space_type> term; qi::rule<Iterator, ast::operand(), ascii::space_type> factor; expression = term >> *( (char_('+') >> term) | (char_('-') >> term) ) ; term = factor >> *( (char_('*') >> factor) | (char_('/') >> factor) ) ; factor = uint_ | '(' >> expression >> ')' | (char_('-') >> factor) | (char_('+') >> factor) ; 

In this grammar, the rule “expression” creates a “program”, which is a list of operations. ”From the grammar rule for“ expression ”we can see that it uses the Klein star in the grammar:

 *((char_('+') >> term) 

Thus, the grammar can analyze chains of associative binary operations, such as "1 + 2 + 3 + 4 + ...". The attribute of this grammar is a list that matches the definition of the "program" AST node. The calculator function “eval” then simply iterates over the list of operations in the “program”, applying operations to operands from left to right:

  int operator()(program const& x) const { int state = boost::apply_visitor(*this, x.first); BOOST_FOREACH(operation const& oper, x.rest) { state = (*this)(oper, state); } return state; } 

I also looked at the Boost Spirit “mini-c” example, and it has a very similar AST design, where there is no binary AST node operator (only one node “operator” that accepts one operand).

Below is a complete, minimal code example for a program that I have implemented so far. Let me remind you that my question is: how can I change this program so that it can synthesize the binop_t AST node tree from an expression like "1 + 2 + 3 + 4 + ..." without using parentheses in the input text:

 #include <boost/variant.hpp> #include <boost/fusion/include/adapt_struct.hpp> #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/phoenix_core.hpp> #include <boost/spirit/include/phoenix_operator.hpp> #include <iostream> #include <string> #include <exception> using boost::variant; using boost::recursive_wrapper; namespace qi = boost::spirit::qi; namespace ascii = boost::spirit::ascii; namespace phoenix = boost::phoenix; namespace client { namespace ast { struct literal_t; struct binop_t; typedef variant< recursive_wrapper<literal_t>, recursive_wrapper<binop_t> > expression_t; struct literal_t { double value; }; struct binop_t { expression_t lhs, rhs; }; }} // ns BOOST_FUSION_ADAPT_STRUCT ( client::ast::literal_t, (double, value) ) BOOST_FUSION_ADAPT_STRUCT ( client::ast::binop_t, (client::ast::expression_t, lhs) (client::ast::expression_t, rhs) ) namespace client { template <typename Iterator> struct grammar_t : qi::grammar<Iterator, ast::expression_t(), ascii::space_type> { qi::rule<Iterator, ast::literal_t(), ascii::space_type> literal; qi::rule<Iterator, ast::binop_t(), ascii::space_type> binop; qi::rule<Iterator, ast::expression_t(), ascii::space_type> primary_expr; qi::rule<Iterator, ast::expression_t(), ascii::space_type> expr; grammar_t() : grammar_t::base_type(expr) { expr = binop.alias(); binop = primary_expr > qi::lit('+') > primary_expr; primary_expr = (qi::lit('(') > expr > qi::lit(')')) | literal; literal = qi::double_; expr.name("expr"); binop.name("binop"); literal.name("literal"); qi::debug(expr); qi::debug(binop); qi::debug(literal); } }; } // ns int main() { try { string input = "0.1 + 1.2 "; std::string::const_iterator begin = input.begin(); std::string::const_iterator end = input.end(); typedef std::string::const_iterator iterator_type; client::grammar_t<iterator_type> g; client::ast::expression_t ast; bool status; status = qi::phrase_parse(begin, end, g, ascii::space, ast); EXPECT_TRUE(status); EXPECT_TRUE(begin == end); } catch (std::exception& e) { cout << e.what() << endl; } } 
+4
source share
1 answer

VeXocide on IRC channel ## IRC on freenode solved the problem ( http://codepad.org/wufmFufE ). The answer is to change the grammar as follows:

  expr = binop.alias(); binop = primary_expr >> qi::lit('+') >> (binop | primary_expr); primary_expr = (qi::lit('(') >> expr >> qi::lit(')')) | literal; literal = qi::double_; 

This grammar creates the correct recursion, able to synthesize the parse tree that I am looking for.

Advice for those who have the same problem: Without debug instructions, Spirit Boost Spirit will call Seg Fault because if you provide it with a left recursive grammar. If you enable debugging statements, it prints an “infinte” amount of text, which tells you that something is wrong in the parser.

+3
source

Source: https://habr.com/ru/post/1482069/


All Articles