Boost Spirit signals successful parsing despite the token being incomplete

I have a very simple path grammar that I am trying to parse using Boost Spirit.Lex.

We have the following grammar:

    token := [a-z]+
    path  := (token ':' path) | (token)

So, we are just talking about lowercase ASCII strings.

I have three examples: "abc", "abc:xyz", "abc:xyz:".

The first two should be considered valid. The third, which has a trailing separator, should not. Unfortunately, my parser recognizes all three as valid. The grammar should not allow an empty token, but apparently Spirit does exactly that. What am I missing in order to reject the third one?

In addition, if you read the code below, you will see another version of the parser in the comments, which requires that all paths end with a semicolon. I can get the appropriate behavior when I enable those lines (that is, "abc:xyz:;" is rejected), but that is not quite what I want.

Does anyone have any ideas?

Thanks.

    #include <boost/config/warning_disable.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/spirit/include/phoenix_operator.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    using namespace boost::spirit;
    using boost::phoenix::val;

    template<typename Lexer>
    struct PathTokens : boost::spirit::lex::lexer<Lexer>
    {
        PathTokens()
        {
            identifier = "[a-z]+";
            separator = ":";

            this->self.add
                (identifier)
                (separator)
                (';')
                ;
        }

        boost::spirit::lex::token_def<std::string> identifier, separator;
    };

    template <typename Iterator>
    struct PathGrammar : boost::spirit::qi::grammar<Iterator>
    {
        template <typename TokenDef>
        PathGrammar(TokenDef const& tok)
            : PathGrammar::base_type(path)
        {
            using boost::spirit::_val;

            path = (token >> tok.separator >> path)[std::cerr << _1 << "\n"] |
                   //(token >> ';')[std::cerr << _1 << "\n"]
                   (token)[std::cerr << _1 << "\n"]
                   ;

            token = (tok.identifier) [_val = _1]
                   ;
        }

        boost::spirit::qi::rule<Iterator> path;
        boost::spirit::qi::rule<Iterator, std::string()> token;
    };

    int main()
    {
        typedef std::string::iterator BaseIteratorType;
        typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::string> > TokenType;
        typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
        typedef PathTokens<LexerType>::iterator_type TokensIterator;

        typedef std::vector<std::string> Tests;

        Tests paths;
        paths.push_back("abc");
        paths.push_back("abc:xyz");
        paths.push_back("abc:xyz:");

        /*
        paths.clear();
        paths.push_back("abc;");
        paths.push_back("abc:xyz;");
        paths.push_back("abc:xyz:;");
        */

        for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
        {
            std::string str = *iter;
            std::cerr << "*****" << str << "*****\n";

            PathTokens<LexerType> tokens;
            PathGrammar<TokensIterator> grammar(tokens);

            BaseIteratorType first = str.begin();
            BaseIteratorType last = str.end();

            bool r = boost::spirit::lex::tokenize_and_parse(first, last, tokens, grammar);

            std::cerr << r << " " << (first == last) << "\n";
        }
    }
3 answers

The problem is the value of first and last after your call to tokenize_and_parse. first == last checks whether your string has been completely tokenized; it tells you nothing about the grammar. If you separate the lexing from the parsing like this, you get the expected result:

    PathTokens<LexerType> tokens;
    PathGrammar<TokensIterator> grammar(tokens);

    BaseIteratorType first = str.begin();
    BaseIteratorType last = str.end();

    LexerType::iterator_type lexfirst = tokens.begin(first, last);
    LexerType::iterator_type lexlast = tokens.end();

    bool r = parse(lexfirst, lexlast, grammar);

    std::cerr << r << " " << (lexfirst == lexlast) << "\n";
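For reference, here is a minimal sketch (not part of the answer) of how the match result and the "fully tokenized" check could be folded into a single test. The helper name parse_path is hypothetical, and the sketch assumes the PathTokens/PathGrammar definitions from the question:

    #include <boost/spirit/include/qi.hpp>
    #include <string>

    // Hypothetical helper: a path is accepted only if the grammar matched
    // AND the lexer's token stream was consumed all the way to the end.
    template <typename Lexer, typename Grammar>
    bool parse_path(std::string str, Lexer& tokens, Grammar const& grammar)
    {
        std::string::iterator first = str.begin();
        std::string::iterator last  = str.end();

        // lex::lexer::begin() takes the base iterator by reference.
        typename Lexer::iterator_type lexfirst = tokens.begin(first, last);
        typename Lexer::iterator_type lexlast  = tokens.end();

        bool matched = boost::spirit::qi::parse(lexfirst, lexlast, grammar);
        return matched && lexfirst == lexlast;  // both conditions must hold
    }

With the question's types this would be called as parse_path(str, tokens, grammar), replacing the tokenize_and_parse call in the loop.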

In addition to what llonesmiz already said, here is a trick using qi::eoi , which I sometimes use:

    path = (
              (token >> tok.separator >> path) [std::cerr << _1 << "\n"]
            | token                            [std::cerr << _1 << "\n"]
           ) >> eoi;

This makes the grammar require eoi (end of input) at the end of a successful match. This leads to the desired result:

http://liveworkspace.org/code/23a7adb11889bbb2825097d7c553f71d

    *****abc*****
    abc
    1 1
    *****abc:xyz*****
    xyz
    abc
    1 1
    *****abc:xyz:*****
    xyz
    abc
    0 1
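An alternative that is sometimes convenient (this variant is not from the answer): leave PathGrammar completely unchanged and require end of input only at the call site, since a qi::grammar is itself a parser and can be composed with further parser expressions. A sketch, assuming the lexfirst/lexlast iterators set up as in the previous answer:

    // Append eoi when invoking the parser instead of inside the grammar,
    // so the same PathGrammar can still be used where a trailing
    // separator is acceptable.
    bool r = boost::spirit::qi::parse(lexfirst, lexlast,
                                      grammar >> boost::spirit::qi::eoi);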

Here is what I finally ended up with. It uses the suggestions from @sehe and @llonesmiz. Note the conversion to std::wstring and the binding of actions to an object instance in the grammar definition, neither of which was present in the original post.

    #include <boost/config/warning_disable.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/spirit/include/phoenix_operator.hpp>
    #include <boost/bind.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    //
    // This example uses boost spirit to parse a simple
    // colon-delimited grammar.
    //
    // The grammar we want to recognize is:
    //    identifier := [a-z]+
    //    separator  := :
    //    path       := (identifier separator path) | identifier
    //
    // From the boost spirit perspective this example shows
    // a few things I found hard to come by when building my
    // first parser.
    //    1. How to flag an incomplete token at the end of input
    //       as an error. (use of boost::spirit::eoi)
    //    2. How to bind an action on an instance of an object
    //       that is taken as input to the parser.
    //    3. Use of std::wstring.
    //    4. Use of the lexer iterator.
    //

    // This using directive will cause issues with boost::bind
    // when referencing placeholders such as _1.
    // using namespace boost::spirit;

    //! A class that tokenizes our input.
    template<typename Lexer>
    struct Tokens : boost::spirit::lex::lexer<Lexer>
    {
        Tokens()
        {
            identifier = L"[a-z]+";
            separator = L":";

            this->self.add
                (identifier)
                (separator)
                ;
        }

        boost::spirit::lex::token_def<std::wstring, wchar_t> identifier, separator;
    };

    //! This class provides a callback that echoes strings to stderr.
    struct Echo
    {
        void echo(boost::fusion::vector<std::wstring> const& t) const
        {
            using namespace boost::fusion;
            std::wcerr << at_c<0>(t) << "\n";
        }
    };

    //! The definition of our grammar, as described above.
    template <typename Iterator>
    struct Grammar : boost::spirit::qi::grammar<Iterator>
    {
        template <typename TokenDef>
        Grammar(TokenDef const& tok, Echo const& e)
            : Grammar::base_type(path)
        {
            using boost::spirit::_val;

            path = (
                      (token >> tok.separator >> path) [boost::bind(&Echo::echo, e, ::_1)]
                    | (token)                          [boost::bind(&Echo::echo, &e, ::_1)]
                   ) >> boost::spirit::eoi; // Look for end of input.

            token = (tok.identifier) [_val = boost::spirit::qi::_1]
                   ;
        }

        boost::spirit::qi::rule<Iterator> path;
        boost::spirit::qi::rule<Iterator, std::wstring()> token;
    };

    int main()
    {
        // A set of typedefs to make things a little clearer. This stuff is
        // well described in the boost spirit documentation/examples.
        typedef std::wstring::iterator BaseIteratorType;
        typedef boost::spirit::lex::lexertl::token<BaseIteratorType, boost::mpl::vector<std::wstring> > TokenType;
        typedef boost::spirit::lex::lexertl::lexer<TokenType> LexerType;
        typedef Tokens<LexerType>::iterator_type TokensIterator;
        typedef LexerType::iterator_type LexerIterator;

        // Define some paths to parse.
        typedef std::vector<std::wstring> Tests;
        Tests paths;
        paths.push_back(L"abc");
        paths.push_back(L"abc:xyz");
        paths.push_back(L"abc:xyz:");
        paths.push_back(L":");

        // Parse 'em.
        for ( Tests::iterator iter = paths.begin(); iter != paths.end(); ++iter )
        {
            std::wstring str = *iter;
            std::wcerr << L"*****" << str << L"*****\n";

            Echo e;
            Tokens<LexerType> tokens;
            Grammar<TokensIterator> grammar(tokens, e);

            BaseIteratorType first = str.begin();
            BaseIteratorType last = str.end();

            // Have the lexer consume our string.
            LexerIterator lexFirst = tokens.begin(first, last);
            LexerIterator lexLast = tokens.end();

            // Have the parser consume the output of the lexer.
            bool r = boost::spirit::qi::parse(lexFirst, lexLast, grammar);

            // Print the status and whether or not all output of the lexer
            // was processed.
            std::wcerr << r << L" " << (lexFirst == lexLast) << L"\n";
        }
    }

Source: https://habr.com/ru/post/1439454/
