Decoding UTF-8 character escapes in Boost Spirit


Hello to all,

I'm not sure my title is accurate, but the test code should show what I want to achieve.

I am trying to parse things like:

  • '%40' to '@'
  • '%3C' to '<'

Below is a minimal test file. I do not understand why it does not work. Maybe I have made a mistake, but I don't see it.

Environment: Compiler: gcc 4.6; Boost: current trunk

I use the following compilation line:

g++ -o main -L/usr/src/boost-trunk/stage/lib -I/usr/src/boost-trunk -g -Werror -Wall -std=c++0x -DBOOST_SPIRIT_USE_PHOENIX_V3 main.cpp 


    #include <iostream>
    #include <string>

    #define BOOST_SPIRIT_UNICODE

    #include <boost/cstdint.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/phoenix/phoenix.hpp>

    typedef boost::uint32_t uchar; // Unicode codepoint

    namespace qi = boost::spirit::qi;

    int main(int argc, char **argv)
    {
        // Input
        std::string input = "%3C";

        std::string::const_iterator begin = input.begin();
        std::string::const_iterator end = input.end();

        using qi::xdigit;
        using qi::_1;
        using qi::_2;
        using qi::_val;

        qi::rule<std::string::const_iterator, uchar()> pchar =
            ('%' > xdigit > xdigit) [_val = (_1 << 4) + _2];

        std::string result;
        bool r = qi::parse(begin, end, pchar, result);
        if (r && begin == end)
        {
            std::cout << "Output: " << result << std::endl;
            std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
        }
        else
        {
            std::cerr << "Error" << std::endl;
            return 1;
        }

        return 0;
    }

Regards,

Mattis Mölmann

1 answer

qi::xdigit does not do what you think it does: it returns the raw character (i.e. '0', not 0x00).
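
To make the consequence concrete, here is a small standard-library-only sketch (no Spirit involved) of the arithmetic that an expression like (_1 << 4) + _2 performs when both operands are raw characters, next to the arithmetic that was intended. The helper names combine_raw_chars, hex_value and combine_hex_digits are made up for illustration, and ASCII is assumed.

```cpp
#include <cstdint>

// What happens when the two operands are raw characters: the character
// codes themselves (ASCII assumed) are shifted and added.
inline std::uint32_t combine_raw_chars(char hi, char lo) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(hi)) << 4)
         + static_cast<unsigned char>(lo);
}

// Hypothetical helper: map one hex digit character to its value 0..15.
// Returns 0 for non-hex input to keep the sketch short.
inline std::uint32_t hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint32_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint32_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint32_t>(c - 'A' + 10);
    return 0;
}

// What was intended: combine the numeric values of the two digits.
inline std::uint32_t combine_hex_digits(char hi, char lo) {
    return (hex_value(hi) << 4) + hex_value(lo);
}
```

With ASCII, combine_raw_chars('3', 'C') gives 883 (51 * 16 + 67), while combine_hex_digits('3', 'C') gives 60, the code for '<'.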

You can use qi::uint_parser to your advantage, which makes your grammar much simpler as a bonus:

 typedef qi::uint_parser<uchar, 16, 2, 2> xuchar; 
  • No need to rely on Phoenix (so it also works on older versions of Boost)
  • Parses both characters at once (otherwise you might need to add a lot of casts to prevent integer sign extension)
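
For readers who want to see the "exactly two hex digits after '%'" behaviour spelled out, the following is a rough hand-written sketch using only the standard library. It is not how Spirit implements uint_parser. The names hex_digit and percent_decode are hypothetical, and the sketch treats each escape as a single byte, which is an assumption that only matches the question's examples for ASCII values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
inline int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode every "%XY" escape in `in` into `out`.
// Returns false if a '%' is not followed by two hex digits.
inline bool percent_decode(const std::string& in, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '%') {           // ordinary character: copy through
            out += in[i];
            continue;
        }
        if (i + 2 >= in.size()) {     // fewer than two characters left
            return false;
        }
        const int hi = hex_digit(in[i + 1]);
        const int lo = hex_digit(in[i + 2]);
        if (hi < 0 || lo < 0) {       // not two hex digits
            return false;
        }
        out += static_cast<char>((hi << 4) | lo);
        i += 2;                       // skip the two digits just consumed
    }
    return true;
}
```

Under these assumptions, percent_decode("a%40b", out) leaves "a@b" in out, and an input such as "%4" is rejected.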

Here is a corrected sample:

    #include <iostream>
    #include <string>

    #define BOOST_SPIRIT_UNICODE

    #include <boost/cstdint.hpp>
    #include <boost/spirit/include/qi.hpp>

    typedef boost::uint32_t uchar; // Unicode codepoint

    namespace qi = boost::spirit::qi;

    typedef qi::uint_parser<uchar, 16, 2, 2> xuchar;
    const static xuchar xuchar_ = xuchar();

    int main(int argc, char **argv)
    {
        // Input
        std::string input = "%3C";

        std::string::const_iterator begin = input.begin();
        std::string::const_iterator end = input.end();

        qi::rule<std::string::const_iterator, uchar()> pchar = '%' > xuchar_;

        uchar result;
        bool r = qi::parse(begin, end, pchar, result);
        if (r && begin == end)
        {
            std::cout << "Output: " << result << std::endl;
            std::cout << "Expected: < (LESS-THAN SIGN)" << std::endl;
        }
        else
        {
            std::cerr << "Error" << std::endl;
            return 1;
        }

        return 0;
    }

Output:

    Output: 60
    Expected: < (LESS-THAN SIGN)

'<' is indeed ASCII 60.
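
The sample prints the number because the result is a 32-bit unsigned integer, and streaming such a value formats it in decimal. A minimal sketch of the difference, using only the standard library: the helper names as_number and as_ascii_char are invented for illustration, and the cast is only meaningful for values below 128.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

typedef std::uint32_t uchar; // plays the same role as uchar in the sample

// Streams the value unchanged: an unsigned integer prints as a decimal number.
inline std::string as_number(uchar cp) {
    std::ostringstream os;
    os << cp;
    return os.str();
}

// Streams the value as a single character; assumes cp < 128 (ASCII).
inline std::string as_ascii_char(uchar cp) {
    std::ostringstream os;
    os << static_cast<char>(cp);
    return os.str();
}
```

For the parsed value 60, as_number gives "60" and as_ascii_char gives "<".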


Source: https://habr.com/ru/post/1380582/

