Tokenizing a string: taking everything between a given set of characters in C++

I have the following code:

    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main() {
        string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
        regex re("(\'[!-~]+\')");
        sregex_token_iterator i(s.begin(), s.end(), re, 1);
        sregex_token_iterator j;
        unsigned count = 0;
        while (i != j) {
            // advance the iterator, otherwise the loop never terminates
            cout << "the token is " << *i++ << endl;
            count++;
        }
        cout << "There were " << count << " tokens found." << endl;
        return 0;
    }

Using the above regex, I would like to extract the text between the parentheses, including the single quotes. The output should look like this:

    the token is 'm1.labs.teradata.com'
    the token is 'use\')r_*5'
    the token is 'u" er 5'
    the token is 'default'
    There were 4 tokens found.

Basically, the regex should extract everything between "(" and ")". The content can include spaces, special characters, quotes, or even a closing parenthesis. Earlier, I used the following regular expression:

 boost::regex re_arg_values("(\'[!-~]+\')"); 

But it did not work. Please help me with this. Thanks in advance.

2 answers

Here is an example of using Spirit X3 to create a grammar that actually parses this. I would parse into a map of (key -> value) pairs, which makes a lot more sense than blindly assuming the names always appear in the same order:

    using Config = std::map<std::string, std::string>;
    using Entry  = std::pair<std::string, std::string>;

Now we set some grammar rules using X3:

    namespace parser {
        using namespace boost::spirit::x3;

        auto value  = quoted("'") | quoted('"');
        auto key    = lexeme[+alpha];
        auto pair   = key >> '(' >> value >> ')';
        auto config = skip(space) [ *as<Entry>(pair) ];
    }

The helpers as<> and quoted are simple lambdas:

    template <typename T>
    auto as = [](auto p) { return rule<struct _, T> {} = p; };

    auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };

Now we can directly parse the string to the map:

    Config parse_config(std::string const& cfg) {
        Config parsed;
        auto f = cfg.begin(), l = cfg.end();
        if (!parse(f, l, parser::config, parsed))
            throw std::invalid_argument("Parse failed at " + std::string(f,l));
        return parsed;
    }

And a demo program

    int main() {
        Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

        for (auto& setting : cfg)
            std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
    }

Prints

    Key dbname has value default
    Key password has value u" er 5
    Key server has value m1.labs.teradata.com
    Key username has value use')r_*5

Live demo

Live on coliru

    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <boost/spirit/home/x3.hpp>
    #include <boost/fusion/adapted/std_pair.hpp>

    using Config = std::map<std::string, std::string>;
    using Entry  = std::pair<std::string, std::string>;

    namespace parser {
        using namespace boost::spirit::x3;

        // Coerce a parser expression to expose attribute type T
        template <typename T>
        auto as = [](auto p) { return rule<struct _, T> {} = p; };

        // Quoted string with backslash escapes; q is the quote character
        auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };

        auto value  = quoted("'") | quoted('"');
        auto key    = lexeme[+alpha];
        auto pair   = key >> '(' >> value >> ')';
        auto config = skip(space) [ *as<Entry>(pair) ];
    }

    Config parse_config(std::string const& cfg) {
        Config parsed;
        auto f = cfg.begin(), l = cfg.end();
        if (!parse(f, l, parser::config, parsed))
            throw std::invalid_argument("Parse failed at " + std::string(f,l));
        return parsed;
    }

    int main() {
        Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

        for (auto& setting : cfg)
            std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
    }

Bonus

If you want to extract the raw input instead, just try

 auto source = skip(space) [ *raw [ pair ] ]; 

as in:

    using RawSettings = std::vector<std::string>;

    RawSettings parse_raw_config(std::string const& cfg) {
        RawSettings parsed;
        auto f = cfg.begin(), l = cfg.end();
        if (!parse(f, l, parser::source, parsed))
            throw std::invalid_argument("Parse failed at " + std::string(f,l));
        return parsed;
    }

    int main() {
        for (auto& setting : parse_raw_config(text))
            std::cout << "Raw: " << setting << "\n";
    }

Which prints (Live On Coliru):

    Raw: server ('m1.labs.teradata.com')
    Raw: username ('use\')r_*5')
    Raw: password('u" er 5')
    Raw: dbname ('default')

Fixing several syntax and style issues:

  • you need to escape \ in C string literals
  • you had an unescaped " in s, which resulted in a syntax error

    #include <boost/regex.hpp>
    #include <boost/range/iterator_range.hpp>
    #include <iostream>

    int main() {
        std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";
        boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");

        size_t count = 0;
        for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
            std::cout << "Token " << ++count << " is " << tok << "\n";
        }
    }

Prints

    Token 1 is 'm1.labs.teradata.com'
    Token 2 is 'use'
    Token 3 is ') password('
    Token 4 is ' er 5'
    Token 5 is 'default'

Source: https://habr.com/ru/post/1270129/

