Consecutive delimiters are ignored by boost::tokenizer

I use boost::tokenizer to split a string. It works fine for strings like "1,2,3", but when there are two or more consecutive delimiters, for example "1,,3,4", it returns "1", "3", "4".

Is there a way to make the tokenizer return an empty string "" for the missing field instead of skipping it?

2 answers

The Boost.Tokenizer char_separator class lets you choose whether empty tokens are output or skipped, through its empty_tokens parameter. It defaults to boost::drop_empty_tokens, which matches the behavior of strtok(), but you can tell it to output empty tokens by passing boost::keep_empty_tokens.

For example, with the following program:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

int main()
{
  std::string str = "1,,3,4";
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  boost::char_separator<char> sep(
      ",", // dropped delimiters
      "",  // kept delimiters
      boost::keep_empty_tokens); // empty token policy

  BOOST_FOREACH(std::string token, tokenizer(str, sep))
  {
    std::cout << "<" << token << "> ";
  }
  std::cout << std::endl;
}

Output:

<1> <> <3> <4> 

I suggest using the split function, as below:

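// Needs <boost/algorithm/string.hpp> and <boost/foreach.hpp>; assumes using namespace std and boost.
string text = "1,,3,4";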
list<string> tokenList;
split(tokenList, text, is_any_of(","));
BOOST_FOREACH(string t, tokenList)
{
  cout << t << "." << endl;
}

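If you look carefully at the prototype of split, you will see a defaulted parameter at the end: the token compression mode. It defaults to token_compress_off, which keeps the empty tokens between adjacent delimiters, whereas token_compress_on merges adjacent delimiters into one.

To make the intent explicit, pass token_compress_off as the last argument, and that will be fine: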

split(tokenList, text, is_any_of(","), token_compress_off);

Source: https://habr.com/ru/post/1531165/

