Bug in std::regex?

Here is the code:

```cpp
#include <string>
#include <regex>
#include <iostream>

int main()
{
    std::string pattern("[^c]ei");
    pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
    std::regex r(pattern);
    std::smatch results;
    std::string test_str = "cei";
    if (std::regex_search(test_str, results, r))
        std::cout << results.str() << std::endl;
    return 0;
}
```

Output:

```
cei
```

Compiler used: gcc 4.9.1.

I am a beginner learning regular expressions. I expected nothing to be printed, since "cei" does not match this pattern. Am I doing something wrong? What is the problem?

Update:

This has been reported and confirmed as a bug; for more information, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63497

2 answers

This is a bug in the implementation. Not only do a couple of other tools I tried agree that your pattern doesn't match your input, but I also tried this:

```cpp
#include <string>
#include <regex>
#include <iostream>

int main()
{
    std::string pattern("([a-z]*)([a-z])(e)(i)([a-z]*)");
    std::regex r(pattern);
    std::smatch results;
    std::string test_str = "cei";
    if (std::regex_search(test_str, results, r))
    {
        std::cout << results.str() << std::endl;
        for (size_t i = 0; i < results.size(); ++i)
        {
            std::ssub_match sub_match = results[i];
            std::string sub_match_str = sub_match.str();
            std::cout << i << ": " << sub_match_str << '\n';
        }
    }
}
```

It's basically what you had, but I replaced [[:alpha:]] with [a-z] for simplicity, and I also temporarily replaced [^c] with [a-z], because that makes it work correctly. Here is what it prints (GCC 4.9.0 on Linux x86-64):

```
cei
0: cei
1: 
2: c
3: e
4: i
5: 
```

If, in the place where you had [^c], I put just f instead of [a-z], it correctly reports that the pattern does not match. But if I use [^c], as you did:

```cpp
std::string pattern("([a-z]*)([^c])(e)(i)([a-z]*)");
```

Then I get this output:

```
cei
0: cei
1: cei
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
Aborted (core dumped)
```

So it claims success, and results[0] is "cei", as expected. Then results[1] is also "cei", which I suppose may be fine. But then results[2] fails, because it tries to construct a std::string of length 18446744073709551614 with begin = nullptr. That giant number is exactly 2^64 - 2, a.k.a. std::string::npos - 1 (on my system).

So I think there is an off-by-one error somewhere, and the impact may be much worse than just a false regex match: it can crash at runtime.


The regular expression matches the string "cei", but it should not.

The regular expression can be tested, and is easiest to explain, in Perl:

```perl
my $regex = qr{      # start regular expression
    [[:alpha:]]*     # 0 or any number of alpha chars
    [^c]             # followed by NOT-c character
    ei               # followed by e and i characters
    [[:alpha:]]*     # followed by 0 or any number of alpha chars
}x;                  # end + declare 'x' mode (ignore whitespace)

print "xei" =~ /$regex/ ? "match\n" : "no match\n";
print "cei" =~ /$regex/ ? "match\n" : "no match\n";
```

The regex engine first consumes every character up to the end of the string with [[:alpha:]]*, then backtracks to find a non-c character for [^c], and continues with e and i (backtracking further whenever a later step fails).

Result:

```
"xei" --> match
"cei" --> no match
```

for obvious reasons. Any deviation from this in the various C++ libraries and testing tools is an implementation problem on their side, IMHO.


Source: https://habr.com/ru/post/976476/

