Regular expression alternation differences between compilers

I use the ECMAScript syntax in C++ to validate input and ran into a problem when switching compilers. With alternation, the leftmost alternative that matches must be used, unless the rest of the regular expression rules it out. So for the string "abcde", the expression "ab?|ab?(?:cd|dc)" must match "ab". I found that different compilers disagree about this.

MCVE:

#include <regex>
#include <string>
#include <iostream>

int main()
{
    std::string line = "abcde";
    {
        const std::string RX_ION_TYPE("ab?|ab?(?:cd|dc)");

        const auto regexType = std::regex::ECMAScript;

        std::regex rx_ionType;

        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);

        std::smatch match;

        if (std::regex_search(line, match, rx_ionType))
        {
            for (int i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }

        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)|ab?");

        const auto regexType = std::regex::ECMAScript;

        std::regex rx_ionType;

        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);

        std::smatch match;

        if (std::regex_search(line, match, rx_ionType))
        {
            for (int i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }

        }
        else
        {
            std::cout << "No match.\n";
        }
    }
    {
        const std::string RX_ION_TYPE("ab?(?:cd|dc)?");

        const auto regexType = std::regex::ECMAScript;

        std::regex rx_ionType;

        rx_ionType.assign(
            "(" + RX_ION_TYPE + ")"
            , regexType);

        std::smatch match;

        if (std::regex_search(line, match, rx_ionType))
        {
            for (int i = 0; i < match.size(); i++)
            {
                std::cout << "|" << match.str(i) << "|\n";
            }

        }
        else
        {
            std::cout << "No match.\n";
        }
    }

    return 0;
}

Online: ideone (gcc 5.1), cpp.sh (gcc 4.9.2), rextester

I would expect to get

|ab|
|ab|
|abcd|
|abcd|
|abcd|
|abcd|

This is what I get with Visual Studio 2013, gcc 5.1 (ideone) and clang (rextester), but not with gcc 4.9 (Ubuntu locally and cpp.sh), where I get

|abcd|

for all three expressions.
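To tell which behavior a given standard library shows without reading the printed output, the check can be reduced to a small probe. This is only a sketch: `first_match` and `alternation_is_ordered` are hypothetical helper names, not part of the original code, and the probe tests nothing beyond the one pattern from this question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of `pattern` in `text`,
// or an empty string if there is no match.
std::string first_match(const std::string& pattern, const std::string& text)
{
    const std::regex rx(pattern, std::regex::ECMAScript);
    std::smatch m;
    return std::regex_search(text, m, rx) ? m.str(0) : std::string();
}

// Hypothetical probe: true if the leftmost matching alternative wins,
// i.e. "ab?|ab?(?:cd|dc)" yields "ab" rather than "abcd" on "abcde".
bool alternation_is_ordered()
{
    return first_match("ab?|ab?(?:cd|dc)", "abcde") == "ab";
}
```

Calling `alternation_is_ordered()` once at startup, for example, would let a program log a warning or fall back to reordered patterns when the library prefers the longer alternative.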

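My question(s):

  • Is my assumption correct, i.e. must the leftmost matching alternative be used?
  • gcc 4.9 gets this wrong, while gcc 5 does not. Because I use CUDA, I have to stay on gcc 4.9. Is there a way to get the standard behavior with gcc 4.9?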

Source: https://habr.com/ru/post/1648648/

