What should happen to "a" b "c" ?
Note that in the substring " b " spaces are between quotation marks.
- change -
I assume that the space is “between quotation marks” if preceded by an odd number of standard quotes (ie U + 0022, I will ignore these funny “Unicode quotes”).
This means that you need the following regular expression: ^[^"]*("[^"]*"[^"]*)*"[^"]* [^"]*"[^"]*("[^"]*"[^"]*)*$
("[^"]*"[^"]*) represents a pair of quotation marks. ("[^"]*"[^"]*)* represents an even number of quotes, ("[^"]"[^"]*)*" an odd number. Then there is an actual string with quotes, followed by another one odd number of quotes. ^$ need anchors because you need to count every quote from the beginning of the line. This answers the problem of substring " b " above without ever looking at substrings. The price is that each character of your input must be matched with the whole line, which turns this into an O (N * N) split operation.
The reason you can do this in regex is because a limited amount of memory is required. Only one bit is effective; "Have I seen a strange or even number of quotes so far?" In fact, you do not need to match the individual pairs "" .
However, this is not the only interpretation. If you have included "funny Unicode quotes" that should be paired, you also need to deal with the lines ""double quoted"" . This, in turn, means that you need the number of open ones, " which means you need endless storage, which in turn means that it is no longer an ordinary language, which means that you cannot use the regular expression. What and it was required to prove.
In any case, even if it was possible, you still need the right parser. The behavior of O (N * N) to count the number of quotes preceding each character is simply not funny. If you already know that there are quotation marks X preceding Str [N], this should be an O (1) operation to determine how many quotes precede Str [N + 1], and not O (N). The possible answers are simply X or X + 1!
source share