Lex/Flex: regex for string literals in C/C++?

I am looking at the ANSI C grammar here.

That page contains many Lex/Flex regular expressions for ANSI C.

I am having trouble understanding the regex for string literals.

They give the regular expression as \"(\\.|[^\\"])*\"

As I understand it, \" is used for a double quote, \\ for the escape character, . for any character except the escape character, and * for zero or more repetitions.

[^\\"] means any character other than \ and ".

So, in my opinion, the regex should be \"(\\.)*\".

Can you give a few strings for which my regular expression fails?

Or, why did they use [^\\"]?

1 answer

The regular expression you proposed, \"(\\.)*\", matches only strings made up of \ characters alternating with arbitrary characters, such as:

 "\z\x\p\r" 

So it will not match a string such as:

 "hello" 
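
This can be checked with a minimal sketch. It assumes Python's re module as a stand-in for Lex/Flex (these particular patterns read the same way in both), and it uses fullmatch to ask whether the whole text is one literal, which is a simplification of how a scanner matches tokens. The names PROPOSED and is_whole_match are made up for the illustration.

```python
import re

# The pattern proposed in the question, copied unchanged.
# Assumption: Python's re is used here only as a stand-in for Lex/Flex.
PROPOSED = re.compile(r'\"(\\.)*\"')


def is_whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


escapes_only = r'"\z\x\p\r"'  # only backslash + character pairs between the quotes
plain = '"hello"'             # ordinary characters between the quotes

print(is_whole_match(PROPOSED, escapes_only))  # True
print(is_whole_match(PROPOSED, plain))         # False
```

The second check fails because, after the opening quote, the pattern accepts only a backslash-plus-character pair or the closing quote, and h is neither.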

The regular expression \".*\" does match "hello", but it also matches """" and "\", neither of which is a valid string literal.

To rule out those invalid matches we can use \"[^\\"]*\", but that no longer matches a string such as "\a\a\a", which is valid.

As we saw, \"(\\.)*\" does match that string, so all we need to do is combine the two alternatives to get \"(\\.|[^\\"])*\".
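
The three candidates can be compared side by side in a short sketch, again assuming Python's re module as a stand-in for Lex/Flex and whole-string matching via fullmatch. The labels in PATTERNS, the helper matching_patterns, and the last sample (a string containing escaped quotes) are additions for illustration and do not come from the grammar page.

```python
import re

# Candidate patterns discussed above; the dictionary keys are illustrative labels.
PATTERNS = {
    "any": re.compile(r'\".*\"'),
    "no_escape": re.compile(r'\"[^\\"]*\"'),
    "combined": re.compile(r'\"(\\.|[^\\"])*\"'),
}

SAMPLES = [
    '"hello"',
    r'"\a\a\a"',
    '""""',
    r'"\"',
    r'"say \"hi\""',  # extra sample: escaped quotes inside the literal
]


def matching_patterns(text):
    """Return the labels of the patterns that match the entire text."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(text)]


for sample in SAMPLES:
    print(sample, matching_patterns(sample))
```

Only the combined pattern accepts "hello", "\a\a\a" and the escaped-quote sample while rejecting """" and "\".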


Source: https://habr.com/ru/post/981797/

