How to remove C-style comments from code

I just read a new question here on SO asking for basically the same thing as my title. It got me thinking, and searching the web (most of the hits pointed to SO, of course ;). So I thought:

There should be a simple regular expression capable of removing C-style comments from any code.

Yes, there are answers to this question / statement on SO, but the ones I found are all incomplete and / or overly complex.

So I started experimenting and came up with one that works with all the kinds of code I can imagine:

(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*(?:\n|\r|.)*?\*\/)|(("|')(?:\\\\|\\\2|\\\n|[^\2])*?\2) 

The first alternative checks for double-slash // comments. The second handles ordinary /* comment */ blocks. The third is the part I had trouble finding in other regular expressions for the same task: it handles strings containing character sequences that, outside a string, would be considered comments.

What this part does is capture any string in capture group one, matching the quote character in capture group two and honoring escaped quotes, up to the end of the string.

Capture group one should be kept as the replacement; everything else is discarded (replaced by ""), leaving the code without comments :).
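As a rough illustration of that replacement step, here is a minimal Python sketch. The function name `strip_comments` is made up for this example, and the choice of Python's `re` module is an assumption; the regex101 demo may use a different flavor, and flavors can differ in how they read `[^\2]`.

```python
import re

# The pattern from the question, copied verbatim. Group 1 captures string
# literals; comments match outside any capture group.
# Note (Python-specific): inside a character class, \2 is not a
# backreference, so [^\2] effectively matches any character here.
COMMENT_OR_STRING = re.compile(
    r"""(?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*(?:\n|\r|.)*?\*\/)|(("|')(?:\\\\|\\\2|\\\n|[^\2])*?\2)"""
)


def strip_comments(code: str) -> str:
    """Keep capture group one (strings), replace everything else matched with ''."""
    return COMMENT_OR_STRING.sub(lambda m: m.group(1) or "", code)


if __name__ == "__main__":
    sample = 'int a; /* an easy one */\nchar s[] = "not // a comment";\n'
    print(strip_comments(sample))
```

Note that the // alternative also consumes the trailing newline, so a line ending in a // comment is joined with the line that follows it.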

Here is a C example at regex101.

OK... So that's not a question. It's an answer, you're thinking...

Yes, you're right. So... on to the question.

Have I overlooked any kind of code that this regex would get wrong?

It handles

multi-line comments

 /* an easy one */ 

end of line comments

 // Remove this 

comment markers inside strings

 char array[] = "Following isn't a comment // because it in a string /* this neither */"; 

strings with escaped quotes

  char array[] = "Handle /* comments */ - // - in strings with \" escaped quotes"; 

and strings with escaped backslashes

  char array[] = "Handle strings with **not** escaped quotes\\"; // <-EOS 

JavaScript single-quoted strings

 var myStr = 'Should also ignore enclosed // comments /* like these */ '; 

line continuation

 // This is a single line comment \
 continuing on the next row (warns, but works in my C++ flavor)

So, can you come up with any code cases that mess it up? If you come up with something, I will try to complete the RE and hopefully it ends up complete;)

Sincerely.

PS. I know... While writing this, the right pane says, under "How to Ask": "We prefer questions that can be answered, not just discussed." This question may violate that :S, but I can't resist.

In fact, to some people this may turn out to be an answer rather than a question. (Too cocky? ;)

1 answer

I reviewed the comments (so far) and changed the regex to:

 (?:\/\/(?:\\\n|[^\n])*\n)|(?:\/\*[\s\S]*?\*\/)|((?:R"([^(\\\s]{0,16})\([^)]*\)\2")|(?:@"[^"]*?")|(?:"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?")|(?:'(?:\\\\|\\'|\\\n|[^'])*?')) 

It handles Biffen's C++11 raw string literals (as well as C# verbatim strings), and it has been changed according to Wiktor's suggestions.

I also split the handling of single and double quotes into separate alternatives because of the difference in logic (and to avoid the broken backreference ;).

It is undoubtedly more complex, but still far from the solutions I have seen out there, which hardly cover any of the string issues. And it can be stripped of the parts that do not apply to a particular language.
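One way to picture "stripping parts" is to assemble the expression from its alternatives. The sketch below is only an illustration in Python (an assumed flavor, not necessarily the one used on regex101); the constant names, `build_pattern`, and `strip_comments` are hypothetical. The pieces themselves are copied from the updated regex, and joining all four string alternatives reproduces it exactly.

```python
import re

# Pieces of the updated regex, copied verbatim from the expression above.
LINE_COMMENT = r"(?:\/\/(?:\\\n|[^\n])*\n)"
BLOCK_COMMENT = r"(?:\/\*[\s\S]*?\*\/)"
CPP_RAW = r"""(?:R"([^(\\\s]{0,16})\([^)]*\)\2")"""      # C++11 raw strings
CS_VERBATIM = r"""(?:@"[^"]*?")"""                        # C# verbatim strings
DOUBLE_QUOTED = r"""(?:"(?:\?\?'|\\\\|\\"|\\\n|[^"])*?")"""
SINGLE_QUOTED = r"""(?:'(?:\\\\|\\'|\\\n|[^'])*?')"""


def build_pattern(string_parts):
    """Comments stay uncaptured; all string alternatives go into group 1."""
    return re.compile(
        LINE_COMMENT + "|" + BLOCK_COMMENT + "|(" + "|".join(string_parts) + ")"
    )


# Full expression, and a reduced one for plain C (hypothetical selection).
FULL = build_pattern([CPP_RAW, CS_VERBATIM, DOUBLE_QUOTED, SINGLE_QUOTED])
C_ONLY = build_pattern([DOUBLE_QUOTED, SINGLE_QUOTED])


def strip_comments(pattern, code):
    """Keep group 1 (strings) and drop every other match (comments)."""
    return pattern.sub(lambda m: m.group(1) or "", code)


if __name__ == "__main__":
    print(strip_comments(FULL, 'auto s = R"x(// keep)x"; /* drop */\n'))
```

Keeping `CPP_RAW` as the only alternative with an inner capture group means the `\2` backreference still points at the raw-string delimiter regardless of which other parts are included.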

One comment offers support for more languages. This would make RE (even more) complex and unmanageable. It should be relatively easy to adapt, though.

Updated regex101 example.

Thank you all for your contribution so far. And keep offering suggestions.

Regards

Edit: Updated the raw string handling - this time I actually read the spec. ;)


Source: https://habr.com/ru/post/1263497/

