Regex for developers

I am trying to find a regex that will let me search for a specific string while automatically skipping comments. Does anyone have a regex like this, or know of one? It doesn't even have to be sophisticated enough to skip #if 0 blocks; I just want it to skip // and /* */ comments. The reverse, that is, searching only inside comment blocks, would also be very useful.

Environment: VS 2003

+4
4 answers

This is a harder problem than it might seem at first glance, since you need to consider comment tokens inside strings, comment tokens that are themselves commented out, and so on.

I wrote a string and comment parser for C#; let me see if I can dig out something that will help. I will update if I find anything.

EDIT: OK, so I found my old project, "codemasker". It turns out that I did it in stages rather than with a single regular expression. Basically, I walk through the source file looking for start tokens; when I find one, I look for the matching end token and mask everything in between. This takes the context of the start token into account: if you find the token for the start of a string, you can safely ignore comment tokens until you find the end of the string, and vice versa. Once the code is masked (I used GUIDs as masks and a hash table to keep track of them), you can safely do your search and replace, and then finally restore the masked code.
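The staged approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the original "codemasker" code: the names `mask`, `unmask` and `replace_in_code` are hypothetical, it masks strings and comments (so the search runs only over the remaining code), and it assumes plain C-style literals without raw strings or trigraphs.

```python
import re
import uuid

# Comments and literals in one alternation. Because the scan always takes the
# leftmost match, a "//" inside a string is swallowed by the string branch,
# and a quote inside a comment is swallowed by the comment branch.
TOKEN = re.compile(
    r'/\*.*?\*/'              # block comment, may span lines
    r'|//[^\n]*'              # line comment
    r'|"(?:\\.|[^"\\])*"'     # string literal with backslash escapes
    r"|'(?:\\.|[^'\\])*'",    # character literal with backslash escapes
    re.DOTALL,
)

def mask(source):
    """Replace each comment and literal with a unique placeholder."""
    saved = {}

    def stash(match):
        key = "__MASK_" + uuid.uuid4().hex + "__"
        saved[key] = match.group(0)
        return key

    return TOKEN.sub(stash, source), saved

def unmask(masked, saved):
    """Put the original comments and literals back."""
    for key, original in saved.items():
        masked = masked.replace(key, original)
    return masked

def replace_in_code(source, pattern, replacement):
    """Search and replace in code only, leaving comments and literals alone."""
    masked, saved = mask(source)
    masked = re.sub(pattern, replacement, masked)
    return unmask(masked, saved)

example = 'int count = 0; // count items\nputs("count");\n/* count */ count++;\n'
print(replace_in_code(example, r'\bcount\b', 'total'))
```

Only the two occurrences of count in actual code are renamed; the ones in the comments and in the string literal come back unchanged when the masks are restored.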

Hope this helps.

+3

Be especially careful with strings. Strings often contain escape sequences, which you must also respect when looking for their end.

So, for example, "This is \"a test\"". You cannot blindly look for the next double quote to terminate the string. Also be careful with "This is \\", which shows that you cannot just say "unless the double quote is preceded by a backslash".
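To make the two cases above concrete, here is a small Python sketch of a scanner that finds the closing quote of a string literal. The function name `find_string_end` is made up for this illustration, and it assumes simple C-style backslash escapes only.

```python
def find_string_end(text, start):
    """Return the index of the quote that closes the literal opening at
    text[start], or -1 if the literal is never closed."""
    if text[start] != '"':
        raise ValueError("start must point at an opening double quote")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2      # skip the backslash together with the character it escapes
        elif ch == '"':
            return i    # first quote that is not consumed by an escape
        else:
            i += 1
    return -1

escaped_quotes = r'"This is \"a test\""'
trailing_backslash = r'"This is \\"'
print(find_string_end(escaped_quotes, 0) == len(escaped_quotes) - 1)
print(find_string_end(trailing_backslash, 0) == len(trailing_backslash) - 1)
```

The key design choice is to consume escapes in pairs while scanning forward, instead of looking backwards from a quote. The same idea written as a regular expression is "(?:\\.|[^"\\])*", where \\. eats a backslash plus whatever follows it.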

In conclusion, write some brutal unit tests!

+2

A regular expression is not the best tool for this job.

From the Perl FAQ:

C comments:

    #!/usr/bin/perl
    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;
    print;

C++ comments (this script rewrites each // comment as a /* */ comment):

    #!/usr/local/bin/perl
    $/ = undef;
    $_ = <>;
    s#//(.*)|/\*[^*]*\*+([^/*][^*]*\*+)*/|"(\\.|[^"\\])*"|'(\\.|[^'\\])*'|[^/"']+# $1 ? "/*$1 */" : $& #ge;
    print;

+2

I would make a copy, strip out the comments first, and then search for the string in the usual way.
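One way to do that without losing track of where a hit is in the real file is to blank the comments out with spaces instead of deleting them, so line numbers in the copy still match the original. The Python sketch below is only an illustration under that assumption; `keep_code`, `keep_comments` and `grep` are hypothetical helper names, and `keep_comments` covers the reverse case from the question (searching only inside comments).

```python
import re

# Comments are captured in a named group; string and character literals are
# matched as well so that a "//" or "/*" inside them is not taken for a comment.
TOKEN = re.compile(
    r'(?P<comment>/\*.*?\*/|//[^\n]*)'
    r'|"(?:\\.|[^"\\])*"'
    r"|'(?:\\.|[^'\\])*'",
    re.DOTALL,
)

def blank(text):
    """Turn every character except newlines into a space."""
    return re.sub(r'[^\n]', ' ', text)

def keep_code(source):
    """Copy of source with comments blanked out; literals are kept."""
    return TOKEN.sub(
        lambda m: blank(m.group(0)) if m.group('comment') else m.group(0),
        source,
    )

def keep_comments(source):
    """Copy of source with everything except comments blanked out."""
    parts, pos = [], 0
    for m in TOKEN.finditer(source):
        if m.group('comment'):
            parts.append(blank(source[pos:m.start()]))
            parts.append(m.group(0))
            pos = m.end()
    parts.append(blank(source[pos:]))
    return ''.join(parts)

def grep(text, pattern):
    """Return the 1-based numbers of the lines that match pattern."""
    rx = re.compile(pattern)
    return [n for n, line in enumerate(text.split('\n'), 1) if rx.search(line)]

example = 'int x = 1; // set x\n/* x is\n   unused */\nchar *s = "// x";\nx++;\n'
print(grep(keep_code(example), r'\bx\b'))
print(grep(keep_comments(example), r'\bx\b'))
```

Because both copies have exactly the same length and line breaks as the input, any line number reported by `grep` can be looked up directly in the original file.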

+1

Source: https://habr.com/ru/post/1276626/
