Find and replace (part of) a string inside comment blocks using a regular expression

I am trying to find a specific string that can occur inside a comment block. The string can be a whole word, but it can also be part of a word. For example, if I look for "codex", it should be replaced with "bindex" even when it is part of a longer word: "codexing" should become "bindexing".

The catch is that the replacement should happen only when the word is inside a comment block.

    /* Lorem ipsum dolor sit amet, codex consectetur adipiscing elit. */

    This word --> codex should not be replaced

    /* Lorem ipsum dolor sit
     * amet, codex consectetur
     * adipiscing elit. */

    /** Lorem ipsum dolor sit
     * amet, codex consectetur
     * adipiscing elit. */

    // Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

    # Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

    ------------------- Below "codex" is part of a word -------------------

    /* Lorem ipsum dolor sit amet, somecodex consectetur adipiscing elit. */

    /* Lorem ipsum dolor sit
     * amet, codexing consectetur
     * adipiscing elit. */

    And here also, this word --> codex should not be replaced

    /** Lorem ipsum dolor sit
     * amet, testcodexing consectetur
     * adipiscing elit. */

    // Lorem ipsum dolor sit amet, __codex consectetur adipiscing elit.

    # Lorem ipsum dolor sit amet, codex__ consectetur adipiscing elit.

What I have so far is the code:

 $text = preg_replace ( '~(\/\/|#|\/\*).*?(codex).*?~', '$1 bindex', $text); 

As you can see in this example, this does not work as intended. It does not replace the word when it is inside a multi-line /* */ comment block, and sometimes it deletes all the text that came before the word "codex".

How can I improve my regular expression so that it meets these requirements?

+4
5 answers

Since you are dealing with multi-line text here, you should use the s (DOTALL) modifier so that . also matches newlines and the pattern can span multiple lines. In addition, the forward slash does not need to be escaped, because ~ is used as the delimiter.

Try this code:

 $text = preg_replace ( '~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $text ); 
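
To see what the s modifier changes, here is a minimal sketch. The two-line comment in $comment is a made-up input for illustration only, and the snippet demonstrates just the modifier, not a complete solution to the question.

```php
<?php
// Hypothetical two-line comment used only for this demonstration.
$comment = "/* Lorem ipsum\n * codex */";

// Without the s modifier, . does not match "\n", so the pattern cannot
// get from the opening /* on line one to "codex" on line two.
$withoutS = preg_match('~/\*.*?codex~', $comment);

// With the s modifier, . also matches "\n", so the match succeeds.
$withS = preg_match('~/\*.*?codex~s', $comment);

var_dump($withoutS, $withS); // int(0) then int(1)
```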
+3

 $text = preg_replace ( '~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $text ); 

Unlike the answer from anubhava, this does not remove the comment text that precedes "codex".

+2

This version handles every comment type and is not tripped up by inputs such as /**/ codex /**/ or /*xxxx codex codex xxxx*/ :

    $pattern = <<<'LOD'
    ~
    # definitions
    (?(DEFINE)
        (?<cl> (?> [^c\n]++ | c(?!odex) )++ )
        (?<c>  (?> [^*c]++ | \*++(?!/) | c(?!odex) )++ )
    )

    # pattern
    (?|
        (?> (?>//|\#) \g<cl>*+ | \G(?<!^) \g<cl>?+ )
        \K codex (\g<cl>*+)
      |
        (?> /\* \n*+ | \G(?<!^) (?!\n) ) \g<c>*+
        \K codex (\n*+)
    )
    ~x
    LOD;
    $replacement ="bindex$3";
    $result = preg_replace($pattern, $replacement, $subject);
+1

Something like this, using capture groups, should work:

 $str = preg_replace( '~(<!--[a-zA-Z0-9 \n]*)(MYWORD)([a-zA-Z0-9 \n]*-->)~s', '$1$3', $input ); 

You just need to create a separate rule for each type of comment and restrict the characters allowed inside the comment with a character class (you may prefer to use a negated character class).
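
One way to apply the "one rule per comment type" idea to the comment styles from the question is to match each whole comment first and then do a plain string replacement inside the matched text. The following is only a sketch under assumptions: the function name replaceInComments is hypothetical, and the pattern knows nothing about string literals, so a /* inside a quoted string would still be treated as the start of a comment.

```php
<?php
// Hypothetical helper: replaces $search with $replace only inside
// /* ... */, // ... and # ... comments. It does not understand string
// literals, so comment markers inside quotes are treated as comments.
function replaceInComments(string $text, string $search, string $replace): string
{
    $pattern = '~
        /\*.*?\*/      # block comment, including the /** ... */ form
      | //[^\n]*       # line comment introduced by two slashes
      | \#[^\n]*       # line comment introduced by a hash sign
    ~sx';

    return preg_replace_callback(
        $pattern,
        // $m[0] is the whole matched comment; replace inside it only.
        function (array $m) use ($search, $replace): string {
            return str_replace($search, $replace, $m[0]);
        },
        $text
    );
}

echo replaceInComments("/* codex */ codex\n", 'codex', 'bindex');
// prints "/* bindex */ codex" followed by a newline
```

Because the callback receives the complete comment, every occurrence inside it is replaced, including partial-word hits such as "codexing".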

0

As has been written hundreds, thousands, or even millions of times before in various comments, regular expressions are NOT meant for analyzing code or for finding errors in it.

Consider the following examples:

    // code to be replaced
    var a = "/*code to be replaced*/";
    /* code to be replaced
    var b = "*/code to be replaced";
    */

It is impossible to parse code with REGEX (and yes, figuring out whether a string is inside a comment block counts as parsing).

Find a parser library or write a small one of your own. If you write your own, account for all the ways the script may be used and, in particular, for how string literals will affect your code.
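
As a rough illustration of what "a small one of your own" could look like, here is a sketch of a single-pass scanner that tracks whether it is inside a string literal or a comment. Everything in it is an assumption made for the example: the function name replaceInCommentsScanning is hypothetical, and it assumes C-style syntax ("..." and '...' strings with backslash escapes, /* ... */ block comments, // and # line comments). It ignores regex literals, heredocs, and other language-specific constructs.

```php
<?php
// Hypothetical sketch of a tiny scanner. It walks the input once and
// replaces $search only in comment text; string literals and ordinary
// code are copied through unchanged.
function replaceInCommentsScanning(string $src, string $search, string $replace): string
{
    $out = '';
    $len = strlen($src);
    $i = 0;
    while ($i < $len) {
        $ch = $src[$i];
        $two = substr($src, $i, 2);
        if ($ch === '"' || $ch === "'") {
            // String literal: copy verbatim up to the closing quote.
            $j = $i + 1;
            while ($j < $len && $src[$j] !== $ch) {
                $j += ($src[$j] === '\\') ? 2 : 1; // skip escaped characters
            }
            $end = min($j + 1, $len);
            $out .= substr($src, $i, $end - $i);
            $i = $end;
        } elseif ($two === '/*' || $two === '//' || $ch === '#') {
            // Comment: find where it ends, then replace inside it.
            if ($two === '/*') {
                $close = strpos($src, '*/', $i + 2);
                $end = ($close === false) ? $len : $close + 2;
            } else {
                $close = strpos($src, "\n", $i);
                $end = ($close === false) ? $len : $close;
            }
            $out .= str_replace($search, $replace, substr($src, $i, $end - $i));
            $i = $end;
        } else {
            $out .= $ch; // ordinary code: copy unchanged
            $i++;
        }
    }
    return $out;
}
```

On an input like the one in this answer, the occurrence inside the quoted "/*...*/" string is left alone, while the occurrences in the // comment and in the real block comment are replaced.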

0

Source: https://habr.com/ru/post/1495393/

