Find and replace (part) a string in comment blocks using regular expression

Question


I am trying to find a specific string that can occur inside a comment block. The string can be a whole word, but it can also be part of a word. For example, if I search for "codex", it should be replaced with "bindex", even when it is part of a longer word: "codexing" should become "bindexing".

The catch is that the replacement should happen only when the word is inside a comment block.

 /* Lorem ipsum dolor sit amet, codex consectetur adipiscing elit. */

 This word --> codex should not be replaced

 /* Lorem ipsum dolor sit
  * amet, codex consectetur
  * adipiscing elit. */

 /** Lorem ipsum dolor sit
  * amet, codex consectetur
  * adipiscing elit. */

 // Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

 # Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

 ------------------- Below "codex" is part of a word -------------------

 /* Lorem ipsum dolor sit amet, somecodex consectetur adipiscing elit. */

 /* Lorem ipsum dolor sit
  * amet, codexing consectetur
  * adipiscing elit. */

 And here also, this word --> codex should not be replaced

 /** Lorem ipsum dolor sit
  * amet, testcodexing consectetur
  * adipiscing elit. */

 // Lorem ipsum dolor sit amet, __codex consectetur adipiscing elit.

 # Lorem ipsum dolor sit amet, codex__ consectetur adipiscing elit.

Here is the code I have so far:

 $text = preg_replace ( '~(\/\/|#|\/\*).*?(codex).*?~', '$1 bindex', $text);

As you can see in this example, this does not work as intended. It does not replace the word when it is inside a multi-line /* */ comment block, and it deletes any text between the comment opener and the word "codex".
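The sketch below reproduces both symptoms with the pattern and replacement from the question. The two input strings are illustrative assumptions, not taken from the test text above.

```php
<?php
// The pattern and replacement from the question, unchanged.
$pattern = '~(\/\/|#|\/\*).*?(codex).*?~';

// Single-line comment: the whole span "// Lorem codex" is replaced by
// "$1 bindex", so the word "Lorem" disappears.
$single = preg_replace($pattern, '$1 bindex', '// Lorem codex ipsum');
echo $single, "\n"; // prints: // bindex ipsum

// Multi-line comment: without the s modifier "." cannot match "\n",
// so "/*" and "codex" never end up in the same match.
$multi = preg_replace($pattern, '$1 bindex', "/* Lorem\n * codex */");
var_dump($multi === "/* Lorem\n * codex */"); // bool(true)
```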

How can I improve my regular expression so that it meets these requirements?

+4

php regex preg-replace

w00 Aug 05 '13 at 19:49


5 answers

 $text = preg_replace ( '~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $text );

Unlike the answer from anubhava, this does not remove the comment text that comes before "codex".
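A quick check of this pattern on a few hand-picked inputs (assumed for illustration) confirms that the text before "codex" is kept. It also shows two cases the pattern does not cover: only the first "codex" after each opener is replaced, and with the s modifier "(.*?)" can run past "*/" into ordinary text.

```php
<?php
// trijin's pattern: group 2 keeps the text between the opener and "codex".
$pattern = '~(//|#|/\*)(.*?)(codex).*?~s';

// Multi-line block comment: "Lorem" and "amet," survive.
$kept = preg_replace($pattern, '$1$2bindex', "/* Lorem\n * amet, codex consectetur */");

// Only the first occurrence after the opener is replaced.
$twice = preg_replace($pattern, '$1$2bindex', '// codex codex');

// "(.*?)" crosses "*/", so a word outside the comment is replaced too.
$leak = preg_replace($pattern, '$1$2bindex', '/* a */ codex');

echo $kept, "\n", $twice, "\n", $leak, "\n";
```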

+2

trijin Aug 05 '13 at 20:22


This version handles every comment type and is not tripped up by lines such as /**/ codex /**/ or /*xxxx codex codex xxxx*/:

 $pattern = <<<'LOD'
 ~
 # definitions
 (?(DEFINE)
     (?<cl> (?> [^c\n]++ | c(?!odex) )++ )
     (?<c> (?> [^*c]++ | \*++(?!/) | c(?!odex) )++ )
 )
 # pattern
 (?|
     (?> (?>//|\#) \g<cl>*+ | \G(?<!^) \g<cl>?+ )
     \K codex (\g<cl>*+)
   |
     (?> /\* \n*+ | \G(?<!^) (?!\n) )
     \g<c>*+ \K codex (\n*+)
 )
 ~x
 LOD;
 $replacement = "bindex$3";
 $result = preg_replace($pattern, $replacement, $subject);
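To see the pattern at work, the sketch below runs it (unchanged except for layout) on a shortened version of the question's test text and on the two tricky inputs mentioned above. The sample strings are assumptions chosen for illustration.

```php
<?php
// Casimir et Hippolyte's pattern; the x modifier allows the comments and layout.
$pattern = <<<'LOD'
~
# definitions
(?(DEFINE)
    (?<cl> (?> [^c\n]++ | c(?!odex) )++ )
    (?<c>  (?> [^*c]++ | \*++(?!/) | c(?!odex) )++ )
)
# pattern
(?|
    (?> (?>//|\#) \g<cl>*+ | \G(?<!^) \g<cl>?+ )
    \K codex (\g<cl>*+)
  |
    (?> /\* \n*+ | \G(?<!^) (?!\n) )
    \g<c>*+ \K codex (\n*+)
)
~x
LOD;
$replacement = 'bindex$3';

// Shortened sample: one block comment, plain text, and two line comments.
$subject = "/* Lorem\n * amet, codex consectetur */\n"
         . "This word --> codex stays\n"
         . "// one codex and codexing\n"
         . "# __codex";
$result = preg_replace($pattern, $replacement, $subject);
echo $result, "\n";

// The two inputs from the answer's introduction.
$tricky1 = preg_replace($pattern, $replacement, '/*xxxx codex codex xxxx*/');
$tricky2 = preg_replace($pattern, $replacement, '/**/ codex /**/');
echo $tricky1, "\n", $tricky2, "\n";
```

In the first branch, group 3 captures the rest of the line up to the next "codex"; in the second, it captures the newlines right after the word. The replacement "bindex$3" writes that text back, so only the word itself changes, and \G lets the next match continue where the previous one stopped.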

+1

Casimir et Hippolyte Aug 05 '13 at 21:40


Something like this, using capturing groups, should work:

 $str = preg_replace( '~(<!--[a-zA-Z0-9 \n]*)(MYWORD)([a-zA-Z0-9 \n]*-->)~s', '$1$3', $input );

You just need to create a separate rule for each type of comment and restrict the characters allowed inside the comment with a character class (you may prefer a negated character class).
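Applied to the comment types from the question instead of HTML comments, the idea could look like the sketch below. The function name replaceCodexInComments, both patterns, and the repeat-until-stable loop are assumptions of this sketch, not part of the answer. Line comments use the negated class [^\n]; block comments use (?:[^*]|\*(?!/)) so the match cannot leave the comment. Like any regex-only approach, it does not know about string literals.

```php
<?php
// Hypothetical helper: one rule per comment type, as the answer suggests.
function replaceCodexInComments(string $text): string
{
    $rules = [
        // Line comments: "//" or "#", then any characters except a newline.
        '~((?://|#)[^\n]*?)codex~',
        // Block comments: "/*", then any characters that do not close the comment.
        '~(/\*(?:[^*]|\*(?!/))*?)codex~',
    ];
    // Each pass replaces only the first "codex" after a comment opener,
    // so repeat until nothing changes ("bindex" cannot form a new "codex").
    do {
        $text = preg_replace($rules, '${1}bindex', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo replaceCodexInComments("/* Lorem\n * amet, codex codexing */\nThis word --> codex stays\n"), "\n";
```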

0

Robadob Aug 05 '13 at 19:57


As has been written countless times before in other comments, regular expressions are NOT the right tool for analyzing code.

Consider the following examples:

 // code to be replaced
 var a = "/*code to be replaced*/";
 /* code to be replaced
 var b = "*/code to be replaced";
 */

It is impossible to parse code with a regex (and yes, figuring out whether a string is inside a comment block is parsing).

Find a parser library or write a small one of your own. If you write one, account for every construct the language allows and, in particular, for how string literals affect what counts as a comment.
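Along the lines of "write a small one of your own", a minimal hand-written scanner could look like this. It is a sketch under stated assumptions: it only recognises //, # and /* */ comments plus single- and double-quoted strings with backslash escapes, and the function name replaceInComments is hypothetical.

```php
<?php
// Hypothetical minimal scanner: copies the input, replacing $search only in comments.
function replaceInComments(string $src, string $search, string $replace): string
{
    $out = '';
    $len = strlen($src);
    $i = 0;
    while ($i < $len) {
        $ch = $src[$i];
        $two = substr($src, $i, 2);
        if ($ch === '"' || $ch === "'") {
            // String literal: copy through the matching unescaped quote.
            $j = $i + 1;
            while ($j < $len && $src[$j] !== $ch) {
                $j += ($src[$j] === '\\') ? 2 : 1;
            }
            $out .= substr($src, $i, $j + 1 - $i);
            $i = $j + 1;
        } elseif ($two === '/*') {
            // Block comment: up to and including the first "*/" (or end of input).
            $end = strpos($src, '*/', $i + 2);
            $end = ($end === false) ? $len : $end + 2;
            $out .= str_replace($search, $replace, substr($src, $i, $end - $i));
            $i = $end;
        } elseif ($two === '//' || $ch === '#') {
            // Line comment: up to (not including) the next newline.
            $end = strpos($src, "\n", $i);
            $end = ($end === false) ? $len : $end;
            $out .= str_replace($search, $replace, substr($src, $i, $end - $i));
            $i = $end;
        } else {
            // Ordinary code: copy one character.
            $out .= $ch;
            $i++;
        }
    }
    return $out;
}

echo replaceInComments("a = \"// codex\"; // codex\n# codexing\n/* x codex */ codex", 'codex', 'bindex'), "\n";
```

Run on the example above with "code" as the search string, only the first line and the start of the block comment change: the "*/" inside what looks like a string on the fourth line really does end the comment, as it would in C-like languages.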

0

Ezlaver Aug 05 '13 at 20:00


anubhava · Accepted Answer · 2013-08-05T20:01:44+0000

Since you are dealing with multi-line text here, you should use the s (DOTALL) modifier so that "." also matches newlines and the match can span multiple lines. In addition, because ~ is used as the delimiter, the slash does not need to be escaped.

Try this code:

 $text = preg_replace ( '~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $text );
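Both points of this answer, the s modifier and the unneeded escaping of "/", can be checked with preg_match on a small sample string chosen for illustration:

```php
<?php
$text = "/* Lorem\n * codex */";

// Without "s", "." stops at the newline, so "/*" and "codex" never meet.
$withoutS = preg_match('~/\*.*?codex~', $text);
// With "s", ".*?" may cross the newline.
$withS = preg_match('~/\*.*?codex~s', $text);

// With "~" as the delimiter, "/" is an ordinary character: both patterns match "//".
$escaped   = preg_match('~\/\/~', '// comment');
$unescaped = preg_match('~//~', '// comment');

var_dump($withoutS, $withS, $escaped, $unescaped); // int(0), int(1), int(1), int(1)
```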

