How to make a conditional greedy match in Perl?

I want Perl to parse some source code text and pick out certain constructs. For example:

    use strict;
    use warnings;

    $/ = undef;
    while (<DATA>) {
        s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;
        print;
    }

    __DATA__
    always @(posedge clk or negedge rst_n)
    if(!rst_n)begin
        d1 <= 0; //perl_comment_4
        //perl_comment_5
        d2 <= 1 //perl_comment_6
            + 2;
    end
    else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6 <= d7 + (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10 <= d11 <= d12 + d13 <= d14 ? 1 : 0;
    end

A match must satisfy all of the following:

(1) It starts with the combination word\s*<=. Here \s* can be zero or more spaces, newlines or tabs.

(2) The combination must lie outside any pair of parentheses ( ).

(3) If several combinations appear in a row, only the first one is taken. (Something like a "greedy" match at the left boundary.)

(4) It ends at the first ; after the combination mentioned in (1).

The code comments may contain word\s*<= and ; (anything can appear in a comment), which complicates things. To make life easier, I have already pre-processed the text: I located the comments and replaced them with placeholders such as //perl_comment_6. (This solution seems rather cumbersome and clumsy. Is there a smarter, more elegant one?)
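
A rough sketch of what such a placeholder step might look like. The helper names mask_comments and unmask_comments are made up for illustration, and the sketch assumes the text contains only // line comments (no /* ... */ blocks, and no // inside string literals):

```perl
use strict;
use warnings;

# Replace every // line comment with a numbered placeholder and
# return the masked text together with the saved comment bodies.
sub mask_comments {
    my ($text) = @_;
    my @saved;
    $text =~ s{//([^\n]*)}{
        push @saved, $1;                 # remember the original comment body
        '//perl_comment_' . $#saved;     # placeholder numbered by array index
    }ge;
    return ($text, \@saved);
}

# Put the original comment bodies back in place of the placeholders.
sub unmask_comments {
    my ($text, $saved) = @_;
    $text =~ s{//perl_comment_(\d+)}{'//' . $saved->[$1]}ge;
    return $text;
}

my $src = "d1 <= 0; // was: d0 <= 1;\nd2 <= 1;\n";
my ($masked, $saved) = mask_comments($src);
print $masked;                            # comment body hidden behind a placeholder
print unmask_comments($masked, $saved);   # original text restored
```

Any substitution applied between the two calls then never sees the <= or ; that was inside a comment.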

What I want to do:

For every matched word\s*<=, replace word with word_yes. For the sample code, d1, d2, d6 and d10 should become d1_yes, d2_yes, d6_yes and d10_yes, and all other parts of the text should remain unchanged.

My current code uses s/(\w+)(\s*<=.*?;)/$1_yes$2/gs;, which correctly recognizes d1, d2 and d10, but misses d6 and wrongly recognizes d3.
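
A minimal sketch of why this happens, using a one-line excerpt of the sample. The pattern knows nothing about parentheses, so it starts at d3 inside the if( ... ) condition, and the lazy .*? then runs on to the first ; it can find, swallowing d6 <= on the way. The sub name naive_rename is made up for illustration, and the replacement is written as ${1}_yes only to make the variable boundary explicit:

```perl
use strict;
use warnings;

# Apply the substitution from the question and return the result.
sub naive_rename {
    my ($text) = @_;
    $text =~ s/(\w+)(\s*<=.*?;)/${1}_yes$2/gs;
    return $text;
}

my $line = "else if( d3 <= d4 && ( d5 <= 3 ) ) begin d6 <= d7 + (d8 <= d9 ? 1 : 0);";
print naive_rename($line), "\n";
# prints: else if( d3_yes <= d4 && ( d5 <= 3 ) ) begin d6 <= d7 + (d8 <= d9 ? 1 : 0);
```

The single match covers everything from d3 to the final ;, so d6 is never a candidate for a match of its own.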

Any suggestions? Thanks in advance ~

1 answer

This is much more complicated than you might imagine, and it cannot be done properly without writing a parser for the language you are processing. However, you may be in luck if your samples are always a limited, consistent subset of that language.

The best way I can see to do this is to use split to separate all the parenthesized substrings from the "top-level" sections where the replacements should be made. The changes can then be made to the relevant parts, and the split sections joined back together.

Even this depends on the code having properly balanced parentheses; a stray opening or closing parenthesis, for example in a string literal or a comment, will throw the process off. The regular expression passed to split must be recursive so that nested parentheses can be matched, and it must be wrapped in a capture group so that split returns all parts of the string instead of only the sections between matches.
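
To see the split behaviour in isolation, here is a small sketch on a made-up input string. The sub name split_on_parens is only for illustration; the (?1) recursion requires Perl 5.10 or later:

```perl
use strict;
use warnings;

# Split text into alternating top-level sections and balanced (...) groups.
# Because the whole pattern is a capture group, split also returns the
# parenthesized groups themselves, not just the text between them.
sub split_on_parens {
    my ($text) = @_;
    return split / ( \( (?> [^()] | (?1) )* \) ) /x, $text;
}

my @parts = split_on_parens('a <= b + (c <= (d)) ; e');
print "[$_]\n" for @parts;
# prints:
# [a <= b + ]
# [(c <= (d))]
# [ ; e]
```

Any element that contains a parenthesis is a nested group and can be skipped; the remaining elements are top-level text that is safe to edit.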

The following code does what you ask, but beware that, as I described, it is very fragile.

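    use strict;
    use warnings;

    # Slurp the whole input as a single string
    my $data = do {
        local $/;
        <DATA>;
    };

    # Split into top-level text and balanced (...) groups; the capture
    # group makes split return the parenthesized groups as well
    my @split = split / ( \( (?> [^()] | (?1) )* \) ) /x, $data;

    for ( @split ) {
        next if /[()]/;    # skip anything inside parentheses

        # Append _yes to a word at the start of a line if it is followed by <=
        s/ ^ \s* \w+ \K (?= \s* <= ) /_yes/xgm;
    }

    print join '', @split;

    __DATA__
    always @(posedge clk or negedge rst_n)
    if(!rst_n)begin
        d1 <= 0; //perl_comment_4
        //perl_comment_5
        d2 <= 1 //perl_comment_6
            + 2;
    end
    else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6 <= d7 + (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10 <= d11 <= d12 + d13 <= d14 ? 1 : 0;
    end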

Output

    always @(posedge clk or negedge rst_n)
    if(!rst_n)begin
        d1_yes <= 0; //perl_comment_4
        //perl_comment_5
        d2_yes <= 1 //perl_comment_6
            + 2;
    end
    else if( d3 <= d4 && ( d5 <= 3 ) ) begin
        d6_yes <= d7 + (d8 <= d9 ? 1 : 0);
        //perl_comment_7
        d10_yes <= d11 <= d12 + d13 <= d14 ? 1 : 0;
    end

Source: https://habr.com/ru/post/1243569/

