Perl regular expression removing duplicate consecutive substrings in a string

I tried to search for this particular problem, but all I find is either how to remove duplicate lines or how to remove duplicates that are separated by a delimiter.

My problem is slightly different. I have a string like

"comp name1 comp name2 comp name2 comp name3" 

where I want to remove the duplicated "comp name2" and return only

  "comp name1 comp name2 comp name3" 

These are not consecutive repeating words, but consecutive duplicates of substrings. Is there a way to solve this problem with regular expressions?

5 answers
 s/(.*)\1/$1/g 

Be warned that the running time of this regular expression is quadratic in the length of the string.
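A minimal sketch that wraps the substitution in a helper so it can be reused and checked. The sub name `collapse_repeats` is hypothetical. It also shows a side effect worth knowing about: a doubled letter inside a word is itself a repeated substring, so it gets collapsed too (another answer in this thread addresses that with `\b`).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse every doubled substring XX into a single X.
sub collapse_repeats {
    my ($s) = @_;
    $s =~ s/(.*)\1/$1/g;    # \1 must match exactly what (.*) captured
    return $s;
}

print collapse_repeats("comp name1 comp name2 comp name2 comp name3"), "\n";
# comp name1 comp name2 comp name3

print collapse_repeats("comm1"), "\n";
# com1 (the doubled "m" is a repeated substring as well)
```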


This works for me (MacOS X 10.6.7, Perl 5.13.4):

    use strict;
    use warnings;

    my $input  = "comp name1 comp name2 comp name2 comp name3";
    my $output = "comp name1 comp name2 comp name3";

    my $result = $input;
    $result =~ s/(.*)\1/$1/g;

    print "In: <<$input>>\n";
    print "Want: <<$output>>\n";
    print "Got: <<$result>>\n";

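The key point is the backreference \1 in the pattern: it only matches the exact text that (.*) has just captured, so the pattern finds a substring immediately followed by a copy of itself.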
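To see what the backreference actually matches, a plain match (instead of a substitution) can report the captured unit and where it starts. This sketch swaps `(.*)` for `(.+)`, an assumption made only for illustration, so that the trivial empty match at the start of the string is skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "comp name1 comp name2 comp name2 comp name3";

# (.+) captures a candidate unit; \1 then requires that very same text to
# follow immediately. The leftmost such pair wins.
my ($unit, $offset);
if ($input =~ /(.+)\1/) {
    $unit   = $1;        # the repeated unit, including its leading space
    $offset = $-[0];     # start position of the whole match
    print "Repeated unit: <<$unit>> at offset $offset\n";
}
```

Note the leading space in the captured unit: the repeat that the regex finds is " comp name2", not "comp name2 ".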


To avoid collapsing doubled characters inside words (e.g. comm1 → com1), surround the .* in the regular expression with \b:

 s/(\b.*\b)\1/$1/g 
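A small side-by-side sketch of the two substitutions on a made-up input (the string "comm1 comm1 comm2" is an assumption chosen to contain both a repeated word and a doubled letter). With `\b` on both sides of the group, a repeated unit has to start and end on a word boundary, so the "mm" inside "comm2" is no longer treated as a repeat.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "comm1 comm1 comm2";

# Copy first, then substitute on the copy.
(my $plain   = $text) =~ s/(.*)\1/$1/g;        # no word boundaries
(my $bounded = $text) =~ s/(\b.*\b)\1/$1/g;    # unit must begin and end on \b

print "plain:   $plain\n";      # comm1 com2
print "bounded: $bounded\n";    # comm1 comm2
```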

I don't work with languages that support this, but since you are using Perl...

Go here and see this section:

Useful example: checking for doubled words

When editing text, doubled words such as "the the" easily creep in. Using the regular expression \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply enter \1 as the replacement text and click the "Replace" button.
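The same doubled-word check can be sketched as a Perl substitution. In a Perl replacement the captured word is written `$1` rather than `\1` (with warnings enabled, Perl complains that `\1` is better written as `$1`). The sub name `remove_doubled_words` and the sample sentence are hypothetical. Because this pattern only matches a single repeated word, it leaves the question's two-word repeats untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a whole word, some whitespace, then the same whole
# word again; keep only the first copy.
sub remove_doubled_words {
    my ($s) = @_;
    $s =~ s/\b(\w+)\s+\1\b/$1/g;
    return $s;
}

print remove_doubled_words("Paris in the the spring"), "\n";
# Paris in the spring

# "comp name2 comp name2" repeats a two-word unit, so nothing changes here.
print remove_doubled_words("comp name1 comp name2 comp name2 comp name3"), "\n";
```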


If you need something that runs in linear time, you can split the line into words and iterate through the list:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str   = "comp name1 comp name2 comp name2 comp name3";
    my @elems = split /\s+/, $str;

    # The words come in pairs ("comp", "nameN"). Walk the list two
    # elements at a time and keep a pair only if it differs from the
    # pair kept just before it, so the whole pass is linear.
    my $prevComp;
    my @kept;
    for (my $elemIdx = 0; $elemIdx < @elems; $elemIdx += 2) {
        # Join the pair back together; grep drops the missing second
        # element if the list has an odd length.
        my $comp = join ' ', grep { defined } @elems[$elemIdx, $elemIdx + 1];
        next if defined $prevComp && $comp eq $prevComp;
        push @kept, $comp;
        $prevComp = $comp;
    }
    print join(' ', @kept), "\n";

Dirty, perhaps, but it should run faster.


Source: https://habr.com/ru/post/1346654/

