Sed does not replace all occurrences in a file when matches overlap

I need to replace a few words with other words.

For example, "apple" with "FRUIT" in a file, but only in these 4 situations:

  • _apple_, has a space before and after.
  • [apple_, has an opening square bracket before and a space after.
  • _apple], has a space before and a closing square bracket after.
  • [apple], has square brackets before and after.

I do not want replacements to occur in any other situation.

I tried using the following code:

 a="apple"
 b="fruit"
 sed -i "s/ $a / $b /g" ./file
 sed -i "s/\[$a /\[$b /g" ./file
 sed -i "s/ $a\]/ $b\]/g" ./file
 sed -i "s/\[$a\]/\[$b\]/g" ./file

I thought that the "g" option at the end would mean that it would replace all instances, but I found that this is not a complete solution. E.g. if the file contains the following:

 apple spider apple apple spider tree apple tree 

The third occurrence of "apple" is not replaced. Likewise, in the following line several occurrences of the word are left unchanged:

 apple spider apple apple apple apple apple spider tree apple tree 

I suspect this is because adjacent matches share a "space".
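A minimal reproduction of the shared-space effect, as a sketch (the sample line and the variable name `result` are only for illustration): each match of " apple " consumes its trailing space, so the next "apple" has no leading space left to match.

```shell
# The first match takes " apple " including the space that follows it,
# so the second "apple" is no longer preceded by a space.
result=$(printf '%s\n' 'x apple apple y' | sed 's/ apple / FRUIT /g')
printf '%s\n' "$result"   # prints: x FRUIT apple y
```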

How can I get this to find and replace all instances of $a with $b, regardless of any overlap?

+4
5 answers

A quick and dirty solution is to run the replacement twice.

 $ echo apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g; s/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
 apple FRUIT FRUIT apple[FRUIT FRUIT]

This is safe because the first command cannot introduce any occurrence of (\[| )apple( |\]) that was not already in the source text.

The disadvantage is that two replacements take about twice as long to run.

If you split it into two sed invocations, the steps are easier to follow:

 $ echo apple apple apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
 apple FRUIT apple FRUIT apple apple[FRUIT apple]
 $ echo apple FRUIT apple FRUIT apple apple[FRUIT apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
 apple FRUIT FRUIT FRUIT FRUIT apple[FRUIT FRUIT]
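A related variant, not from the answer itself, is to let sed repeat a single substitution until nothing changes, using a label and the `t` command. This is only a sketch: the label name `again` is arbitrary, and bracket expressions stand in for `\|` so that it does not rely on GNU extensions.

```shell
# ':again' defines a label; 't again' jumps back to it whenever the
# preceding s command changed the line, so sed keeps substituting
# (one match per pass) until no match is left.
result=$(printf '%s\n' 'apple apple apple apple[apple apple]' |
    sed -e ':again' -e 's/\([[ ]\)apple\([] ]\)/\1FRUIT\2/' -e 't again')
printf '%s\n' "$result"   # prints: apple FRUIT FRUIT apple[FRUIT FRUIT]
```

The loop only terminates if the replacement text cannot itself match the pattern; replacing "apple" with something that still contains " apple " would loop forever.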
+3

You can do this using backreferences. This should be fully POSIX compatible.

 sed -i 's/^badger\([] ]\)/SNAKE\1/g
 s/\([[ ]\)badger$/\1SNAKE/g
 s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g
 s/ badger]/ SNAKE]/g' ./infile

Example

 $ sed 's/^badger\([] ]\)/SNAKE\1/g;s/\([[ ]\)badger$/\1SNAKE/g;s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;s/ badger]/ SNAKE]/g' <<<"badger [badger badger] [badger] badger foobadger badgering mushroom badger"
 SNAKE [SNAKE SNAKE] [SNAKE] SNAKE foobadger badgering mushroom SNAKE
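To illustrate what the backreferences do, here is a small sketch with a made-up sample line covering the four delimiter combinations (the names `sample` and `result` are hypothetical). `\1` and `\2` put back whichever delimiter was captured on each side, so one command covers all four cases as long as the matches do not overlap.

```shell
# [[ ] matches "[" or a space; [] ] matches "]" or a space.
# \1 and \2 re-insert the delimiter that was actually matched.
sample='x badger y [badger y x badger] [badger]'
result=$(printf '%s\n' "$sample" | sed 's/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g')
printf '%s\n' "$result"   # prints: x SNAKE y [SNAKE y x SNAKE] [SNAKE]
```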
+3
 sed -i "s/\bapple\b/FRUIT/g" file 

\b matches a word boundary. It is probably not portable; at least it does not work on Mac OS X.

And a more interesting test:

 $ cat file; sed "s/\bapple\b/FRUIT/g" file
 apple apple apple spider tree apple tree applejuice pineapple apple.com etc
 FRUIT FRUIT FRUIT spider tree FRUIT tree applejuice pineapple FRUIT.com etc
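Since `\b` is a GNU extension to sed's regular expressions, a script can probe for it before relying on it. The following is a rough sketch; the variable name `word_boundary_supported` is made up for this example. Note also that, as the test above shows, `\b` replaces "apple.com" too, which is wider than the four cases in the question.

```shell
# Probe whether the local sed understands \b by checking a known result.
probe=$(printf 'apple pineapple\n' | sed 's/\bapple\b/FRUIT/g' 2>/dev/null)
if [ "$probe" = 'FRUIT pineapple' ]; then
    word_boundary_supported=yes
else
    word_boundary_supported=no
fi
printf 'sed supports \\b: %s\n' "$word_boundary_supported"
```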
+2

Consider using lookbehind and lookahead:

 s/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g 

Demo: http://regexr.com?2vl8p


Well, I have now tested the regex on my computer and found that lookahead and lookbehind do not work in standard sed; instead you would use ssed with --regexp-perl:

  uname -msrv
 Darwin 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug 9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64
  ssed --ver
 super-sed version 3.62
 based on GNU sed version 4.1

 Copyright (C) 2003 Free Software Foundation, Inc.
 This is free software;  see the source for copying conditions.  There is NO
 warranty;  not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
 to the extent permitted by law. 
  ssed -R 's/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g'
 apple spider apple apple spider tree apple tree
 apple spider FRUIT FRUIT spider tree FRUIT tree
+1

One way, using sed:

 sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file 

There are three substitution commands. Explanation:

 s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g
 # Duplicate each space character surrounded by non-space
 # characters.

 s/\( \|\[\)$a\( \|\]\)/\1$b\2/g
 # Substitute the content of variable '$a' when there is a space or '['
 # just before it and a space or ']' just after it, in any combination.
 # Replace it with the content of variable '$b', keeping the same
 # groups of the pattern (\1 and \2).

 s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g
 # Remove one space where two consecutive spaces are surrounded by
 # non-space characters.
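To see what each of the three commands contributes, the stages can be run one at a time. This is only a sketch: the sample line and the variable names `line`, `step1`, `step2` and `step3` are made up, and bracket expressions are used in place of `\|` so that it does not depend on GNU sed.

```shell
a="apple"
b="fruit"
line='tree apple apple spider'

# Stage 1: double every single space that sits between two non-space characters.
step1=$(printf '%s\n' "$line" | sed 's/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g')

# Stage 2: replace $a when delimited by a space or bracket; neighbours no
# longer share a delimiter because every space now exists twice.
step2=$(printf '%s\n' "$step1" | sed "s/\([[ ]\)$a\([] ]\)/\1$b\2/g")

# Stage 3: collapse the doubled spaces back to single ones.
step3=$(printf '%s\n' "$step2" | sed 's/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g')

printf '%s\n' "$step1" "$step2" "$step3"
```

The intermediate lines keep two spaces between words; only the last line has the original spacing back.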

My test:

File contents:

 apple spider apple apple spider tree apple tree
 apple spider [apple apple spider tree apple] tree
 apple spider apple apple spider tree appletree
 apple spider apple apple spider tree [apple] tree
 apple spider apple apple apple apple apple spider tree apple tree

Set variables:

 a="apple" b="fruit" 

Run the sed command:

 sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file 

Result:

 apple spider fruit fruit spider tree fruit tree
 apple spider [fruit fruit spider tree fruit] tree
 apple spider fruit fruit spider tree appletree
 apple spider fruit fruit spider tree [fruit] tree
 apple spider fruit fruit fruit fruit fruit spider tree fruit tree

It will not work if your real file has a different distribution of spaces or an unusual format. In that case sed is too limited a tool; perl or something similar with lookahead and lookbehind support would be a better choice.

+1

Source: https://habr.com/ru/post/1389572/

