Run a command on the same line multiple times with sed

I need to highlight each repeated word in a text with the * character.
For instance, given

 lol foo lol bar foo bar 

the result should be

 lol foo *lol* bar *foo* *bar* 

I tried the following command:

 echo "lol foo lol bar foo bar" | sed -r -e 's/(\b[a-zA-Z]+\b)([^*]+)(\1)/\1\2*\3*/' 

This gives me:

 lol foo *lol* bar foo bar 

Then I added the g flag:

 lol foo *lol* bar foo *bar* 

But foo is not highlighted. I know this happens because sed does not look back once a match has been found.
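The question shows only the output of the g variant, not the command. A likely reconstruction, assuming the flag was simply appended to the command above (highlight_g is a made-up helper name), shows where the scan resumes:

```shell
# Assumed reconstruction: the question's command with the g flag appended.
# highlight_g is a hypothetical name used only for this illustration.
highlight_g() {
    sed -r -e 's/(\b[a-zA-Z]+\b)([^*]+)(\1)/\1\2*\3*/g'
}

echo "lol foo lol bar foo bar" | highlight_g
# The first match consumes "lol foo lol". With g, scanning resumes right
# after that match, at " bar foo bar", so the first "foo" is already behind
# the cursor and can no longer be paired with the second "foo".
```

With g, matches never overlap, which is why only lol and bar end up marked.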

Is it possible to do this with sed alone?

2 answers

Sed is not the best tool for this task. It has no lookahead, no lookbehind, and no non-greedy quantifiers, but try the following command:

 sed -r -e ':a ; s/\b([a-zA-Z]+)\b(.*) (\1)( |$)/\1\2 *\3* / ; ta' 

It uses conditional branching to run the substitution command repeatedly until it fails, that is, until nothing more matches. Also, you cannot use ([^*]+), because in the second round the match has to cross the * characters inserted by the first substitution; the option is a greedy .* instead. And last, you cannot match just (\1), because it would match the first lol string again and again. You need some context around it, such as being surrounded by spaces or followed by the end of the line.

The command gives:

 lol foo *lol* bar *foo* *bar* 
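To see what the loop does on each pass, here is a rough sketch that emulates ':a ... ta' with a shell loop and prints every intermediate line. It assumes GNU sed (for -r and \b); highlight_steps is a hypothetical helper name, not part of the answer.

```shell
# Emulates the ':a ... ta' loop in the shell so each pass is visible.
# highlight_steps is a hypothetical helper name used for illustration.
highlight_steps() {
    line=$1
    while :; do
        # One pass of the same substitution, without the sed-level loop.
        next=$(printf '%s\n' "$line" |
            sed -r 's/\b([a-zA-Z]+)\b(.*) (\1)( |$)/\1\2 *\3* /')
        # No change means the substitution failed: this is the point
        # where 'ta' would stop branching back to ':a'.
        [ "$next" = "$line" ] && break
        line=$next
        printf '%s\n' "$line"
    done
}

highlight_steps 'lol foo lol bar foo bar'
# lol foo *lol* bar foo bar
# lol foo *lol* bar *foo* bar
# lol foo *lol* bar *foo* *bar*
```

Each pass marks one repeat, working on the already modified line, which is why the pattern has to tolerate the * characters left by earlier passes.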

UPDATE: an improvement provided by potong in the comments:

 sed -r ':a;s/\b(([[:alpha:]]+)\s.*\s)\2\b/\1*\2*/;ta' file 
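The one-liner is dense, so here is the same script spread over several lines with comments, as a sketch assuming GNU sed (\b and \s are GNU extensions). The function name highlight_repeats is made up for the example and reads standard input instead of a file.

```shell
# Same script as the one-liner, reading stdin instead of a file.
# highlight_repeats is a hypothetical wrapper name.
highlight_repeats() {
    sed -r '
        # Label to jump back to.
        :a
        # \2 is a word starting at a word boundary; \1 is everything from that
        # word up to the whitespace just before a later whole-word copy of \2.
        # Only the later copy gets wrapped in asterisks.
        s/\b(([[:alpha:]]+)\s.*\s)\2\b/\1*\2*/
        # t branches to :a only if the s command changed the line.
        ta
    '
}

echo 'lol foo lol bar foo bar' | highlight_repeats
```

Unlike the first command, this one leaves no trailing space. One thing to be aware of: because the pattern needs two separate whitespace characters (\s.*\s) between the two copies, an immediately repeated word such as "lol lol" is left unmarked.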

Using awk

 awk '{for (i=1;i<=NF;i++) if (a[$i]++>=1) printf "*%s* ",$i; else printf "%s ",$i; print ""}' file
 lol foo *lol* bar *foo* *bar* 
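For readability, the awk one-liner can be expanded as below. This is only a sketch: the array is renamed to seen, and it is cleared at the start of every line, which is an assumption based on the "same line" wording of the question (the one-liner keeps counting across the whole file). The wrapper name highlight_awk is hypothetical.

```shell
# Expanded variant of the awk one-liner, reading stdin.
# highlight_awk and the array name "seen" are illustrative choices.
highlight_awk() {
    awk '{
        split("", seen)              # assumption: reset counts for each line
        for (i = 1; i <= NF; i++) {
            if (seen[$i]++)          # non-zero: word already seen on this line
                printf "*%s* ", $i
            else
                printf "%s ", $i
        }
        print ""                     # terminate the output line
    }'
}

printf 'lol foo lol bar foo bar\nfoo lol\n' | highlight_awk
```

Note that awk splits fields on whitespace, so punctuation attached to a word counts as part of that word, and each output line ends with a trailing space.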

Source: https://habr.com/ru/post/954815/

