How to remove duplicate words from a string in a bash script?

I have a string containing duplicate words, for example:

abc, def, abc, def 

How can I remove the duplicates? The string I need is:

 abc, def 
4 answers

We have this test file:

 $ cat file
 abc, def, abc, def

To remove duplicate words:

 $ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
 abc, def

How it works

  • :a

    This defines the label a.

  • s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g

    This looks for a word of alphanumeric characters that occurs again later in the line and removes the later occurrence (the last one on the line, because .* is greedy).

  • ta

    If the last substitution command made a change, this jumps back to the label a to try again.

    Thus, the code keeps searching for duplicates until none remain.

  • s/(, )+/, /g; s/, *$//

    These two substitution commands clean up any leftover comma-space combinations: the first collapses a run of ", " into one, and the second removes a trailing comma.
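The stages described above can be seen one at a time by splitting the command into three separate sed calls. This is only an illustrative sketch, assuming GNU sed (-r and \b are GNU extensions); the variable names stage1, stage2 and stage3 are made up for the example.

```shell
#!/bin/bash
# Sketch only: assumes GNU sed. Variable names are illustrative.
input='abc, def, abc, def'

# Stage 1: the label / substitute / branch loop on its own.
# Duplicate words are gone, but their separators are left behind.
stage1=$(printf '%s\n' "$input" |
  sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta')
printf 'after loop:     [%s]\n' "$stage1"

# Stage 2: collapse each run of ", " into a single ", ".
stage2=$(printf '%s\n' "$stage1" | sed -r 's/(, )+/, /g')
printf 'after collapse: [%s]\n' "$stage2"

# Stage 3: remove the trailing comma and any spaces after it.
stage3=$(printf '%s\n' "$stage2" | sed -r 's/, *$//')
printf 'final:          [%s]\n' "$stage3"
```

The brackets in the printed lines make the leftover trailing spaces visible after the first two stages.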

Mac OSX or another BSD system

For Mac OSX or another BSD system, try:

 sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file 

Using a string instead of a file

sed easily handles input either from a file, as shown above, or from a shell string, as shown below:

 $ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
 ab, cd, ef
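In a script, the same command can be wrapped in a small function that takes the string as an argument and feeds it to sed through a here-string. This is a sketch, again assuming GNU sed; the function name dedupe_words is hypothetical and not part of the answer above.

```shell
#!/bin/bash
# Sketch only: assumes GNU sed. dedupe_words is a hypothetical helper name.
dedupe_words() {
  # "$1" is the comma-separated string; <<< passes it to sed on stdin.
  sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' <<< "$1"
}

string='ab, cd, cd, ab, ef'
result=$(dedupe_words "$string")
echo "$result"
```

Because the loop always removes a later occurrence, the first occurrence of each word keeps its original position.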

You can use awk for this.

Example:

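 #!/bin/bash
 string="abc, def, abc, def"
 # RT (the text that matched RS) is a GNU awk extension, so this needs gawk
 string=$(printf '%s\n' "$string" | awk -v RS='[,[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
 # strip the trailing ", " left after the last unique word
 string="${string%,*}"
 echo "$string"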

Output:

 abc, def 
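If GNU awk is not available, a similar result can be had without RT by splitting each line into fields and rebuilding it. The following is a sketch of that variant, not the method from the answer above; the function name dedupe_awk and the array name seen are made up for the example.

```shell
#!/bin/bash
# Sketch only: a variant that avoids the gawk-only RT variable.
# dedupe_awk and seen are illustrative names.
dedupe_awk() {
  printf '%s\n' "$1" | awk -F', *' '{
    split("", seen)              # clear the array for each input line
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i != "" && !seen[$i]++)
        out = out (out == "" ? "" : ", ") $i   # join unique words with ", "
    print out
  }'
}

dedupe_awk "abc, def, abc, def"
```

Here the field separator is a comma followed by any number of spaces, and the first occurrence of each word is kept in its original order.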

This can also be done in pure Bash:

 #!/bin/bash
 string="abc, def, abc, def"
 declare -A words
 IFS=", "
 for w in $string; do
   words+=( [$w]="" )
 done
 echo ${!words[@]}

Output

 def abc 

Explanation

words is an associative array (declare -A words), and each word is added to it as a key:

 words+=( [${w}]="" ) 

(We do not need the value, so I used "" as a placeholder.)

The list of unique words is the list of keys (${!words[@]}).

There is one caveat: the output is not separated by ", ". (You will have to iterate over the keys again. IFS is only used with ${!words[*]}, and even then only the first character of IFS is used.)
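One way to deal with this caveat is to build the output string inside the same loop, which also keeps the words in their original order (the keys of an associative array come back in no particular order, as the "def abc" output shows). This is a sketch under those assumptions, needing bash 4 or later for associative arrays; the names dedupe_bash, seen and out are made up for the example.

```shell
#!/bin/bash
# Sketch only: needs bash 4+ for associative arrays.
# dedupe_bash, seen and out are illustrative names.
dedupe_bash() {
  local string=$1 w out=""
  local -A seen=()
  local -a words
  # Split on commas and spaces without changing IFS globally.
  IFS=', ' read -r -a words <<< "$string"
  for w in "${words[@]}"; do
    # Skip empty fields and words that were already emitted.
    [[ -z $w || -n ${seen[$w]} ]] && continue
    seen[$w]=1
    out+="${out:+, }$w"   # add ", " only between words
  done
  printf '%s\n' "$out"
}

dedupe_bash "abc, def, abc, def"
```

The associative array is used only for the "seen before" check, so the order of the result comes from the input rather than from the array.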


Here is another way to handle this case. I changed the input to a space-separated string, as shown below, and ran this command on it:

 #string="abc def abc def"
 $ echo "abc def abc def" | xargs -n1 | sort -u | xargs | sed "s# #, #g"
 abc, def
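If the input should keep its original comma-separated form, a similar pipeline can split on commas and spaces with tr instead. This is a sketch of that variation, not the command from the answer above; the function name dedupe_sorted is made up. As with any sort -u approach, the words come out sorted rather than in their original order.

```shell
#!/bin/bash
# Sketch only: dedupe_sorted is an illustrative name.
dedupe_sorted() {
  printf '%s\n' "$1" |
    tr -s ', ' '\n\n' |   # turn commas and spaces into newlines, squeezing repeats
    sort -u |             # sort and drop duplicate lines
    paste -sd, - |        # join the lines back together with commas
    sed 's/,/, /g'        # restore the space after each comma
}

dedupe_sorted "abc, def, abc, def"
```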

Thanks for the support!


Source: https://habr.com/ru/post/987393/

