We have this test file:
$ cat file
abc, def, abc, def
To remove duplicate words:
$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def
How it works
:a
This defines the label a.
s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g
It searches for a word made of alphanumeric characters that appears again later on the line and removes the later occurrence. Because .* is greedy, the occurrence removed is the last one on the line.
ta
If the last substitution command changed anything, sed jumps back to the label a to try again.
In this way, the code keeps searching for duplicates until none remain.
s/(, )+/, /g; s/, *$//
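These two substitution commands clean up the leftover comma-space sequences: the first collapses any run of ", " into a single ", ", and the second removes a trailing comma at the end of the line.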
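To watch the loop and the cleanup at work separately, here is a small trace. It is a sketch, not part of the original command: it assumes GNU sed (since \b is a GNU extension), and the function names trace_loop and cleanup_commas are made up for illustration. The l command prints the pattern space after each pass, with a $ marking the end of the line so that trailing spaces are visible.

```shell
# Hypothetical helper: run only the loop and show the pattern space after each pass.
# Assumes GNU sed, since \b is a GNU extension.
trace_loop() {
  sed -n -E \
    -e ':a' \
    -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' \
    -e 'l' \
    -e 'ta'
}

# Hypothetical helper: the two cleanup substitutions on their own.
cleanup_commas() {
  sed -E -e 's/(, )+/, /g' -e 's/, *$//'
}

printf '%s\n' 'abc, def, abc, def' | trace_loop
# abc, def, , def$
# abc, def, , $
# abc, def, , $

printf '%s\n' 'abc, def, , ' | cleanup_commas
# abc, def
```

The last two lines of the trace are identical: the third pass changes nothing, so ta does not jump and the loop ends. What is left over is exactly the stray ", " sequences that the cleanup commands remove.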
Mac OSX or another BSD system
For Mac OSX or another BSD system, try:
sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file
Using a string instead of a file
sed easily processes input either from a file, as shown above, or from a string piped in from the shell, as shown below:
$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
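If you use this often, it may be handy to wrap the command in a shell function. The following is only a sketch: the name dedupe_words is hypothetical, and it assumes GNU sed (for -E and \b). It reads standard input, so it works with a pipe or a redirected file alike.

```shell
# Hypothetical wrapper around the command above; the name dedupe_words is an assumption.
# Assumes GNU sed. Reads stdin and writes the de-duplicated lines to stdout.
dedupe_words() {
  sed -E \
    -e ':a' \
    -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' \
    -e 'ta' \
    -e 's/(, )+/, /g' \
    -e 's/, *$//'
}

# sed works line by line, so duplicates are removed within each line separately.
printf '%s\n' 'ab, cd, cd, ab, ef' 'x, y, x' | dedupe_words
# ab, cd, ef
# x, y
```

To process a file instead, redirect it into the function: dedupe_words < file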