Delete all duplicate entries in the field

Question

I have a file that has the following format:

 text number number A;A;A;A;A;A
 text number number B
 text number number C;C;C;C;D;C;C;C;C

What I want to do is delete all the duplicate entries in the fourth column, so that in the end:

 text number number A
 text number number B
 text number number C;D

I would prefer a bash solution, so that it fits into the pipeline with the other text manipulations I do on this file.

Thanks!

+4

bash awk sed

Joshuaa Nov 02 '12 at 18:55

4 answers

This may work for you (GNU sed):

 sed 's/.*\s/&\n/;h;s/.*\n//;:a;s/\(\([^;]\).*\);\2/\1/;ta;H;g;s/\n.*\n//' file

+2

potong Nov 02 '12 at 19:09

Assuming a tab-delimited file, you can do it like this with GNU parallel:

 parallel -C '\t' c4='$(echo {4} | tr ";" "\n" | sort -u | head -c-1 | tr "\n" ";");' \
   echo -e '"{1}\t{2}\t{3}\t$c4"' :::: infile

Output:

 text number number A
 text number number B
 text number number C;D

+2

Thor Nov 03 '12 at 23:46

This may work too:

 awk -F";" '{
   delete words
   match($1, /[[:alpha:]]$/)
   words[substr($1, RSTART, RLENGTH)]++
   printf "%s", $1
   for (i = 2; i <= NF; i++) {
     if (!words[$i]++) printf ";%s", $i
   }
   printf "\n"
 }' file

Notes:

Since ; is used as the field separator, it does not matter how many columns precede A;A;A;A;A;A, or which separators those columns use.
/[[:alpha:]]$/ can be replaced by /[^[:space:]]+$/ to match a run of non-space characters instead of a single letter.
if (!words[$i]++) printf ";%s",$i prints a field only if it is not yet a key in the words associative array, i.e. if words[$i] is 0.
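
The !words[$i]++ test in the last note is a common awk idiom, and it may be easier to see in isolation. The following is only a minimal sketch, not part of the answer above: dedup_tokens is a hypothetical helper name, and it assumes the list is passed as a single ;-separated string.

```shell
# Hypothetical helper: print each ;-separated token once, in first-seen order.
dedup_tokens() {
    printf '%s\n' "$1" |
        tr ';' '\n' |          # one token per line
        awk '!seen[$0]++' |    # true only the first time a line is seen
        paste -sd ';' -        # join the surviving lines back with ;
}

dedup_tokens 'C;C;C;C;D;C;C;C;C'   # prints C;D
```

The post-increment is what makes this work: the expression is evaluated with the old count (0 the first time), and the counter is bumped afterwards, so every later occurrence fails the test.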

+1

doubledown Nov 03 '12 at 10:37

iruvar · Accepted Answer · 2012-11-02T19:13:34+0000

You can achieve this using awk. First split field 4 into an array on ; and then rebuild the field from the unique values:

 awk '{delete z; d=""; split($4,arr,";");for (k in arr) z[arr[k]]=k; for (l in z) d=d";"l; print($1,$2,$3,substr(d, 2))}' file_name
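
One caveat about the one-liner above: it walks both arrays with for (... in ...), and awk does not guarantee any particular traversal order, so the unique values may not come out in the order they first appeared. The following is a minimal sketch of an order-preserving variant, assuming whitespace-separated columns as in the question. The name dedup_col4 and the sample rows are hypothetical.

```shell
# dedup_col4 is a hypothetical helper; columns are assumed whitespace-separated.
dedup_col4() {
    awk '{
        n = split($4, parts, ";")   # split returns the number of pieces
        split("", seen)             # portable way to empty the array per line
        out = ""
        for (i = 1; i <= n; i++) {  # numeric loop keeps the original order
            if (!seen[parts[i]]++) {
                out = (out == "" ? parts[i] : out ";" parts[i])
            }
        }
        print $1, $2, $3, out
    }' "$@"
}

# Illustrative sample rows, not the asker's real data.
printf '%s\n' 'text 1 2 A;A;A;A;A;A' 'text 3 4 B' 'text 5 6 C;C;C;C;D;C;C;C;C' | dedup_col4
```

Because the function reads standard input when called without arguments, it can sit in the middle of a pipeline; passing a file name (dedup_col4 file_name) works as well.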
