Awk, how to remove duplicates in a field, except for some lines

Question

This is the structure of my CSV file (the columns are separated by whitespace, not commas):

Oslo        Company1           Mission1
Oslo        Company1           Mission2 
Oslo        Company3           Missionspecial 
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2 
Paris       Company3           Missionspecial

I want to blank out every repeated value in fields 1, 2 and 3 (replace it with spaces), except for the special values "Companyspecial" and "Missionspecial", which must always be kept, so that the output is:

Oslo        Company1             Mission1
                                 Mission2
            Company3             Missionspecial
            Companyspecial       Missionspecial
Paris       Company2             
            Companyspecial       
                                 Missionspecial

All I know is how to remove all duplicates, with this bit of code:

awk 'x[$1]++ {$1=""} x[$2]++ {$2=""} x[$3]++ {$3=""} {print $1, $2, $3}'
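For comparison, here is a minimal sketch of that plain approach made runnable and applied to two rows taken from the question. It blanks every value that was already seen, with no exceptions, so "Missionspecial" disappears from the second row. The function names rows and dedupe_all are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# Two rows from the question's sample data.
rows() {
  printf '%s\n' 'Oslo Company3 Missionspecial' 'Oslo Companyspecial Missionspecial'
}

# Blank every value that was already seen, with no exceptions.
# Like the original attempt, one array x is shared by all three fields.
dedupe_all() {
  awk 'x[$1]++ { $1 = "" }
       x[$2]++ { $2 = "" }
       x[$3]++ { $3 = "" }
       { print $1, $2, $3 }'
}

rows | dedupe_all
```

The second output line keeps only "Companyspecial": both "Oslo" and "Missionspecial" are blanked, which is why the special values need an explicit exception.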

I am not a programmer. Any help would be greatly appreciated; it would save me hours of tedious manual work. Thank you very much!

string awk duplicates

Trying Dec 08 '10 at 23:05

1 answer

SiegeX · Accepted Answer · 2010-12-08T23:50:17+0000

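awk '{
  for(i=1;i<=3;i++)                        # check fields 1 to 3
    if($i !~ /(Mission|Company)special/)   # never blank the special values
      if(a[i,$i]++)                        # value already seen in this field?
        $i=""                              # then blank it
  printf("%-12s%-19s%-s\n",$1,$2,$3)       # print in fixed-width columns
}'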
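A minimal sketch for trying the answer end to end: it wraps the same awk program in a shell function and feeds it the sample rows from the question through a here-document. The function names sample_rows and dedupe_except_special are hypothetical, and the sample rows are stored without their trailing spaces (the default field splitting ignores them anyway).

```shell
#!/bin/sh
# Sample rows from the question.
sample_rows() {
  cat <<'EOF'
Oslo        Company1           Mission1
Oslo        Company1           Mission2
Oslo        Company3           Missionspecial
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2
Paris       Company3           Missionspecial
EOF
}

# The accepted answer's awk program, wrapped in a function that reads stdin.
dedupe_except_special() {
  awk '{
    for (i = 1; i <= 3; i++)
      if ($i !~ /(Mission|Company)special/)
        if (a[i, $i]++)
          $i = ""
    printf("%-12s%-19s%-s\n", $1, $2, $3)
  }'
}

sample_rows | dedupe_except_special
```

The output keeps "Companyspecial" and "Missionspecial" on every row where they appear and blanks every other repeated value, which matches the output asked for in the question apart from the exact column widths. With a real file, the function reads it from standard input, e.g. dedupe_except_special < yourfile.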


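Edit: changed a[$i]++ to a[i,$i]++, so that duplicates are tracked separately for each field: a value is blanked only if it already appeared in the same column.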
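To see why the array key includes the field number (a[i,$i]) and not just the value, here is a small sketch with made-up rows (hypothetical data, not from the question) in which the same word appears in two different columns. With a key on the value alone, "Bergen" in field 1 of the second row is blanked because it was already seen in field 2 of the first row; with the per-field key it is kept. The function names rows, shared_key and per_field_key are hypothetical, and the "|" separator is only there to make the blanks visible.

```shell
#!/bin/sh
# Hypothetical rows: "Bergen" is a company in row 1 and a city in row 2.
rows() {
  printf '%s\n' 'Oslo Bergen Mission1' 'Bergen Company1 Mission2'
}

# Key on the value alone: one "seen" list shared by all three fields.
shared_key() {
  awk 'BEGIN { OFS = "|" }
  {
    for (i = 1; i <= 3; i++)
      if ($i !~ /(Mission|Company)special/)
        if (a[$i]++)
          $i = ""
    print $1, $2, $3
  }'
}

# Key on (field number, value): a separate "seen" list per field, as in the answer.
per_field_key() {
  awk 'BEGIN { OFS = "|" }
  {
    for (i = 1; i <= 3; i++)
      if ($i !~ /(Mission|Company)special/)
        if (a[i, $i]++)
          $i = ""
    print $1, $2, $3
  }'
}

rows | shared_key      # second line: |Company1|Mission2
rows | per_field_key   # second line: Bergen|Company1|Mission2
```

On the question's own sample data both keys give the same result, because no value there appears in more than one column; the difference only shows up when columns share values.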
