Column-based sorting and deletion

I have a text file:

$ cat text
542,8,1,418,1
542,9,1,418,1
301,34,1,689070,1
542,9,1,418,1
199,7,1,419,10

I would like to sort the file based on the first column and remove duplicates using sort , but everything is not going as expected.

Approach 1

 $ sort -t, -u -b -k1n text
 542,8,1,418,1
 542,9,1,418,1
 199,7,1,419,10
 301,34,1,689070,1

It is not sorted based on the first column.

Approach 2

 $ sort -t, -u -b -k1n,1n text
 199,7,1,419,10
 301,34,1,689070,1
 542,8,1,418,1

It deletes the line 542,9,1,418,1 entirely, but I want to keep one copy of it.

It seems that the first approach removes duplicates but does not sort correctly, while the second sorts correctly but removes more than I want. How do I get the correct result?

2 answers

The problem is that when you supply a key, sort -u decides uniqueness by that particular field only. Because the key here is the first field, sort treats the two later lines starting with 542 as duplicates of 542,8,1,418,1 and filters them out.
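A minimal reproduction (assuming the five-line sample file from the question) makes the effect visible:

```shell
# Recreate the sample file from the question
printf '%s\n' '542,8,1,418,1' '542,9,1,418,1' '301,34,1,689070,1' \
              '542,9,1,418,1' '199,7,1,419,10' > text

# With -u and the key restricted to field 1, uniqueness is decided by
# the first field alone, so only one line per first-column value survives.
sort -t, -u -k1,1n text
# 199,7,1,419,10
# 301,34,1,689070,1
# 542,8,1,418,1
```

Only three lines remain: both later lines whose first field is 542 are discarded as "duplicates", even though they differ in other fields.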

It is best to either sort all the columns:

 sort -t, -nk1,1 -nk2,2 -nk3,3 -nk4,4 -nk5,5 -u text 

or

use awk to filter out duplicate lines and pipe the result to sort .

 awk '!_[$0]++' text | sort -t, -nk1,1 
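Run against the same sample file, this keeps one copy of the duplicated line and sorts the rest (a sketch assuming the file from the question):

```shell
# Recreate the sample file from the question
printf '%s\n' '542,8,1,418,1' '542,9,1,418,1' '301,34,1,689070,1' \
              '542,9,1,418,1' '199,7,1,419,10' > text

# awk drops exact duplicate lines (the repeated 542,9,1,418,1),
# then sort orders what is left by the first field, numerically.
awk '!_[$0]++' text | sort -t, -nk1,1
# 199,7,1,419,10
# 301,34,1,689070,1
# 542,8,1,418,1
# 542,9,1,418,1
```

Because awk compares whole lines ($0), only the exact duplicate is removed, and both distinct 542 lines survive the sort.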

When sorting by key, you must also indicate where the key ends; otherwise the key extends to the end of the line and sorting compares all of the following fields as well.

The following sorts correctly by the first column (note that combining -u with this key still keeps only one line per first-field value):

 sort -t, -u -k1,1n text 

Source: https://habr.com/ru/post/950221/

