Column-based sorting and deletion

I have a text file:

$ cat text
542,8,1,418,1
542,9,1,418,1
301,34,1,689070,1
542,9,1,418,1
199,7,1,419,10

I would like to sort the file based on the first column and remove duplicates using sort , but everything is not going as expected.

Approach 1

 $ sort -t, -u -b -k1n text
 542,8,1,418,1
 542,9,1,418,1
 199,7,1,419,10
 301,34,1,689070,1

It is not sorted based on the first column.

Approach 2

 $ sort -t, -u -b -k1n,1n text
 199,7,1,419,10
 301,34,1,689070,1
 542,8,1,418,1

It deletes the line 542,9,1,418,1 entirely, but I want to keep one copy of it.

It seems that the first approach removes duplicates but does not sort correctly, while the second sorts correctly but removes more than I want. How do I get the correct result?

2 answers

The problem is that when you supply a key, sort -u decides uniqueness by that particular field only. Because the key here is the first field, sort treats the two later lines starting with 542 as duplicates of 542,8,1,418,1 and filters them out.
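A minimal reproduction (assuming the five-line sample file from the question) makes the effect visible:

```shell
# Recreate the sample file from the question
printf '%s\n' '542,8,1,418,1' '542,9,1,418,1' '301,34,1,689070,1' \
              '542,9,1,418,1' '199,7,1,419,10' > text

# With -u and the key restricted to field 1, uniqueness is decided by
# the first field alone, so only one line per first-column value survives.
sort -t, -u -k1,1n text
# 199,7,1,419,10
# 301,34,1,689070,1
# 542,8,1,418,1
```

Only three lines remain: both later lines whose first field is 542 are discarded as "duplicates", even though they differ in other fields.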

It is best to either sort all the columns:

 sort -t, -nk1,1 -nk2,2 -nk3,3 -nk4,4 -nk5,5 -u text 

or

use awk to filter out duplicate lines and pipe the result to sort .

 awk '!_[$0]++' text | sort -t, -nk1,1 
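Run against the same sample file, this keeps one copy of the duplicated line and sorts the rest (a sketch assuming the file from the question):

```shell
# Recreate the sample file from the question
printf '%s\n' '542,8,1,418,1' '542,9,1,418,1' '301,34,1,689070,1' \
              '542,9,1,418,1' '199,7,1,419,10' > text

# awk drops exact duplicate lines (the repeated 542,9,1,418,1),
# then sort orders what is left by the first field, numerically.
awk '!_[$0]++' text | sort -t, -nk1,1
# 199,7,1,419,10
# 301,34,1,689070,1
# 542,8,1,418,1
# 542,9,1,418,1
```

Because awk compares whole lines ($0), only the exact duplicate is removed, and both distinct 542 lines survive the sort.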

When sorting by key, you must also indicate where the key ends; otherwise the key extends to the end of the line and sorting compares all of the following fields as well.

The following sorts correctly by the first column (note that combining -u with this key still keeps only one line per first-field value):

 sort -t, -u -k1,1n text 

Source: https://habr.com/ru/post/950221/

