How to sort alphanumeric strings in Unix with sort? More complicated than it sounds

I'm trying to sort strings of letters and numbers in an "intuitive", natural alphanumeric order with the Unix sort command, but I can't get it to sort correctly. I have this file:

 $ cat ~/headers
 @42EBKAAXX090828:6:100:1699:328/2
 @42EBKAAXX090828:6:10:1077:1883/2
 @42EBKAAXX090828:6:102:785:808/2

I would like to sort it in natural alphanumeric order, so that @42EBKAAXX090828:6:10:... comes first (since 10 is less than 100 and 102), @42EBKAAXX090828:6:100:... comes second, and @42EBKAAXX090828:6:102:... comes third.

I know I could sort on a fixed character position within the line, but the position of the : separators can change here, so that would not be a general, workable solution.

I tried:

 sort --stable -k1,1 ~/headers > foo 

with various combinations of the -n and -u options, but none of them gives the correct order.

How can this be done efficiently, either from bash using sort or from Python? I would like to apply this to 4-5 GB files containing millions of lines.

Thanks!

3 answers

The -V option (--version-sort in GNU sort) seems to do what you want: natural sorting. It is apparently designed for version numbers (hence the letter V).

 sort -V ~/headers 

outputs:

 @42EBKAAXX090828:6:10:1077:1883/2
 @42EBKAAXX090828:6:100:1699:328/2
 @42EBKAAXX090828:6:102:785:808/2

It is sorting lexicographically (character by character), which is exactly what you see in your example. The reason 10: sorts after 100 and 102 is that the colon : comes after the digit 9 in the ASCII table, and therefore also after 0 and 2.
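To make the ASCII argument concrete, here is a small Python check on the three sample lines. It assumes that Python's default string comparison (by code point) is a fair stand-in for `LC_ALL=C sort`; in other locales GNU sort applies the locale's collation rules, which can treat punctuation differently.

```python
# Sample lines, in the order they appear in ~/headers
lines = [
    "@42EBKAAXX090828:6:100:1699:328/2",
    "@42EBKAAXX090828:6:10:1077:1883/2",
    "@42EBKAAXX090828:6:102:785:808/2",
]

# All three lines agree up to "...:6:10"; the next character is '0', ':' or '2'
for ch in "029:":
    print(repr(ch), hex(ord(ch)))  # '0' 0x30, '2' 0x32, '9' 0x39, ':' 0x3a

# Python compares str by code point, similar to `LC_ALL=C sort`
for line in sorted(lines):
    print(line)  # ...:6:100:..., then ...:6:102:..., then ...:6:10:...
```

Because ':' (0x3A) is greater than every digit, any line with `10:` at that spot lands after `100` and `102`.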

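If you want to sort numerically by the third colon-separated field, try the following:

 sort -t':' -k3,3n ~/headers > foo 

Here -k3,3n limits the key to field 3 and compares it as a number; a bare -k3 compares from field 3 to the end of the line as text, which still puts 10: after 100.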
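A rough Python analogue of sorting on colon-separated fields, with the numeric fields compared as integers. Splitting on ':' means the key does not depend on character offsets. The layout `name:lane:tile:x:y/read` and those field names are my assumption (the lines look like Illumina read names); the question does not state it, and `field_key` is a hypothetical helper, not a general parser.

```python
# Sample lines; assumed layout: name:lane:tile:x:y/read
lines = [
    "@42EBKAAXX090828:6:100:1699:328/2",
    "@42EBKAAXX090828:6:10:1077:1883/2",
    "@42EBKAAXX090828:6:102:785:808/2",
]


def field_key(line):
    """Key on colon-separated fields, comparing the numeric ones as integers.

    Assumes exactly five ':'-separated fields, the last one like "328/2".
    """
    name, lane, tile, x, rest = line.rstrip("\n").split(":")
    y, _, read = rest.partition("/")
    return (name, int(lane), int(tile), int(x), int(y), int(read or 0))


for line in sorted(lines, key=field_key):
    print(line)
```

Using a tuple key gives the same tie-breaking as chaining several numeric -k options in sort.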

This is usually called natural sorting. Here is one way that works for your sample dataset.

 import re

 def natural_sorted(iterable, reverse=False):
     """Return a list sorted the way that humans expect."""
     def convert(text):
         return int(text) if text.isdigit() else text

     def natural(item):
         # Build a list: in Python 3, map() returns an iterator,
         # and iterators cannot be compared with <
         return [convert(part) for part in re.split('([0-9]+)', item)]

     return sorted(iterable, key=natural, reverse=reverse)

I found it here and improved it a bit.
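For 4-5 GB files, natural_sorted above has to hold every line in memory at once. GNU sort (including sort -V) already spills to temporary files when the input is larger than its buffer, so that is probably the simplest route. If you need a custom key in Python, one option is an external merge sort: sort fixed-size chunks, write each sorted run to a temporary file, then merge the runs lazily with heapq.merge. This is a minimal sketch, not tuned for speed; the names external_natural_sort and natural_key and the chunk_lines default are my own choices.

```python
import heapq
import itertools
import os
import re
import tempfile

_DIGITS = re.compile(r"([0-9]+)")


def natural_key(line):
    # re.split with a capturing group puts digit runs at odd indices,
    # so those become ints and the text pieces between them stay str
    return [int(p) if i % 2 else p for i, p in enumerate(_DIGITS.split(line))]


def external_natural_sort(in_path, out_path, chunk_lines=1_000_000):
    """Sort a large text file in natural order via sorted runs on disk.

    chunk_lines is an assumed default; tune it to your available memory.
    """
    run_paths = []
    try:
        with open(in_path) as src:
            while True:
                # Read at most chunk_lines lines into memory
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure every line ends with a newline before merging
                chunk = [s if s.endswith("\n") else s + "\n" for s in chunk]
                chunk.sort(key=natural_key)
                fd, path = tempfile.mkstemp(suffix=".run")
                run_paths.append(path)
                with os.fdopen(fd, "w") as run:
                    run.writelines(chunk)

        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                # heapq.merge lazily merges the already-sorted runs
                dst.writelines(heapq.merge(*runs, key=natural_key))
        finally:
            for f in runs:
                f.close()
    finally:
        # Remove the temporary run files
        for p in run_paths:
            os.remove(p)
```

Usage would look like `external_natural_sort(os.path.expanduser("~/headers"), "foo")`. A larger chunk_lines means fewer runs to merge but more memory per chunk.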


Source: https://habr.com/ru/post/1384830/

