Sort by number of repetitions

Question

I have a file that looks like this:

    192.168.2.2 150.25.45.7 8080
    192.168.12.25 178.25.45.7 50
    192.168.2.2 142.55.45.18 369
    192.168.489.2 122.25.35.7 8080
    192.168.489.2 90.254.45.7 80
    192.168.2.2 142.55.45.18 457

I made up all the numbers.

I need to sort these lines by how many times the first IP is repeated. Ideally, the result looks like this:

    192.168.2.2 8080 369 457 3
    192.168.489.2 8080 80 2
    192.168.12.25 50 1

So: the first IP, all ports from lines with that first IP, and the number of repetitions.

I tried playing with the sort and awk commands, but I don't want to do extra work if I am missing a more direct solution.

Any ideas? Thanks :)

+4

shell awk perl

coconut Dec 15 '11 at 10:27

6 answers

Perl Method:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %repeat;
    while (<DATA>) {
        # Capture the first IP and the trailing port; the dots are escaped
        # so that they only match a literal "."
        if (/^(\d+(?:\.\d+){3})\s\S+\s(\d+)$/) {
            push @{$repeat{$1}}, $2;
        }
    }
    # Sort the IPs by descending number of collected ports
    foreach (sort { @{$repeat{$b}} <=> @{$repeat{$a}} } keys %repeat) {
        my $num = @{$repeat{$_}};
        print "$_ @{$repeat{$_}} $num\n";
    }
    __DATA__
    192.168.2.2 150.25.45.7 8080
    192.168.12.25 178.25.45.7 50
    192.168.2.2 142.55.45.18 369
    192.168.489.2 122.25.35.7 8080
    192.168.489.2 90.254.45.7 80
    192.168.2.2 142.55.45.18 457

Output:

    192.168.2.2 8080 369 457 3
    192.168.489.2 8080 80 2
    192.168.12.25 50 1

+2

Toto Dec 15 '11 at 10:38

Another solution for Perl:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %ips;
    # Split each line around the middle field, leaving [first IP, port]
    push @{$ips{$_->[0]}}, $_->[1]+0 for map { [split / \S+ /] } <DATA>;
    # Sort the IPs by descending number of ports
    for (sort { @{$ips{$b}} <=> @{$ips{$a}} } keys %ips) {
        printf("%s %s %d\n", $_, join(" ", @{$ips{$_}}), 0+@{$ips{$_}});
    }
    __DATA__
    192.168.2.2 150.25.45.7 8080
    192.168.12.25 178.25.45.7 50
    192.168.2.2 142.55.45.18 369
    192.168.489.2 122.25.35.7 8080
    192.168.489.2 90.254.45.7 80
    192.168.2.2 142.55.45.18 457

Output:

    192.168.2.2 8080 369 457 3
    192.168.489.2 8080 80 2
    192.168.12.25 50 1

+1

flesk Dec 15 '11 at 11:15

This line should do the job for you. It prefixes each result with its count and a tab, sorts numerically on that prefix, then cuts the prefix off again:

    awk '{a[$1]++;b[$1]=b[$1]" "$3}END{for(x in a)print a[x]"\t"x b[x],a[x]}' input | sort -nr | cut -f2-

Test with your example:

    kent$ cat tt
    192.168.2.2 150.25.45.7 8080
    192.168.12.25 178.25.45.7 50
    192.168.2.2 142.55.45.18 369
    192.168.489.2 122.25.35.7 8080
    192.168.489.2 90.254.45.7 80
    192.168.2.2 142.55.45.18 457

    kent$ awk '{a[$1]++;b[$1]=b[$1]" "$3}END{for(x in a)print a[x]"\t"x b[x],a[x]}' tt | sort -nr | cut -f2-
    192.168.2.2 8080 369 457 3
    192.168.489.2 8080 80 2
    192.168.12.25 50 1

0

Kent Dec 15 '11 at 11:06

Here's a pipeline relying mainly on awk and sort:

    sort -k1,1 -k3,3n \
    | awk -F' ' '
        NR==1 { printf("%s ", $1); current = $1 }
        $1 != current { printf(":%d\n%s ", count, $1); current = $1; count = 0 }
        { printf("%d ", $3); count++ }
        END { printf(":%d\n", count) }' \
    | sort -t':' -k2nr \
    | tr -d ':'
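The pipeline above reads standard input, and the answer shows no sample run. As a rough sketch of how it might be used, assuming every input line has exactly three whitespace-separated fields, it could be wrapped in a shell function. The name `ports_by_ip` is made up for illustration. This sketch restricts each sort key to a single field and skips the final count when the input is empty.

```shell
# ports_by_ip is a hypothetical helper name, not part of the original answer.
# It reads "first_ip second_ip port" lines on stdin and prints one line per
# first IP: the IP, its ports in ascending order, and the number of lines.
ports_by_ip() {
    # Group lines by first IP, ordering ports numerically within each group
    sort -k1,1 -k3,3n |
    awk '
        NR == 1       { printf("%s ", $1); current = $1 }
        $1 != current { printf(":%d\n%s ", count, $1); current = $1; count = 0 }
                      { printf("%d ", $3); count++ }
        END           { if (NR) printf(":%d\n", count) }' |
    # The count sits after a ":" marker; sort on it, descending, then drop the marker
    sort -t ':' -k2,2nr |
    tr -d ':'
}

# Sample run on the data from the question
ports_by_ip <<'EOF'
192.168.2.2 150.25.45.7 8080
192.168.12.25 178.25.45.7 50
192.168.2.2 142.55.45.18 369
192.168.489.2 122.25.35.7 8080
192.168.489.2 90.254.45.7 80
192.168.2.2 142.55.45.18 457
EOF
```

On the sample data this should print the following. The ports come out in ascending numeric order rather than in input order, because the first sort reorders them:

    192.168.2.2 369 457 8080 3
    192.168.489.2 80 8080 2
    192.168.12.25 50 1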

0

Michael J. Barber Dec 15 '11 at 11:37

GNU awk 4:

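    awk 'END {
      # Traverse ic in descending order of its numeric values (the counts)
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (e in ic)
        print e, ip[e], ic[e]
      }
    {
      # Append the last field (the port) to the list for this IP
      ip[$1] = $1 in ip ? ip[$1] OFS $NF : $NF
      ic[$1]++
      }' infile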

0

Dimitre radoulov Dec 15 '11 at 2:04

Dave cross · Accepted Answer · 2011-12-15T10:51:49+0000

A Perlish answer would look something like this:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use 5.010;

    my %data;

    # Store IP address and port number
    while (<DATA>) {
      chomp;
      my ($ip, undef, $port) = split;
      push @{$data{$ip}}, $port;
    }

    # Sort (in reverse) by length of list of ports
    for (sort { @{$data{$b}} <=> @{$data{$a}} } keys %data) {
      say "$_ @{$data{$_}} ", scalar @{$data{$_}};
    }

    __DATA__
    192.168.2.2 150.25.45.7 8080
    192.168.12.25 178.25.45.7 50
    192.168.2.2 142.55.45.18 369
    192.168.489.2 122.25.35.7 8080
    192.168.489.2 90.254.45.7 80
    192.168.2.2 142.55.45.18 457

Output:

    192.168.2.2 8080 369 457 3
    192.168.489.2 8080 80 2
    192.168.12.25 50 1
