2d histogram creation

I have a data file containing two columns, for example

    1.1 2.2
    3.1 4.5
    1.2 4.5
    3.2 4.6
    1.1 2.3
    4.2 4.9
    4.2 1.1

I would like to make a histogram of the two columns, i.e. get the following output (with a step size, or bin size since we are talking about a histogram, of 0.1 in this case):

    1.0 1.0 0
    1.0 1.1 0
    1.0 1.2 0
    ...
    1.1 1.0 0
    1.1 1.1 0
    1.1 1.2 0
    ...
    1.1 2.0 0
    1.1 2.1 0
    1.1 2.2 1
    ...
    ...

Can anyone suggest something? It would be nice if I could set the range of values for the columns. In the above case, the values of the 1st column go from 1 to 4, and the same holds for the second column.

EDITED: Updated to handle more general data input, e.g. floating-point numbers. The step size in the above case is 0.1, but it would be nice if it could be configured for other settings, e.g. a step (bin size) of 0.2 or 1.0. If the step size is, for example, 1.0, then 1.1 and 1.8 fall into the same bin and must be counted together. For example (the range in this case is 0.0 .. 4.0 for both columns):

    1.1 1.8
    2.5 2.6
    1.4 2.1
    1.3 1.5
    3.3 4.0
    3.8 3.9
    4.0 3.2
    4.0 4.0

(if bin size = 1.0)

    1 1 2
    1 2 1
    1 3 0
    1 4 0
    2 1 0
    2 2 1
    2 3 0
    2 4 0
    3 1 0
    3 2 0
    3 3 1
    3 4 1
    4 1 0
    4 2 0
    4 3 1
    4 4 1
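To pin down the requested behaviour, here is a minimal Python sketch (Python is not one of the languages used in the answers below). It rests on assumptions read off the examples rather than stated in the question: a bin is labelled by its lower edge, the bin starting at the upper limit is included, and values outside the range are ignored. The function name `histogram2d` and its parameters are hypothetical. `Decimal` is used so that a step such as 0.1 does not suffer from binary floating-point rounding.

```python
import math
from collections import Counter
from decimal import Decimal


def histogram2d(lines, lo, hi, step):
    """Count (x, y) pairs per bin and report every grid cell, empty or not.

    lo, hi and step are passed as strings (e.g. "1", "4", "0.1") so they
    can be converted to Decimal exactly. Hypothetical helper, not taken
    from any of the answers.
    """
    lo, hi, step = Decimal(lo), Decimal(hi), Decimal(step)
    # Assumption: bins are labelled lo, lo+step, ..., hi (hi included).
    nbins = int((hi - lo) // step) + 1
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        # Integer bin index: floor((value - lo) / step)
        ix = math.floor((Decimal(fields[0]) - lo) / step)
        iy = math.floor((Decimal(fields[1]) - lo) / step)
        if 0 <= ix < nbins and 0 <= iy < nbins:
            counts[ix, iy] += 1  # values outside the range are ignored
    # Emit the full grid so that empty bins show up with a count of 0.
    return [(lo + ix * step, lo + iy * step, counts[ix, iy])
            for ix in range(nbins) for iy in range(nbins)]


if __name__ == "__main__":
    sample = ["1.1 1.8", "2.5 2.6", "1.4 2.1", "1.3 1.5",
              "3.3 4.0", "3.8 3.9", "4.0 3.2", "4.0 4.0"]
    for x, y, n in histogram2d(sample, "1", "4", "1"):
        print(x, y, n)
```

With the second sample and a bin size of 1.0 this produces the same 16 rows as the table above. Working with integer bin indices and only converting back to labels at the end keeps the grid exact for any step size.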
3 answers
    awk 'END {
      for (i = 0; ++i <= l;) {
        for (j = 0; ++j <= l;)
          printf "%d %d %d %s\n", i, j, \
            b[i, j], (j < l ? x : ORS)
        }
      }
    {
      f[NR] = $1; s[NR] = $2
      b[$1, $2]++
      }' l=4 infile

You can try this (not fully tested):

    awk -vl=4 -v bs=0.1 'BEGIN {
      if (!bs) {
        print "invalid bin size" > "/dev/stderr"
        exit
        }
      split(bs, t, ".")
      t[2] || fl++
      m = "%." length(t[2]) "f"
      }
    {
      fk = fl ? int($1) : sprintf(m, $1)
      sk = fl ? int($2) : sprintf(m, $2)
      f[fk]; s[sk]; b[fk, sk]++
      }
    END {
      if (!bs) exit 1
      for (i = 1; int(i) <= l; i += bs) {
        for (j = 1; int(j) <= l; j += bs) {
          if (fl) {
            fk = int(i); sk = int(j); m = "%d"
            }
          else {
            fk = sprintf(m, i); sk = sprintf(m, j)
            }
          printf "%s" m OFS m OFS "%d\n", (i > 1 && fk != p ? ORS : x), fk, sk, b[fk, sk]
          p = fk
          }
        }
      }' infile

You can try this in bash:

    for x in {1..4} ; do
        for y in {1..4} ; do
            echo $x%$y 0
        done
    done \
        | join -1 1 -2 2 - -a1 <(sed 's/ /%/' FILE \
                                 | sort \
                                 | uniq -c \
                                 | sort -k2 ) \
        | sed 's/ 0 / /;s/%/ /'

It creates a table of all combinations with zeros in the last column, joins it with the actual results (the classic sort | uniq -c frequency table), and removes the zeros from the rows where an actual count should be displayed.
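The same idea (a zero-filled grid overlaid with a frequency table) can be sketched in Python for comparison. This is an illustration, not a translation of the pipeline: the function name `frequency_table` is hypothetical, and like the bash version it counts exact matches only, so it does no binning of floating-point values.

```python
from collections import Counter
from itertools import product


def frequency_table(lines, values):
    """Zero-filled grid over values x values, overlaid with exact pair counts.

    `values` is the list of labels for both columns, as strings.
    Hypothetical helper for illustration only.
    """
    # Step 1: the table with a zero for every combination.
    table = {pair: 0 for pair in product(values, repeat=2)}
    # Step 2: the frequency table, the counterpart of `sort | uniq -c`.
    observed = Counter(tuple(line.split()) for line in lines if line.strip())
    # Step 3: overlay the counts; pairs outside the grid are dropped,
    # which mirrors `join -a1` keeping only rows of the first input.
    for pair, n in observed.items():
        if pair in table:
            table[pair] = n
    return table


if __name__ == "__main__":
    data = ["1 1", "1 1", "2 3", "4 4"]
    for (x, y), n in frequency_table(data, ["1", "2", "3", "4"]).items():
        print(x, y, n)
```

Keeping the grid and the counts separate until the last step is what guarantees that empty cells still appear in the output.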


One solution in Perl:

    #!/usr/bin/perl -W

    use strict;

    my ($min, $step, $max, $file) = @ARGV or die "Syntax: $0 <min> <step> <max> <file>\n";

    my %seen;

    open F, "$file" or die "Cannot open file $file: $!\n";

    my @l = map { chomp; $_} qx/seq $min $step $max/;

    foreach my $first (@l) {
        foreach my $second (@l) {
            $seen{"$first $second"} = 0;
        }
    }

    foreach my $line (<F>) {
        chomp $line;
        $line or next;
        $seen{$line}++;
    }

    my $len = @l; # size of list
    my $i = 0;
    foreach my $key (sort keys %seen) {
        printf("%s %d\n", $key, $seen{$key});
        $i++;
        print "\n" unless $i % $len;
    }
    exit(0);

Source: https://habr.com/ru/post/1389463/

