2d histogram creation

Question

I have a data file containing two columns, for example

    1.1 2.2
    3.1 4.5
    1.2 4.5
    3.2 4.6
    1.1 2.3
    4.2 4.9
    4.2 1.1

I would like to make a histogram of the two columns, i.e. get this output (with a step size, or bin size since we are talking about a histogram, of 0.1 in this case):

    1.0 1.0 0
    1.0 1.1 0
    1.0 1.2 0
    ...
    1.1 1.0 0
    1.1 1.1 0
    1.1 1.2 0
    ...
    1.1 2.0 0
    1.1 2.1 0
    1.1 2.2 1
    ...
    ...

Can anyone suggest something? It would be nice if I could set the range of values for the columns. In the above case, the values of the first column go from 1 to 4, and the same holds for the second column.

EDITED: Updated to handle more general data input, e.g. floating-point numbers. The step size in the above case is 0.1, but it would be nice if it could be configured, e.g. a step size (bin size) of 0.2 or 1.0. If the step size is 1.0, for example, then 1.1 and 1.8 fall into the same bin and must be counted together. For example (the range in this case is 0.0 ... 4.0 for both columns):

    1.1 1.8
    2.5 2.6
    1.4 2.1
    1.3 1.5
    3.3 4.0
    3.8 3.9
    4.0 3.2
    4.0 4.0

(if bin size = 1.0)

    1 1 2
    1 2 1
    1 3 0
    1 4 0
    2 1 0
    2 2 1
    2 3 0
    2 4 0
    3 1 0
    3 2 0
    3 3 1
    3 4 1
    4 1 0
    4 2 0
    4 3 1
    4 4 1

+4

linux bash shell awk

user1116360 Jan 05 '12 at 13:16

3 answers

You can try this in bash:

    for x in {1..4} ; do
        for y in {1..4} ; do
            echo $x%$y 0
        done
    done \
    | join -1 1 -2 2 - -a1 <(sed 's/ /%/' FILE \
                             | sort \
                             | uniq -c \
                             | sort -k2 ) \
    | sed 's/ 0 / /;s/%/ /'

It creates a table with a zero in the last column of every row, joins it with the actual counts (the classic frequency table, sort | uniq -c) and removes the zero from the rows where a real count should be displayed.
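As a rough illustration of the frequency-table step only, the sketch below counts identical pairs with sort | uniq -c and moves the count to the last column. The function name pair_counts and the sample input are made up for this example, and it does not add the zero rows for pairs that never occur.

```shell
# Hypothetical helper: count how often each "x y" pair occurs on stdin.
# uniq -c puts the count first, so awk moves it to the last column.
pair_counts() {
    sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Made-up sample input with one repeated pair.
printf '%s\n' '1 1' '2 2' '1 2' '1 1' | pair_counts
# prints:
# 1 1 2
# 1 2 1
# 2 2 1
```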

+2

choroba Jan 05 '12 at 13:51

One solution in Perl (sample usage and output to follow):

    #!/usr/bin/perl -W
    use strict;

    my ($min, $step, $max, $file) = @ARGV
        or die "Syntax: $0 <min> <step> <max> <file>\n";

    my %seen;

    open F, "$file"
        or die "Cannot open file $file: $!\n";

    my @l = map { chomp; $_} qx/seq $min $step $max/;

    foreach my $first (@l) {
        foreach my $second (@l) {
            $seen{"$first $second"} = 0;
        }
    }

    foreach my $line (<F>) {
        chomp $line;
        $line or next;
        $seen{$line}++;
    }

    my $len = @l; # size of list
    my $i = 0;

    foreach my $key (sort keys %seen) {
        printf("%s %d\n", $key, $seen{$key});
        $i++;
        print "\n" unless $i % $len;
    }

    exit(0);

0

fge Jan 05 '12 at 13:50

Dimitre radoulov · Accepted Answer · 2012-01-05T13:58:19+0000

    awk 'END {
      for (i = 0; ++i <= l;) {
        for (j = 0; ++j <= l;)
          printf "%d %d %d %s\n", i, j, \
            b[i, j], (j < l ? x : ORS)
        }
      }
    {
      f[NR] = $1; s[NR] = $2
      b[$1, $2]++
      }' l=4 infile

You can try this (not fully tested):

    awk -vl=4 -v bs=0.1 'BEGIN {
      if (!bs) {
        print "invalid bin size" > "/dev/stderr"
        exit
        }
      split(bs, t, ".")
      t[2] || fl++
      m = "%." length(t[2]) "f"
      }
    {
      fk = fl ? int($1) : sprintf(m, $1)
      sk = fl ? int($2) : sprintf(m, $2)
      f[fk]; s[sk]; b[fk, sk]++
      }
    END {
      if (!bs) exit 1
      for (i = 1; int(i) <= l; i += bs) {
        for (j = 1; int(j) <= l; j += bs) {
          if (fl) {
            fk = int(i); sk = int(j); m = "%d"
            }
          else {
            fk = sprintf(m, i); sk = sprintf(m, j)
            }
          printf "%s" m OFS m OFS "%d\n", (i > 1 && fk != p ? ORS : x), fk, sk, b[fk, sk]
          p = fk
          }
        }
      }' infile
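A different way to make the bin size configurable, shown here only as a sketch and not as the method used in the answer above, is to convert each value to an integer bin index and print the lower edge of each bin as its label. The function name hist2d, its argument order (MIN MAX BINSIZE) and the small epsilon are assumptions made for this example. Bins are labelled by their lower edge, and MAX is the lower edge of the last bin, which matches the bin size 1.0 output in the question.

```shell
# Hypothetical helper: hist2d MIN MAX BINSIZE, reading "x y" pairs on stdin.
hist2d() {
    awk -v min="$1" -v max="$2" -v bs="$3" '
    # 0-based bin index (floor of the offset from min). The epsilon is an
    # assumption that guards against results such as
    # (1.2 - 1.0) / 0.1 = 1.9999... being truncated to 1.
    function bin(v) { return int((v - min) / bs + 1e-9) }

    BEGIN {
        if (bs <= 0) {
            print "invalid bin size" > "/dev/stderr"
            bad = 1
            exit 1
        }
        n = bin(max)    # index of the last bin
    }

    # Count only pairs at or above min; pairs beyond the last bin
    # get an index greater than n and are never printed.
    NF >= 2 && $1 >= min && $2 >= min { count[bin($1), bin($2)]++ }

    END {
        if (bad) exit 1
        for (i = 0; i <= n; i++)
            for (j = 0; j <= n; j++)
                printf "%g %g %d\n", min + i * bs, min + j * bs, count[i, j]
    }'
}

# Usage with the second sample from the question (bin size 1.0, range 1..4).
printf '%s\n' '1.1 1.8' '2.5 2.6' '1.4 2.1' '1.3 1.5' \
              '3.3 4.0' '3.8 3.9' '4.0 3.2' '4.0 4.0' | hist2d 1 4 1.0
```

Because labels are printed with %g, a bin edge of 1.0 appears as 1; a fixed number of decimals would need a different format string.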
