For the mean, median, and standard deviation you can use `awk`. This will usually be faster than R solutions. For example, the following prints the mean:
```
awk '{a+=$1} END{print a/NR}' myfile
```
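As a quick sanity check, this can be run on a few sample values (the data below is assumed purely for illustration):

```shell
# Mean of 1..4 should be 2.5.
printf '1\n2\n3\n4\n' | awk '{a+=$1} END{print a/NR}'
# prints 2.5
```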
(`NR` is the built-in awk variable holding the number of records read so far; `$1` is the first (whitespace-separated) field of the line. `$0` is the whole line, which would also work here since awk only uses the leading number for arithmetic, but in principle it is less robust. `END` means the following block is executed after the entire file has been processed; you can also initialize `a` to 0 in a `BEGIN{a=0}` block.)
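The median can be obtained in the same spirit by sorting first and letting awk pick the middle record. This is a sketch; the sample file created here is assumed for illustration:

```shell
# Sample data assumed for illustration.
printf '3\n1\n5\n2\n4\n' > myfile
# Median: sort numerically, store the values, pick the middle record
# (or average the two middle records for an even count).
sort -n myfile | awk '{a[NR]=$1}
    END{print (NR % 2 ? a[(NR+1)/2] : (a[NR/2] + a[NR/2+1]) / 2)}'
# prints 3
```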
Here is a simple awk script that provides more detailed statistics (it accepts a CSV file as input; otherwise, change `FS` accordingly):
```
#!/usr/bin/awk -f
BEGIN {
    FS=",";
}
{
    a += $1;
    b[++i] = $1;
}
END {
    m = a/NR;              # mean
    for (i in b) {
        d += (b[i]-m)^2;
        e += (b[i]-m)^3;
        f += (b[i]-m)^4;
    }
    va = d/NR;             # variance
    sd = sqrt(va);         # standard deviation
    sk = (e/NR)/sd^3;      # skewness
    ku = (f/NR)/sd^4 - 3;  # standardized kurtosis
    print "N,sum,mean,variance,std,SEM,skewness,kurtosis"
    print NR "," a "," m "," va "," sd "," sd/sqrt(NR) "," sk "," ku
}
```
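To use it, the script can be saved to a file and run with `awk -f`. In this sketch, `stats.awk` and `data.csv` are assumed names; the here-document just reproduces the script above so the example is self-contained:

```shell
# "stats.awk" is an assumed filename; the here-document holds the script above.
cat > stats.awk <<'EOF'
#!/usr/bin/awk -f
BEGIN { FS="," }
{ a += $1; b[++i] = $1 }
END {
    m = a/NR                                # mean
    for (i in b) { d += (b[i]-m)^2; e += (b[i]-m)^3; f += (b[i]-m)^4 }
    va = d/NR; sd = sqrt(va)                # variance, standard deviation
    sk = (e/NR)/sd^3; ku = (f/NR)/sd^4 - 3  # skewness, standardized kurtosis
    print "N,sum,mean,variance,std,SEM,skewness,kurtosis"
    print NR "," a "," m "," va "," sd "," sd/sqrt(NR) "," sk "," ku
}
EOF
# Sample input: values 1..5 in the first CSV column (assumed data).
printf '1\n2\n3\n4\n5\n' > data.csv
awk -f stats.awk data.csv
# prints the header followed by: 5,15,3,2,1.41421,0.632456,0,-1.3
```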
Adding min/max to this script is easy, but it is just as simple to pipe through `sort` and `head`/`tail`:
```
sort -n myfile | head -n1
sort -n myfile | tail -n1
```
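If you prefer to avoid sorting twice, both extremes can be tracked in a single awk pass. A sketch, with the sample file assumed for illustration:

```shell
# Sample data assumed for illustration.
printf '3\n1\n5\n2\n4\n' > myfile
# Track min and max in one pass over the file.
awk 'NR==1 {min=max=$1} $1<min {min=$1} $1>max {max=$1} END {print min, max}' myfile
# prints: 1 5
```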
Skippy le Grand Gourou