Kurtosis, histogram asymmetry? - Python

Question


What is an efficient method for determining the skew / kurtosis of a bar graph in Python? Given that bar graphs are not binned (unlike histograms), this question may not make much sense, but what I am trying to do is determine the symmetry of a graph's height versus distance (rather than frequency versus bins). In other words, given values of height (y) measured along distance (x), i.e.

 y = [6.18, 10.23, 33.15, 55.25, 84.19, 91.09, 106.6, 105.63, 114.26, 134.24, 137.44, 144.61, 143.14, 150.73, 156.44, 155.71, 145.88, 120.77, 99.81, 85.81, 55.81, 49.81, 37.81, 25.81, 5.81]
 x = [0.03, 0.08, 0.14, 0.2, 0.25, 0.31, 0.36, 0.42, 0.48, 0.53, 0.59, 0.64, 0.7, 0.76, 0.81, 0.87, 0.92, 0.98, 1.04, 1.09, 1.15, 1.2, 1.26, 1.32, 1.37]

What is the symmetry (skewness) and peakedness (kurtosis) of the height distribution (y) measured over distance (x)? Are skewness / kurtosis appropriate measurements for determining whether real values are normally distributed? Or does scipy / numpy offer something similar for this type of measurement?

I can get a skew / kurtosis estimate of the height (y) values binned along distance (x) as follows:

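 from itertools import chain
 from scipy import stats
 from matplotlib.pyplot import hist, xlabel, ylabel

 # repeat each x value round(y) times to build a frequency list
 freq = list(chain(*[[x_v] * int(round(y_v)) for x_v, y_v in zip(x, y)]))
 x.extend([x[-1:][0] + x[0]])  # add one extra bin edge
 hist(freq, bins=x)
 ylabel("Height Frequency")
 xlabel("Distance(km) Bins")
 print("Skewness,", "Kurtosis:", stats.describe(freq)[4:])

 Skewness, Kurtosis: (-0.019354300509997705, -0.7447085398785758)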

[Figure: histogram of height frequency over distance (km) bins]

In this case, the height distribution is symmetrical (skew about -0.02) around the midpoint distance and is characterized by a platykurtic (kurtosis -0.74, i.e. broad) distribution.

Given that I multiply each occurrence of an x value by its height y to create a frequency, the resulting list can sometimes get very large. I was wondering whether there is a better method to approach this problem? I suppose I could always try to normalize the dataset y to a range, perhaps 0 to 100, without losing too much information about the dataset's skew / kurtosis.


python numpy scipy

Bjebn Jul 11 '13 at 8:01


1 answer

Hooked · Accepted Answer · 2013-07-11T14:10:17+0000

This is not a Python question, nor really a programming question, but the answer is nonetheless simple. Instead of skew and kurtosis, let's first consider the easier quantities based on the lower moments: the mean and the standard deviation. To make it concrete and fit your question, suppose your data looks like this:

 X = 3, 3, 5, 5, 5, 7 = x1, x2, x3 ....

This gives a "histogram" that looks like this:

 {3:2, 5:3, 7:1} = {k1:p1, k2:p2, k3:p3}

The mean, u, is given by

 E[X] = (1/N) * (x1 + x2 + x3 + ...) = (1/N) * (3 + 3 + 5 + ...)

Our data, however, has duplicate values, so the sum can be rewritten as

 E[X] = (1/N) * (p1*k1 + p2*k2 + ...) = (1/N) * (3*2 + 5*3 + 7*1)

The next quantity, the standard deviation S, is simply

 S = sqrt(E[(X-u)^2]) = sqrt((1/N)*( (x1-u)^2 + (x2-u)^2 + ...))

But we can apply the same reduction to the term E[(X-u)^2] and write it as

 E[(X-u)^2] = (1/N)*( p1*(k1-u)^2 + p2*(k2-u)^2 + ... ) = (1/6)*( 2*(3-u)^2 + 3*(5-u)^2 + 1*(7-u)^2 )
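As a quick check of this reduction, here is a minimal sketch that computes the mean and variance of the toy data both ways: once from the full list and once from the (value, count) pairs only. The variable names (`data`, `hist`, `mean_hist`, and so on) are chosen for illustration and are not part of the answer above.

```python
import math

# Toy data from the example and its "histogram" form (value -> count).
data = [3, 3, 5, 5, 5, 7]
hist = {3: 2, 5: 3, 7: 1}

n = sum(hist.values())  # N, the total number of data points

# Mean and variance computed from the full list of values.
mean_raw = sum(data) / len(data)
var_raw = sum((v - mean_raw) ** 2 for v in data) / len(data)

# The same quantities computed from the (value, count) pairs only.
mean_hist = sum(p * k for k, p in hist.items()) / n
var_hist = sum(p * (k - mean_hist) ** 2 for k, p in hist.items()) / n
std_hist = math.sqrt(var_hist)

print(mean_raw, mean_hist)
print(var_raw, var_hist)
```

Both routes should print matching pairs of numbers, since the second form only regroups the terms of the first.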

This means that we do not need multiple copies of each data item to compute the sums, as was done in your question.

Skew and kurtosis are then just as simple:

 skew = E[(X-u)^3] / (E[(X-u)^2])^(3/2)
 kurtosis = ( E[(X-u)^4] / (E[(X-u)^2])^2 ) - 3
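Applied to the data in the question, these formulas could be sketched with NumPy as follows. The helper name `weighted_moments` is hypothetical, and the sketch assumes the heights y can be used directly as (non-integer) weights for the distances x. Because the question rounds each height to an integer before building its list, the results here may differ slightly from the values quoted there.

```python
import numpy as np

def weighted_moments(values, weights):
    """Return (mean, std, skew, excess kurtosis) of `values`,
    treating `weights` as the frequency of each value."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    dev = values - mean
    m2 = np.average(dev ** 2, weights=weights)  # E[(X-u)^2]
    m3 = np.average(dev ** 3, weights=weights)  # E[(X-u)^3]
    m4 = np.average(dev ** 4, weights=weights)  # E[(X-u)^4]
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0
    return mean, np.sqrt(m2), skew, kurtosis

# Heights (y) and distances (x) from the question.
y = [6.18, 10.23, 33.15, 55.25, 84.19, 91.09, 106.6, 105.63, 114.26,
     134.24, 137.44, 144.61, 143.14, 150.73, 156.44, 155.71, 145.88,
     120.77, 99.81, 85.81, 55.81, 49.81, 37.81, 25.81, 5.81]
x = [0.03, 0.08, 0.14, 0.2, 0.25, 0.31, 0.36, 0.42, 0.48, 0.53, 0.59,
     0.64, 0.7, 0.76, 0.81, 0.87, 0.92, 0.98, 1.04, 1.09, 1.15, 1.2,
     1.26, 1.32, 1.37]

mean, std, skew, kurt = weighted_moments(x, y)
print("Skewness, Kurtosis:", skew, kurt)
```

No expanded list is built, so memory use stays proportional to the number of bins regardless of how large the heights are. Passing `[int(round(v)) for v in y]` as the weights instead should correspond to the rounded frequency list built in the question.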
