Here is a version that is much faster than the one you provided above. It also uses a simplified formula for the unweighted case, which makes that case faster still.
def gini(x, w=None):
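As a minimal sketch of the sort-and-cumulative-sum approach described above, assuming NumPy is imported as `np`: the name `gini_sketch` is hypothetical, and this is an illustration of the technique, not necessarily the exact code that was timed. The weighted branch computes the weighted mean absolute difference divided by twice the weighted mean, and the unweighted branch is what that expression reduces to when all weights equal 1.

```python
import numpy as np


def gini_sketch(x, w=None):
    """Gini coefficient of x, optionally weighted by w (illustrative sketch)."""
    x = np.asarray(x)
    if w is not None:
        w = np.asarray(w)
        # Sort values and carry the weights along in the same order.
        order = np.argsort(x)
        sorted_x = x[order]
        sorted_w = w[order]
        # Cumulative weight and cumulative weighted value, forced to float
        # so that integer inputs cannot overflow.
        cumw = np.cumsum(sorted_w, dtype=float)
        cumxw = np.cumsum(sorted_x * sorted_w, dtype=float)
        # Each term equals w_i * sum_{j<i} w_j * (x_i - x_j) for sorted x.
        num = np.sum(cumxw[1:] * cumw[:-1] - cumxw[:-1] * cumw[1:])
        return num / (cumxw[-1] * cumw[-1])
    # Unweighted case: the same formula with all weights set to 1.
    sorted_x = np.sort(x)
    n = len(sorted_x)
    cumx = np.cumsum(sorted_x, dtype=float)
    return (n + 1 - 2 * np.sum(cumx) / cumx[-1]) / n
```

Both branches should agree when the weights are all 1, and weighting a value by 3 should give the same result as repeating it three times in the unweighted case.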
Here is some test code to verify that we get (basically) the same results:
    >>> x = np.random.rand(1000000)
    >>> w = np.random.rand(1000000)
    >>> gini_max_ghenis(x, w)
    0.33376310938610521
    >>> gini(x, w)
    0.33376310938610382
But the speed is completely different:
    %timeit gini(x, w)
    203 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    %timeit gini_max_ghenis(x, w)
    55.6 s ± 3.35 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Just removing the pandas operations from that function already makes it much faster:
    %timeit gini_max_ghenis_no_pandas_ops(x, w)
    1.62 s ± 75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you want to squeeze out the last bit of performance, you could use numba or cython, but that would only gain a few percent, because most of the time is spent sorting:
    %timeit ind = np.argsort(x); sx = x[ind]; sw = w[ind]
    180 ms ± 4.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
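Outside IPython, the same comparison can be sketched with the standard `timeit` module. This is only an illustration: the helper names `sort_step` and `cumsum_step` are hypothetical, the array size is reduced to keep the run short, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # smaller than the 1,000,000 used above, to keep the run short
x = rng.random(n)
w = rng.random(n)


def sort_step():
    # The sorting part: argsort the values, then reorder values and weights.
    ind = np.argsort(x)
    return x[ind], w[ind]


sx, sw = sort_step()


def cumsum_step():
    # Everything after the sort: cumulative sums and the final reduction.
    cumw = np.cumsum(sw, dtype=float)
    cumxw = np.cumsum(sx * sw, dtype=float)
    num = np.sum(cumxw[1:] * cumw[:-1] - cumxw[:-1] * cumw[1:])
    return num / (cumxw[-1] * cumw[-1])


# Best of 3 repeats, 5 calls each, reported per call.
t_sort = min(timeit.repeat(sort_step, number=5, repeat=3)) / 5
t_rest = min(timeit.repeat(cumsum_step, number=5, repeat=3)) / 5
print(f"sort: {t_sort * 1e3:.2f} ms per call, rest: {t_rest * 1e3:.2f} ms per call")
```

If the sort dominates, compiling the remaining arithmetic with numba or cython can only shave off a small share of the total.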
Edit: gini_max_ghenis is the code used in Max Ghenis' answer.