Calculation of the number of k-combinations with and without SciPy

Question

Calculation of the number of k-combinations with and without SciPy

I am puzzled by the fact that the SciPy comb function looks slower than the naive Python implementation. This is the measured time for two equivalent programs, solution Problem 53 Project Euler .

With SciPy:

 %%timeit from scipy.misc import comb result = 0 for n in range(1, 101): for k in range(1, n + 1): if comb(n, k) > 1000000: result += 1 result

Conclusion:

 1 loops, best of 3: 483 ms per loop

Without SciPy:

 %%timeit from math import factorial def comb(n, k): return factorial(n) / factorial(k) / factorial(n - k) result = 0 for n in range(1, 101): for k in range(1, n + 1): if comb(n, k) > 1000000: result += 1 result

Conclusion:

 10 loops, best of 3: 86.9 ms per loop

The second version is about 5 times faster (tested on two Mac computers, python-2.7.9-1, IPython 2.3.1-py27_0). Is this the expected behavior of the comb SciPy function ( source code )? What am I doing wrong?

Edit (SciPy from Anaconda 3.7.3-py27_0 distribution):

 import scipy; print scipy.version.version 0.12.0

Edit (same difference outside of IPython):

 $ time python with_scipy.py real 0m0.700s user 0m0.610s sys 0m0.069s $ time python without_scipy.py real 0m0.134s user 0m0.099s sys 0m0.015s

+5

python benchmarking scipy combinations

Aristide Dec 17 '14 at 17:27

source share

3 answers

It appears that you may not properly perform timings and measure the time it takes to load scipy into memory. When I run:

 import timeit from scipy.misc import comb from math import factorial def comb2(n, k): return factorial(n) / factorial(k) / factorial(n - k) def test(): result = 0 for n in range(1, 101): for k in range(1, n + 1): if comb(n, k) > 1000000: result += 1 def test2(): result = 0 for n in range(1, 101): for k in range(1, n + 1): if comb2(n, k) > 1000000: result += 1 T = timeit.Timer(test) print T.repeat(3,50) T2 = timeit.Timer(test2) print T2.repeat(3,50)

I get:

 [2.2370951175689697, 2.2209839820861816, 2.2142510414123535] [2.136591911315918, 2.138144016265869, 2.1437559127807617]

which indicates that scipy is slightly faster than the python version.

+7

Hooked Dec 17 '14 at 18:48

source share

I believe this is faster than other pure python methods:

 from math import factorial from operator import mul def n_choose_k(n, k): if k < nk: return reduce(mul, xrange(n-k+1, n+1)) // factorial(k) else: return reduce(mul, xrange(k+1, n+1)) // factorial(nk)

It is slow compared to numpy solutions, but note that NumPy does not work in "unlimited integers" like Python. This means that with slow Python, you can return the correct results. NumPy's default behavior is not always ideal for combinatorics (at least when very large integers are involved).

eg.

 In [1]: from scipy.misc import comb In [2]: from scipy.special import binom In [3]: %paste from math import factorial from operator import mul def n_choose_k(n, k): if k < nk: return reduce(mul, xrange(n-k+1, n+1)) // factorial(k) else: return reduce(mul, xrange(k+1, n+1)) // factorial(nk) ## -- End pasted text -- In [4]: n_choose_k(10, 3), binom(10, 3), comb(10, 3) Out[4]: (120, 120.0, 120.0) In [5]: n_choose_k(1000, 250) == factorial(1000)//factorial(250)//factorial(750) Out[5]: True In [6]: abs(comb(1000, 250) - n_choose_k(1000, 250)) Out[6]: 3.885085558125553e+230 In [7]: abs(binom(1000, 250) - n_choose_k(1000, 250)) Out[7]: 3.885085558125553e+230

Not surprisingly, the error is so large that it is simply a truncation effect. Why cut back? NumPy is limited to using 64 bits to represent an integer. However, the requirement for such a large whole is somewhat larger:

 In [8]: from math import log In [9]: log((n_choose_k(1000, 250)), 2) Out[9]: 806.1764820287578 In [10]: type(binom(1000, 250)) Out[10]: numpy.float64

+1

Andyp Mar 15 '17 at 8:26

source share

Aristide · Accepted Answer · 2014-12-17T19:19:11+0000

Answering my question. It seems that there are two different functions for the same object in SciPy. I don’t quite understand why. But the other, binom , is 8.5 times faster than comb , and 1.5 times faster than mine, which is encouraging:

 %%timeit from scipy.special import binom result = 0 for n in range(1, 101): for k in range(1, n + 1): if binom(n, k) > 1000000: result += 1 result

Conclusion:

 10 loops, best of 3: 56.3 ms per loop

SciPy 0.14.0 guys, does this work for you too?

Calculation of the number of k-combinations with and without SciPy

With SciPy:

Without SciPy:

More articles: