Speed up Python2 loops with XOR

Question

Speed up Python2 loops with XOR

The answer to the question, which is marked as a duplicate, is incorrect and does not satisfy my needs.

My code is designed to calculate a hash from a series of numbers.

It’s easier to understand the structure in the form of a matrix. If I have 16 numbers, starting at 29, the structure will be: (start = 29, length = 4)

29, 30, 31, 32,
33, 34, 35, 36,
37, 38, 39, 40,
41, 42, 43, 44

This algorithm determines that the hash will be XOR numbers in bold:

29, 30, 31, 32, //,
33, 34, 35, //, 36,
37, 38, //, 39, 40,
41, //, 42, 43, 44

Hash = 29^30^31^32^33^34^35^37^38^39 = 54

My code is:

 def answer(start, length): val=0 c=0 for i in range(length): for j in range(length): if j < length-i: val^=start+c c+=1 return val

The time required to compute large values such as answer(2000000000,10**4) is too long.

Limitations:

Py2.7.6
Only standard libraries except bz2, crypt, fcntl, mmap, pwd, pyexpat, select, signal, termios, thread, time, unicodedata, zipimport, zlib.
Limited time to calculate.

Currently calculating test parameters (unknown to me) gives me a timeout error.

How can I improve the speed of my code for large values?

-1

performance python python-2.7 nested-loops xor

sbrm1 Nov 02 '16 at 11:07

source share

3 answers

It looks like you can replace the inner loop, and if:

for j in range(length - i) val^=start+c c+=1 c+=i This should save some time when I get more

I'm afraid I can't check it right now, sorry!

+1

Tiemen joustra Nov 02 '16 at 11:17

source share

I am afraid that with the input that you have in answer(2000000000,10**4) , you will never end "over time."

You can get pretty significant speed by improving the inner loop without updating the c variable each time and using xrange instead of range , for example:

 def answer(start, length): val=0 c=0 for i in range(length): for j in range(length): if j < length-i: val^=start+c c+=1 return val def answer_fast(start, length): val = 0 c = 0 for i in xrange(length): for j in xrange(length - i): if j < length - i: val ^= start + c + j c += length return val # print answer(10, 20000) print answer_fast(10, 20000)

The profiler shows that answer_fast about twice as fast:

 > python -m cProfile script.py 366359392 20004 function calls in 46.696 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 46.696 46.696 script.py:1(<module>) 1 44.357 44.357 46.696 46.696 script.py:1(answer) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 20001 2.339 0.000 2.339 0.000 {range} > python -m cProfile script.py 366359392 3 function calls in 26.274 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 26.274 26.274 script.py:1(<module>) 1 26.274 26.274 26.274 26.274 script.py:12(answer_fast) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

But if you need big accelerations (tycoon orders), you should consider rewriting your function in Cython.

Here is its "cythonized" version:

 def answer(int start, int length): cdef int val = 0, c = 0, i, j for i in xrange(length): for j in xrange(length - i): if j < length - i: val ^= start + c + j c += length return val

With the same input parameters as above, it consumes less than 200 ms in 20 + seconds, which is a 100-fold acceleration.

 > ipython In [1]: import pyximport; pyximport.install() Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x7f3fed983150>) In [2]: import script2 In [3]: timeit script2.answer(10, 20000) 10 loops, best of 3: 188 ms per loop

With your input parameters, 58 ms is required:

 In [5]: timeit script2.answer(2000000000,10**4) 10 loops, best of 3: 58.2 ms per loop

+1

lbolla Nov 02 '16 at 11:40

source share

PM 2Ring · Accepted Answer · 2016-11-03T12:43:29+0000

There is an error in the accepted answer . Python XOR fast algorithm : decrement l must be executed before XOR calculation. Here the version is fixed along with the assert test to make sure that it gives the same result as the naive algorithm.

 def f(a): return (a, 1, a + 1, 0)[a % 4] def getXor(a, b): return f(b) ^ f(a-1) def gen_nums(start, length): l = length ans = 0 while l > 0: l = l - 1 ans ^= getXor(start, start + l) start += length return ans def answer(start, length): c = val = 0 for i in xrange(length): for j in xrange(length - i): n = start + c + j #print '%d,' % n, val ^= n #print c += length return val for start in xrange(50): for length in xrange(100): a = answer(start, length) b = gen_nums(start, length) assert a == b, (start, length, a, b)

In these ranges, start and length , gen_nums about 5 times faster than answer , but we can do it about two times faster (i.e., about 10 times faster than answer ), eliminating these function calls:

 def gen_nums(start, length): ans = 0 for l in xrange(length - 1, -1, -1): b = start + l ans ^= (b, 1, b + 1, 0)[b % 4] ^ (0, start - 1, 1, start, 0)[start % 4] start += length return ans

As Mirek Opoka mentions in the comments, % 4 equivalent to & 3 , and it is faster because bitwise arithmetic is faster than doing integer division and dropping the quotient. Thus, we can replace the main step with

 ans ^= (b, 1, b + 1, 0)[b & 3] ^ (0, start - 1, 1, start, 0)[start & 3]

Speed ​​up Python2 loops with XOR

More articles:

Speed up Python2 loops with XOR