Speed ​​up Python2 loops with XOR

The answer to the question, which is marked as a duplicate, is incorrect and does not satisfy my needs.

My code is designed to calculate a hash from a series of numbers.

It’s easier to understand the structure in the form of a matrix. If I have 16 numbers, starting at 29, the structure will be: (start = 29, length = 4)

29, 30, 31, 32,
33, 34, 35, 36,
37, 38, 39, 40,
41, 42, 43, 44

This algorithm determines that the hash will be XOR numbers in bold:

29, 30, 31, 32, //,
33, 34, 35, //, 36,
37, 38, //, 39, 40,
41, //, 42, 43, 44

Hash = 29^30^31^32^33^34^35^37^38^39 = 54


My code is:

 def answer(start, length): val=0 c=0 for i in range(length): for j in range(length): if j < length-i: val^=start+c c+=1 return val 

The time required to compute large values ​​such as answer(2000000000,10**4) is too long.


Limitations:

  • Py2.7.6
  • Only standard libraries except bz2, crypt, fcntl, mmap, pwd, pyexpat, select, signal, termios, thread, time, unicodedata, zipimport, zlib.
  • Limited time to calculate.

Currently calculating test parameters (unknown to me) gives me a timeout error.


How can I improve the speed of my code for large values?

-1
source share
3 answers

There is an error in the accepted answer . Python XOR fast algorithm : decrement l must be executed before XOR calculation. Here the version is fixed along with the assert test to make sure that it gives the same result as the naive algorithm.

 def f(a): return (a, 1, a + 1, 0)[a % 4] def getXor(a, b): return f(b) ^ f(a-1) def gen_nums(start, length): l = length ans = 0 while l > 0: l = l - 1 ans ^= getXor(start, start + l) start += length return ans def answer(start, length): c = val = 0 for i in xrange(length): for j in xrange(length - i): n = start + c + j #print '%d,' % n, val ^= n #print c += length return val for start in xrange(50): for length in xrange(100): a = answer(start, length) b = gen_nums(start, length) assert a == b, (start, length, a, b) 

In these ranges, start and length , gen_nums about 5 times faster than answer , but we can do it about two times faster (i.e., about 10 times faster than answer ), eliminating these function calls:

 def gen_nums(start, length): ans = 0 for l in xrange(length - 1, -1, -1): b = start + l ans ^= (b, 1, b + 1, 0)[b % 4] ^ (0, start - 1, 1, start, 0)[start % 4] start += length return ans 

As Mirek Opoka mentions in the comments, % 4 equivalent to & 3 , and it is faster because bitwise arithmetic is faster than doing integer division and dropping the quotient. Thus, we can replace the main step with

 ans ^= (b, 1, b + 1, 0)[b & 3] ^ (0, start - 1, 1, start, 0)[start & 3] 
+4
source

It looks like you can replace the inner loop, and if:

for j in range(length - i) val^=start+c c+=1 c+=i This should save some time when I get more

I'm afraid I can't check it right now, sorry!

+1
source

I am afraid that with the input that you have in answer(2000000000,10**4) , you will never end "over time."

You can get pretty significant speed by improving the inner loop without updating the c variable each time and using xrange instead of range , for example:

 def answer(start, length): val=0 c=0 for i in range(length): for j in range(length): if j < length-i: val^=start+c c+=1 return val def answer_fast(start, length): val = 0 c = 0 for i in xrange(length): for j in xrange(length - i): if j < length - i: val ^= start + c + j c += length return val # print answer(10, 20000) print answer_fast(10, 20000) 

The profiler shows that answer_fast about twice as fast:

 > python -m cProfile script.py 366359392 20004 function calls in 46.696 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 46.696 46.696 script.py:1(<module>) 1 44.357 44.357 46.696 46.696 script.py:1(answer) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 20001 2.339 0.000 2.339 0.000 {range} > python -m cProfile script.py 366359392 3 function calls in 26.274 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 26.274 26.274 script.py:1(<module>) 1 26.274 26.274 26.274 26.274 script.py:12(answer_fast) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 

But if you need big accelerations (tycoon orders), you should consider rewriting your function in Cython.

Here is its "cythonized" version:

 def answer(int start, int length): cdef int val = 0, c = 0, i, j for i in xrange(length): for j in xrange(length - i): if j < length - i: val ^= start + c + j c += length return val 

With the same input parameters as above, it consumes less than 200 ms in 20 + seconds, which is a 100-fold acceleration.

 > ipython In [1]: import pyximport; pyximport.install() Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x7f3fed983150>) In [2]: import script2 In [3]: timeit script2.answer(10, 20000) 10 loops, best of 3: 188 ms per loop 

With your input parameters, 58 ms is required:

 In [5]: timeit script2.answer(2000000000,10**4) 10 loops, best of 3: 58.2 ms per loop 
+1
source

Source: https://habr.com/ru/post/1257922/


All Articles