Speed up Cython loop by copying from list to numpy array

I am writing some performance-critical code and was hoping to get feedback from Cython users on how to improve it further. The purpose of the functions I wrote is a little difficult to explain, but what they do is not so scary. The first one (roughly) takes two dictionaries of lists of numbers and combines them into one dictionary of lists of numbers. It only runs once, so I am less concerned about optimizing it. The second one first calls the first, then uses its result to combine the indexes stored in a numpy array with the numbers in the lists, forming queries (new numbers) against a Bloom filter (pybloomfiltermmap).

I determined that the hot spot is my nested loops, so I reduced the number of loops used, hoisted everything loop-invariant out of the loops, and added static types (cdef) everywhere I know how. Still, each iteration of i in the second function takes about 10 seconds, which is too slow. The main things still showing yellow in Cython's HTML annotation output are the indexed accesses into the lists and the numpy array, so I tried replacing my lists entirely with numpy arrays, but could not get any improvement. I would really appreciate any feedback you could provide.

#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np

def merge_dicts_of_lists(dict c1, dict c2):
    cdef dict res
    cdef int n, length1, length2, length3
    cdef unsigned int i, j, j_line, jj, k, kk, new_line
    res = {n: [] for n in range(256)}
    length1 = len(c1)
    for i in range(length1):
        length2 = len(c1[i])
        for j in range(length2):
            j_line = c1[i][j]
            jj = (j_line) % 256
            length3 = len(c2[jj])
            for k in range(length3):
                kk = c2[jj][k]
                new_line = (j_line << 10) + kk
                res[i].append(new_line)
    return res

def get_4kmer_set(np.ndarray c1, dict c2, dict c3, bf):
    cdef unsigned int num = 0
    cdef unsigned long long query = 0
    cdef unsigned int i, j, i_row, i_col, j_line
    cdef unsigned int length1, length2
    cdef dict merge
    cdef list m_i
    merge = merge_dicts_of_lists(c2, c3)
    length1 = len(c1[:, 0])
    for i in range(length1):
        print "i is %d" % i
        i_row = c1[i, 0]
        i_col = c1[i, 1]
        m_i = merge[i_col]
        length2 = len(m_i)
        for j in range(length2):
            j_line = m_i[j]
            query = (i_row << 24) + (i_col << 20) + j_line
            if query in bf:
                num += 1
    print "%d yes answers from bf" % num
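To make the semantics of the first function concrete, here is a plain-Python equivalent of merge_dicts_of_lists (a sketch for illustration only, without the Cython type declarations): each number from the first dictionary's lists selects a bucket of the second dictionary by its low 8 bits, and every pairing is packed into one number by shifting the first value left 10 bits.

```python
def merge_dicts_of_lists(c1, c2):
    # c1, c2: dicts mapping bucket numbers 0..255 to lists of ints
    res = {n: [] for n in range(256)}
    for i, lines in c1.items():
        for j_line in lines:
            # low 8 bits of j_line pick the bucket of c2 to pair with
            for kk in c2[j_line % 256]:
                # pack both values into a single integer
                res[i].append((j_line << 10) + kk)
    return res
```

For example, if c1[0] contains 5 and c2[5] contains 3, then res[0] gets (5 << 10) + 3 = 5123.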
1 answer

For posterity, I am adding an answer to my own question, which I hope is useful to someone nonetheless. The code above is not much different from what I ended up keeping, because it already compiled down to short C lines, as could be seen from Cython's HTML annotation output.

Since the innermost operation was a Bloom filter query, I found two changes that did the most to speed up this step. The first was switching the hash function used by pybloomfiltermmap: I found that it uses SHA, which is relatively slow, as expected for a cryptographic hash function, so I replaced it with an available C++ implementation of MurmurHash3. The second boost came from applying the trick in this paper: http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/rsa.pdf. In short, you can save a lot of computation by deriving the k Bloom filter bit positions as linear combinations of just two hash values instead of computing k independent hashes. Together these two tricks gave roughly a 5x improvement in query time.
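The double-hashing trick from the Kirsch-Mitzenmacher paper can be sketched as follows. This is an illustrative stand-in, not the pybloomfiltermmap internals: it uses SHA-256 from the standard library to derive the two base hashes (the answer above replaced SHA with MurmurHash3, which is not in the Python stdlib), and the function name and parameters are my own.

```python
import hashlib

def double_hash_indexes(key, k, m):
    """Derive k Bloom-filter bit positions from only two base hashes.

    Kirsch & Mitzenmacher: g_i(x) = (h1(x) + i * h2(x)) mod m
    behaves like k independent hash functions for Bloom filters,
    while computing just one digest per key.
    """
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd stride
    return [(h1 + i * h2) % m for i in range(k)]
```

A query then tests (for membership checks) or sets (for inserts) each of the k returned bit positions in the filter's bit array of size m.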


Source: https://habr.com/ru/post/1493920/
