I wrote a function in Cython containing the following loop. Each row of array A1 is binary-searched for all values in array A2, so each iteration of the loop returns a 2D array of index values. Arrays A1 and A2 are passed in as function arguments and are correctly typed.
Array C is pre-allocated at the outermost indentation level, as required by Cython.
I have simplified things a bit for this question.
...
cdef np.ndarray[DTYPEint_t, ndim=3] C = np.zeros([N,M,M], dtype=DTYPEint)
for j in range(0,N):
    C[j,:,:] = np.searchsorted(A1[j,:], A2, side='left')
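To make the shapes concrete, here is a small plain-NumPy sketch of what the loop computes. The array values and the sizes N=2, M=3 are made up for illustration, and it assumes A2 has shape (M, M) so that each searchsorted call fills one M x M slice of C.

```python
import numpy as np

# Hypothetical small inputs, chosen only for illustration.
N, M = 2, 3
A1 = np.array([[10, 20, 30, 40],
               [5, 15, 25, 35]])        # each row is sorted
A2 = np.array([[0, 10, 25],
               [30, 35, 50],
               [15, 20, 41]])           # assumed shape (M, M)

# Pre-allocate the result, one M x M slice per row of A1.
C = np.zeros((N, M, M), dtype=np.intp)
for j in range(N):
    # Binary-search every value of A2 in row j of A1;
    # the result has the same shape as A2.
    C[j, :, :] = np.searchsorted(A1[j, :], A2, side='left')

print(C.shape)
print(C[0])
```

Each iteration only reads row j of A1 and only writes slice j of C, which is why the j-loop looks like a natural candidate for parallelization.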
So far so good: everything compiles and runs as expected. However, to get even more speed, I want to parallelize the j-loop. My first attempt was simply to write:
for j in prange(0,N, nogil=True):
    C[j,:,:] = np.searchsorted(A1[j,:], A2, side='left')
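I tried several variations, such as moving the call into a separate nogil_function and changing how the result is assigned to C.
Each attempt fails with an error saying that a Python operation is not allowed without the gil.
I can't get it to work. Any suggestions on how to do this?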
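For reference, the lookup that np.searchsorted performs with side='left' is a lower-bound binary search. Below is a minimal pure-Python sketch of that logic; the name lower_bound is hypothetical and not part of my module. I assume a nogil variant would need something like this written with typed C variables in place of the NumPy call, but I have not verified that.

```python
from bisect import bisect_left

def lower_bound(sorted_row, value):
    """Return the first index i with sorted_row[i] >= value.

    Returns len(sorted_row) if every element is smaller than value.
    """
    lo, hi = 0, len(sorted_row)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_row[mid] < value:
            lo = mid + 1    # answer lies strictly to the right of mid
        else:
            hi = mid        # mid itself may be the answer
    return lo

row = [10, 20, 30, 40]
# Same convention as bisect_left from the standard library.
print([lower_bound(row, v) for v in (0, 10, 25, 41)])
print([bisect_left(row, v) for v in (0, 10, 25, 41)])
```

The loop uses only integer arithmetic and indexed reads, so it involves no Python objects once the variables are typed.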
EDIT:
setup.py
try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension
from Cython.Build import cythonize
import numpy
extensions = [Extension("matchOnDistanceVectors",
                        sources=["matchOnDistanceVectors.pyx"],
                        extra_compile_args=["/openmp", "/O2"],
                        extra_link_args=[]
                        )]
setup(
    ext_modules = cythonize(extensions),
    include_dirs=[numpy.get_include()]
)
I'm on Windows 7, using msvc. I compile with /openmp, and the arrays are of size 200 * 200 ...