Local parallel computing for a summing operation

I started experimenting with parallel programming in Cython/OpenMP, and I have a simple program that sums over an array using prange:

import numpy as np
from cython.parallel import prange
from cython import boundscheck, wraparound

@boundscheck(False)
@wraparound(False)
def parallel_summation(double[:] vec):

    cdef int n = vec.shape[0]
    cdef double total
    cdef int i

    for i in prange(n, nogil=True):
        total += vec[i]

    return total

It seems to work fine when built with the setup.py file. However, I was wondering whether it is possible to configure this feature and have a little more control over what the processors do.

Let's say I have 4 processors: I want to split the vector to be summed into 4 parts, and have each processor add up the elements of its own part locally. Then, at the end, I can combine the results from each processor to get the total. From the Cython documentation, I was not able to figure out whether something like this is possible (the documentation is a bit sparse).
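For reference, the manual scheme described in the question (split the vector into 4 contiguous parts, sum each part locally, then combine the partial results) can be sketched in plain Python with the standard library. This is only an illustration of the decomposition under stated assumptions: the helper names chunk_bounds, local_sum and chunked_sum are hypothetical, the contiguous 4-way split is an assumption, and because of CPython's GIL these threads will not actually add the numbers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous [start, stop) blocks of near-equal size."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        # The first `extra` blocks get one additional element.
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_sum(vec, start, stop):
    """Sum one block; `acc` is local to this call, like a per-processor partial sum."""
    acc = 0.0
    for i in range(start, stop):
        acc += vec[i]
    return acc


def chunked_sum(vec, parts=4):
    bounds = chunk_bounds(len(vec), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(lambda b: local_sum(vec, *b), bounds))
    # Combine the per-worker results into the final total.
    return sum(partials)


vec = [float(x) for x in range(10)]
print(chunk_bounds(len(vec), 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(chunked_sum(vec))           # 45.0
```

Here the four partial sums are 3.0, 12.0, 13.0 and 17.0, and combining them gives 45.0, the same as summing the whole vector at once.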

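As far as I can tell, with 4 threads this is exactly what already happens. The loop is split into parts, each thread sums its own part locally, and at the end the partial results are combined.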

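This is called a reduction. Cython detects it from the in-place operator (total += ...), as described in its documentation. OpenMP (through the reduction clause) then gives each thread its own private copy of total and adds those copies into total at the end.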
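A rough Python model of how an OpenMP reduction(+:total) clause treats total may make this concrete. It is a sketch under assumptions, not how Cython or OpenMP are implemented: the function name omp_style_reduction is hypothetical, it assumes a contiguous static split of the iterations (with no schedule clause the actual split is up to the OpenMP runtime), and it uses one Python thread per OpenMP thread.

```python
import threading


def omp_style_reduction(vec, num_threads, total):
    """Rough model of `#pragma omp for reduction(+:total)`.

    Assumes a static split into contiguous blocks and one Python thread
    per OpenMP thread. `total` is the value the shared variable holds
    before the loop starts.
    """
    n = len(vec)
    partials = [0.0] * num_threads

    def worker(tid):
        start = tid * n // num_threads
        stop = (tid + 1) * n // num_threads
        acc = 0.0  # private copy, initialised to 0 (the identity for +)
        for i in range(start, stop):
            acc += vec[i]
        partials[tid] = acc  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    # After the loop the private copies are added to the ORIGINAL value.
    for p in partials:
        total += p
    return total


vec = [1.0] * 8
print(omp_style_reduction(vec, 4, total=0.0))    # 8.0
print(omp_style_reduction(vec, 4, total=100.0))  # 108.0
```

The second call shows that the reduction does not reset the shared variable: whatever it held before the loop becomes part of the result.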

The generated C code looks like this:

#pragma omp parallel
{
    #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) reduction(+:__pyx_v_total)
    for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_3; __pyx_t_2++){
        {
            __pyx_v_i = (int)(0 + 1 * __pyx_t_2);
            __pyx_t_4 = __pyx_v_i;
            __pyx_v_total = (__pyx_v_total + (*((double *) ( /* dim=0 */ (__pyx_v_vec.data + __pyx_t_4 * __pyx_v_vec.strides[0]) ))));
        }
    }
}

OpenMP takes care of the rest.

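Note, however, that you should initialize total = 0: in C an uninitialized variable holds an undefined value, and the reduction adds the partial sums to whatever total contained before the loop.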
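Written as a formula, with T threads and I_t the set of indices handed to thread t, a + reduction leaves total equal to its value before the loop plus every thread's partial sum, which is why any starting value other than 0 ends up in the result:

$$
\mathrm{total}_{\text{after}} = \mathrm{total}_{\text{before}} + \sum_{t=0}^{T-1} \sum_{i \in I_t} \mathrm{vec}[i]
$$

With total_before = 0 this is exactly the sum of vec; with an uninitialized C double it is that sum plus an arbitrary value.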

Source: https://habr.com/ru/post/1670255/

