Addition Assignment Operator in CUDA C

I am having a problem with the addition assignment operator in CUDA C. I get the following error:

kernel.cu(5): error: expression must have integral or enum type 

My code is:

import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

mod = SourceModule("""
__global__ void addition(float* a, float* b, float* c){
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    c[a[i]] += b[i];
}
""")

addition = mod.get_function("addition")
a = np.array([1,2,3,1,2,3,2,1]).astype(np.float32)
b = np.array([0.1,0.2,0.1,0.5,0.1,0.2,0.1,0.5]).astype(np.float32)
c = np.zeros_like(a)
addition(drv.Out(c), drv.In(a), drv.In(b), block=(32,1,1))
print c

My desired result: c = [0, 1.1, 0.4, 0.3, 0, 0, 0, 0]. Can anyone suggest a solution?

1 answer

The problem is in your kernel, where you index into c with a value from a.
a is of type float, and an array subscript must have integral or enum type, which is exactly what the compiler error says.

Also note that you launch 32 threads but only have 8 elements, which means the extra threads will index out of bounds.

The last problem you will encounter is that multiple threads try to update the same position in c, because of the duplicate indices in a. One way to fix this is to use atomicAdd.

__global__ void addition(float* a, float* b, float* c, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        atomicAdd(&c[(int)a[i]], b[i]);
}
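What the corrected kernel computes can be checked on the host with NumPy's unbuffered scatter-add, np.add.at, which accumulates into duplicate indices the same way atomicAdd does. This is just a host-side sketch for verification, not part of the CUDA answer:

```python
import numpy as np

a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
c = np.zeros_like(a)

# np.add.at accumulates into repeated indices instead of
# keeping only the last write, mirroring atomicAdd on the GPU
np.add.at(c, a.astype(np.int32), b)
print(c)  # approximately [0. 1.1 0.4 0.3 0. 0. 0. 0.]
```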

Run the kernel in the same way, but remember to pass n (as np.int32, since PyCUDA needs explicit scalar types), which is the length of a and b.
You can also eliminate n entirely by launching exactly as many threads as there are elements, i.e. sizing the thread block to the array length at kernel launch.
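If you do keep the n parameter, the usual way to cover an arbitrary n with a fixed block size is to round the grid size up. A minimal sketch of that arithmetic (the helper name launch_dims is made up for illustration):

```python
def launch_dims(n, block_size=256):
    """Return (grid_size, block_size) so that grid_size * block_size >= n."""
    # integer ceiling division: number of blocks needed to cover n elements
    grid_size = (n + block_size - 1) // block_size
    return grid_size, block_size

# 8 elements fit in one block of 256 threads;
# the in-kernel if (i < n) guard skips the excess threads
print(launch_dims(8))    # → (1, 256)
print(launch_dims(257))  # → (2, 256)
```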


Source: https://habr.com/ru/post/1445467/

