Addition Assignment Operator in CUDA C

I am having a problem with the addition assignment operator in CUDA C. I get the following error:

kernel.cu(5): error: expression must have integral or enum type 

My code is:

import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

mod = SourceModule("""
__global__ void addition(float* a, float* b, float* c){
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    c[a[i]] += b[i];
}
""")

addition = mod.get_function("addition")
a = np.array([1,2,3,1,2,3,2,1]).astype(np.float32)
b = np.array([0.1,0.2,0.1,0.5,0.1,0.2,0.1,0.5]).astype(np.float32)
c = np.zeros_like(a)
addition(drv.Out(c), drv.In(a), drv.In(b), block=(32,1,1))
print c

My desired result: c = [0, 1.1, 0.4, 0.3, 0, 0, 0, 0]. Can anyone suggest a solution?

1 answer

The problem is in your kernel, where you index into c with a value from a.
a is of type float, and an array subscript must have integral or enum type, which is exactly what the compiler error says.

Also note that you launch 32 threads but only have 8 elements, which means the extra threads will index out of bounds.

The last problem you will encounter is that multiple threads try to update the same position in c, because of the duplicate indices in a. One way to fix this is to use atomicAdd.

__global__ void addition(float* a, float* b, float* c, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        atomicAdd(&c[(int)a[i]], b[i]);
}
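What the corrected kernel computes can be checked on the host with NumPy's unbuffered scatter-add, np.add.at, which accumulates into duplicate indices the same way atomicAdd does. This is just a host-side sketch for verification, not part of the CUDA answer:

```python
import numpy as np

a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
c = np.zeros_like(a)

# np.add.at accumulates into repeated indices instead of
# keeping only the last write, mirroring atomicAdd on the GPU
np.add.at(c, a.astype(np.int32), b)
print(c)  # approximately [0. 1.1 0.4 0.3 0. 0. 0. 0.]
```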

Run the kernel in the same way, but remember to pass n (as np.int32, since PyCUDA needs explicit scalar types), which is the length of a and b.
You can also eliminate n entirely by launching exactly as many threads as there are elements, i.e. sizing the thread block to the array length at kernel launch.
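If you do keep the n parameter, the usual way to cover an arbitrary n with a fixed block size is to round the grid size up. A minimal sketch of that arithmetic (the helper name launch_dims is made up for illustration):

```python
def launch_dims(n, block_size=256):
    """Return (grid_size, block_size) so that grid_size * block_size >= n."""
    # integer ceiling division: number of blocks needed to cover n elements
    grid_size = (n + block_size - 1) // block_size
    return grid_size, block_size

# 8 elements fit in one block of 256 threads;
# the in-kernel if (i < n) guard skips the excess threads
print(launch_dims(8))    # → (1, 256)
print(launch_dims(257))  # → (2, 256)
```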


Source: https://habr.com/ru/post/1445467/

