I am trying to write a small CUDA program that finds the maximum value in a given array:

int input_data[0...50] = {1, 2, 3, 4, 5, ..., 50}
max_value is initialized with the first element, input_data[0], and the final answer is stored in result[0]. The kernel returns 0 as the maximum value, and I don't know what the problem is. I launched 1 block of 50 threads.
```cuda
__device__ int lock = 0;

__global__ void max(float *input_data, float *result)
{
    float max_value = input_data[0];
    int tid = threadIdx.x;

    if (input_data[tid] > max_value) {
        do {} while (atomicCAS(&lock, 0, 1));  // spin until the lock is acquired
        max_value = input_data[tid];
        __threadfence();
        lock = 0;                               // release the lock
    }
    __syncthreads();
    result[0] = max_value;
}
```
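A minimal sketch of one way to restructure this, not a definitive fix. A few things in the kernel above look suspicious. First, max_value is a local variable, so each thread has its own copy and the lock protects nothing that is shared. Second, every thread then writes its own max_value to result[0], so the final value depends on which thread happens to write last. Third, spinning on atomicCAS among threads of the same warp can deadlock on GPUs without independent thread scheduling (pre-Volta). Separately, if the host buffer really holds ints while the kernel reads float *, the integers 1 to 50 are reinterpreted as tiny denormal floats, which print as 0 with %f. The sketch below assumes int data and uses the built-in atomicMax(int *, int) instead of a hand-written lock; the names max_kernel and gpu_max are hypothetical, and the result is seeded with INT_MIN rather than input_data[0].

```cuda
#include <cassert>
#include <climits>
#include <cuda_runtime.h>

// One thread per element: each thread proposes its value to a single global
// result with atomicMax, so no hand-written lock is needed.
__global__ void max_kernel(const int *input_data, int n, int *result)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        atomicMax(result, input_data[tid]);
    }
}

// Hypothetical host helper: copies n ints to the device, runs the kernel and
// returns the maximum. Returns INT_MIN if allocation or the final copy fails.
int gpu_max(const int *host_data, int n)
{
    assert(n > 0);
    int *d_data = nullptr;
    int *d_result = nullptr;
    int h_result = INT_MIN;  // identity element for max

    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) {
        return INT_MIN;
    }
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return INT_MIN;
    }

    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);
    // Seed the shared result with INT_MIN so any element can replace it.
    cudaMemcpy(d_result, &h_result, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    max_kernel<<<blocks, threads>>>(d_data, n, d_result);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_result, d_result, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_result);
    return err == cudaSuccess ? h_result : INT_MIN;
}
```

Called with the values 1 to 50 from the question, gpu_max should return 50. atomicMax serializes updates to a single address, which is fine for 50 elements; for large arrays a per-block shared-memory reduction is usually faster.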
I know there are built-in functions for this, but I'm just working through small problems for practice.