Can't get a simple CUDA program to work.

I am trying CUDA's hello world program: adding two vectors together. Here is the program I tried:

    #include <cuda.h>
    #include <stdio.h>

    #define SIZE 10

    // Each thread adds one pair of elements.
    __global__ void vecAdd(float* A, float* B, float* C)
    {
        int i = threadIdx.x;
        C[i] = A[i] + B[i];
    }

    int main()
    {
        float A[SIZE], B[SIZE], C[SIZE];
        float *devPtrA, *devPtrB, *devPtrC;
        size_t memsize = SIZE * sizeof(float);

        for (int i = 0; i < SIZE; i++) {
            A[i] = i;
            B[i] = i;
        }

        // Allocate device buffers and copy the inputs over.
        cudaMalloc(&devPtrA, memsize);
        cudaMalloc(&devPtrB, memsize);
        cudaMalloc(&devPtrC, memsize);
        cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice);
        cudaMemcpy(devPtrB, B, memsize, cudaMemcpyHostToDevice);

        // Launch one block of SIZE threads.
        vecAdd<<<1, SIZE>>>(devPtrA, devPtrB, devPtrC);

        cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost);
        for (int i = 0; i < SIZE; i++)
            printf("C[%d]: %f + %f => %f\n", i, A[i], B[i], C[i]);

        cudaFree(devPtrA);
        cudaFree(devPtrB);
        cudaFree(devPtrC);
    }

Compiled with

 nvcc cuda.cu 

Output:

    C[0]: 0.000000 + 0.000000 => 0.000000
    C[1]: 1.000000 + 1.000000 => 0.000000
    C[2]: 2.000000 + 2.000000 => 0.000000
    C[3]: 3.000000 + 3.000000 => 0.000000
    C[4]: 4.000000 + 4.000000 => 0.000000
    C[5]: 5.000000 + 5.000000 => 0.000000
    C[6]: 6.000000 + 6.000000 => 0.000000
    C[7]: 7.000000 + 7.000000 => 0.000000
    C[8]: 8.000000 + 8.000000 => 366987238703104.000000
    C[9]: 9.000000 + 9.000000 => 0.000000

Each time I run it I get a different garbage value for C[8], but every other element is always 0.000000.

The machine is a 64-bit, 4-core Xeon server running Ubuntu 11.04 with the latest NVIDIA drivers (downloaded October 4, 2012). The card is an EVGA GeForce GT 430 with 96 cores and 1 GB of RAM.

What should I do to figure out what is going wrong?

+4
2 answers

Most likely cause: the NVIDIA drivers are not loaded. On a headless Linux system the X server is not running, so the drivers do not get loaded at boot time.

Run nvidia-smi -a as root to load them and get a confirmation report.

Even once the drivers are loaded, they still have to be initialized every time a CUDA program starts. Put the drivers in persistence mode with nvidia-smi -pm 1 so they stay initialized all the time, and add that command to a boot script (e.g. rc.local) so it runs on every boot.
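If you want to confirm this from code, a minimal probe like the sketch below (compiled with nvcc, same as your program) surfaces the problem directly: the first CUDA runtime call initializes the driver, so if the driver is not loaded it fails right there and cudaGetErrorString tells you why, instead of every unchecked call failing silently.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // Minimal driver probe (sketch): the first runtime call initializes
    // the driver, so an unloaded driver makes it fail immediately.
    int main()
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            printf("CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("found %d CUDA device(s)\n", count);
        return 0;
    }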

+1

It does seem your drivers are not initialized, but not checking the CUDA return codes is bad practice in any case, and you should avoid it. Here is a simple helper function plus macro you can wrap around CUDA calls (adapted from the book CUDA by Example):

    static void HandleError(cudaError_t err, const char *file, int line)
    {
        if (err != cudaSuccess) {
            printf("%s in %s at line %d\n", cudaGetErrorString(err), file, line);
            exit(EXIT_FAILURE);
        }
    }

    #define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))

Now wrap each of your CUDA calls like this:

 HANDLE_ERROR(cudaMemcpy(...)); 
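Applied to the program from the question, the middle of main() would look roughly like this (a sketch; note that a kernel launch itself returns no status, so it is checked afterwards with cudaGetLastError() and a synchronize):

    HANDLE_ERROR(cudaMalloc(&devPtrA, memsize));
    HANDLE_ERROR(cudaMalloc(&devPtrB, memsize));
    HANDLE_ERROR(cudaMalloc(&devPtrC, memsize));
    HANDLE_ERROR(cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(devPtrB, B, memsize, cudaMemcpyHostToDevice));

    vecAdd<<<1, SIZE>>>(devPtrA, devPtrB, devPtrC);
    HANDLE_ERROR(cudaGetLastError());       // catches launch failures (no device, bad config)
    HANDLE_ERROR(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    HANDLE_ERROR(cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost));

With the drivers unloaded, the very first HANDLE_ERROR fires and the program exits with a useful message instead of printing garbage.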
+5
