I just started learning CUDA programming. I worked my way through some simple CUDA C examples and everything went smoothly. Then! All of a sudden! Thrust! I consider myself reasonably versed in C++ functors, and I was taken aback by the difference between CUDA C and Thrust.
I find it hard to believe that this:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: each thread squares one element.
__global__ void square(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx] * a[idx];
    }
}

int main(int argc, char** argv)
{
    float *aHost, *aDevice;
    const int N = 10;
    size_t size = N * sizeof(float);

    // Allocate host and device buffers.
    aHost = (float*)malloc(size);
    cudaMalloc((void**)&aDevice, size);

    for (int i = 0; i < N; i++) {
        aHost[i] = (float)i;
    }
    cudaMemcpy(aDevice, aHost, size, cudaMemcpyHostToDevice);

    // Round the block count up so all N elements are covered.
    int block = 4;
    int nBlock = N / block + (N % block == 0 ? 0 : 1);
    square<<<nBlock, block>>>(aDevice, N);

    cudaMemcpy(aHost, aDevice, size, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) {
        printf("%d, %f\n", i, aHost[i]);
    }

    free(aHost);
    cudaFree(aDevice);
}
is equivalent to this:
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

// Functor: usable on both host and device.
template <typename T>
struct square
{
    __host__ __device__ T operator()(const T& x) const
    {
        return x * x;
    }
};

int main(int argc, char** argv)
{
    const int N = 10;
    thrust::device_vector<float> dVec(N);        // lives in device memory
    thrust::sequence(dVec.begin(), dVec.end());  // fill with 0, 1, ..., N-1
    thrust::transform(dVec.begin(), dVec.end(), dVec.begin(), square<float>());
    thrust::copy(dVec.begin(), dVec.end(),
                 std::ostream_iterator<float>(std::cout, "\n"));
}
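(For reference, I'm compiling both the same way, assuming the files are saved as square.cu and square_thrust.cu; Thrust is header-only and, at least in my install, ships with the CUDA toolkit, so no extra flags are needed:

    nvcc square.cu -o square
    nvcc square_thrust.cu -o square_thrust

)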
Am I missing something? Is the above code running on the GPU? Thrust is a great tool, but I'm skeptical that it takes care of all the heavy C-style memory management.
- Is the Thrust code running on the GPU? How can I tell?
- How did Thrust eliminate the strange kernel invocation syntax?
- Does Thrust actually launch a kernel?
- Does Thrust automatically compute the thread index?
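To make the last three questions concrete, here is my rough mental model of what thrust::transform might be doing behind the scenes. This is pure guesswork on my part: transform_kernel and the launch configuration are names and numbers I made up, not anything from the Thrust source.

// My guess: a kernel templated on the functor type, with the index
// bookkeeping I wrote by hand now written once inside the library?
template <typename T, typename UnaryOp>
__global__ void transform_kernel(const T* in, T* out, int n, UnaryOp op)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = op(in[idx]);  // my square<float>::operator() on the device?
    }
}

// ...plus a host-side wrapper that allocates, copies, and picks the launch
// configuration for me, something like:
//     int block = 256;                       // made-up block size
//     int nBlock = (n + block - 1) / block;  // round up to cover all n
//     transform_kernel<<<nBlock, block>>>(in, out, n, op);

Is that roughly right? I suppose I could run the program under the CUDA profiler to see whether a kernel launch actually shows up, but I'd rather understand the mechanism.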
Thank you for your time. Sorry if these are stupid questions, but I find it hard to believe that the examples I've seen jump instantly from what can be described as a Model T to an M3.