CUDA C v. Thrust, am I missing something?

I just started to learn CUDA programming. I made my way through some simple CUDA C examples and everything went smoothly. Then! All of a sudden! Thrust! I thought I understood C++ functors, and I was stunned by the difference between CUDA C and Thrust.

I find it hard to believe that

    #include <cstdio>
    #include <cstdlib>

    __global__ void square(float *a, int N) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N) {
            a[idx] = a[idx] * a[idx];
        }
    }

    int main(int argc, char** argv) {
        float *aHost, *aDevice;
        const int N = 10;
        size_t size = N * sizeof(float);

        aHost = (float*)malloc(size);
        cudaMalloc((void**)&aDevice, size);

        for (int i = 0; i < N; i++) {
            aHost[i] = (float)i;
        }
        cudaMemcpy(aDevice, aHost, size, cudaMemcpyHostToDevice);

        int block = 4;
        int nBlock = N / block + (N % block == 0 ? 0 : 1);
        square<<<nBlock, block>>>(aDevice, N);
        cudaMemcpy(aHost, aDevice, size, cudaMemcpyDeviceToHost);

        for (int i = 0; i < N; i++) {
            printf("%d, %f\n", i, aHost[i]);
        }
        free(aHost);
        cudaFree(aDevice);
    }

is equivalent to

    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/transform.h>
    #include <thrust/copy.h>
    #include <iostream>
    #include <iterator>

    template <typename T>
    struct square {
        __host__ __device__ T operator()(const T& x) const {
            return x * x;
        }
    };

    int main(int argc, char** argv) {
        const int N = 10;
        thrust::device_vector<float> dVec(N);
        thrust::sequence(dVec.begin(), dVec.end());
        thrust::transform(dVec.begin(), dVec.end(), dVec.begin(), square<float>());
        thrust::copy(dVec.begin(), dVec.end(), std::ostream_iterator<float>(std::cout, "\n"));
    }

Am I missing something? Is the Thrust code above really running on the GPU? Thrust is clearly a nice tool, but I'm skeptical that it takes care of all the C-style memory management on its own.

  • Is the Thrust code executed on the GPU? How can I tell?
  • How does Thrust eliminate the weird kernel invocation syntax?
  • Does Thrust actually launch a kernel?
  • Does Thrust automatically calculate the thread index?

Thank you for your time. Sorry if these are stupid questions, but I find it hard to believe that the examples I've seen jump straight from what can only be described as a Model T to an M3.

1 answer

Roughly speaking: yes, of course. Thrust is a library, and libraries exist to make things easier. Its whole point is to hide the explicit CUDA code that looks strange to non-CUDA programmers behind a friendly C++ interface.

Thrust uses the GPU, but not just the GPU. It performs the same steps you would write yourself: C/C++ code to allocate memory, copy data, choose grid and block sizes... and then it launches kernels on the GPU to do the actual work.

It is a good choice if you don't want to dig into low-level CUDA but still want to exploit GPU parallelism for simple (yet common) problems such as vector operations.


Source: https://habr.com/ru/post/1483788/

