CUDA host and device using the same __constant__ memory

I have a __device__ __host__ function that uses read-only memory. It works fine on the device, but on the host this memory appears to remain uninitialized.

```cuda
#include <iostream>
#include <stdio.h>

const __constant__ double vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i) { return vals[i]; }

__global__ void kern() {
    printf("vals[%d] = %lf\n", threadIdx.x, vals[threadIdx.x]);
}

int main() {
    std::cerr << f(0) << " " << f(1) << std::endl;
    kern<<<1, 2>>>();
    cudaThreadSynchronize();
}
```

This prints (compute capability 2.0 or higher required for device-side printf):

```
0 0
vals[0] = 0.000000
vals[1] = 1000.000000
```

What is the problem, and how can I initialize both the device and host copies of this memory at the same time?

+6
3 answers

Since CygnusX1 misunderstood what I meant in my comment on MurphEngineer's answer, perhaps I should post my own answer. What I had in mind was this:

```cuda
__constant__ double dc_vals[2] = { 0.0, 1000.0 };
const double hc_vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i) {
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}
```

This gives the same result as CygnusX1's answer, but it is more flexible in real code: it lets you, for example, set specific values in your constant arrays at runtime, and it lets you use CUDA API functions such as cudaMemcpyToSymbol / cudaMemcpyFromSymbol on the __constant__ array.

A more realistic complete example:

```cuda
#include <iostream>
#include <stdio.h>

__constant__ double dc_vals[2];
double hc_vals[2];   // not const, since it is assigned in main()

__device__ __host__ double f(size_t i) {
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}

__global__ void kern() {
    printf("vals[%d] = %lf\n", threadIdx.x, dc_vals[threadIdx.x]);
}

int main() {
    hc_vals[0] = 0.0;
    hc_vals[1] = 1000.0;
    cudaMemcpyToSymbol(dc_vals, hc_vals, 2 * sizeof(double), 0, cudaMemcpyHostToDevice);
    std::cerr << f(0) << " " << f(1) << std::endl;
    kern<<<1, 2>>>();
    cudaThreadSynchronize();
}
```
+11

Using the __constant__ qualifier explicitly allocates this memory on the device. There is no way to access this memory from the host, not even with the new CUDA Unified Addressing feature (which only works for memory allocated by cudaMalloc() and its friends). Qualifying the variable with const merely says "this is a constant pointer to (...)".

The right way to do this is indeed to have two arrays: one on the host and one on the device. Initialize your host array, then use cudaMemcpyToSymbol() to copy the data to the device array at runtime. For more information on how to do this, see this thread: http://forums.nvidia.com/index.php?showtopic=69724
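A minimal sketch of that two-array pattern (variable names are mine, not from the thread), assuming a CUDA toolkit is available:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ double d_vals[2];   // device copy, in constant memory
static double h_vals[2];         // host copy, filled in at runtime

__global__ void kern() {
    printf("d_vals[%d] = %lf\n", threadIdx.x, d_vals[threadIdx.x]);
}

int main() {
    h_vals[0] = 0.0;
    h_vals[1] = 1000.0;
    // Copy the host array into the device __constant__ symbol.
    cudaError_t err = cudaMemcpyToSymbol(d_vals, h_vals, sizeof(h_vals));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    kern<<<1, 2>>>();
    cudaDeviceSynchronize();
    return 0;
}
```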

+4

I think MurphEngineer explained why it is not working.

As a quick fix, you can use the preprocessor to do something like this:

```cuda
#ifdef __CUDA_ARCH__
#define CONSTANT __constant__
#else
#define CONSTANT
#endif

const CONSTANT double vals[2] = { 0.0, 1000.0 };
```

This way, compiling for the host will create a regular const host array, while compiling for the device will create a __constant__ device array.

Note that with this trick it may be harder to use the CUDA API to access that device array with functions like cudaMemcpyToSymbol(), should you ever decide to do so.

+2

Source: https://habr.com/ru/post/909414/
