Since CygnusX1 misunderstood what I meant in my comment on MurphEngineer's answer, perhaps I should post my own answer. What I had in mind was this:
__constant__ double dc_vals[2] = { 0.0, 1000.0 };
const double hc_vals[2]        = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}
This has the same result as CygnusX1's, but it is more flexible in real code: it lets you, for example, set specific values in your constant arrays, and it allows you to use CUDA API functions such as cudaMemcpyToSymbol / cudaMemcpyFromSymbol on the __constant__ array.
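For example, cudaMemcpyFromSymbol lets the host read the __constant__ array back to verify what the device will see. A minimal sketch under that assumption (check_roundtrip is just an illustrative name, not part of the answer above):

#include <stdio.h>

__constant__ double dc_vals[2];

void check_roundtrip()
{
    const double src[2] = { 0.0, 1000.0 };
    double back[2]      = { -1.0, -1.0 };

    // Host -> constant memory
    cudaMemcpyToSymbol(dc_vals, src, sizeof(src));

    // Constant memory -> host, to confirm the values the device will read
    cudaMemcpyFromSymbol(back, dc_vals, sizeof(back));

    printf("round-trip: %lf %lf\n", back[0], back[1]);
}

int main()
{
    check_roundtrip();
    return 0;
}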
A more realistic complete example:
#include <iostream>
#include <stdio.h>

__constant__ double dc_vals[2];
double hc_vals[2];

__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}

__global__ void kern()
{
    // On the device, f() reads from the __constant__ array dc_vals.
    printf("vals[%d] = %lf\n", threadIdx.x, f(threadIdx.x));
}

int main()
{
    hc_vals[0] = 0.0;
    hc_vals[1] = 1000.0;

    // Mirror the host values into constant memory so host and device agree.
    cudaMemcpyToSymbol(dc_vals, hc_vals, 2 * sizeof(double), 0, cudaMemcpyHostToDevice);

    // On the host, f() reads from hc_vals.
    std::cerr << f(0) << " " << f(1) << std::endl;

    kern<<<1, 2>>>();
    cudaThreadSynchronize();   // deprecated; cudaDeviceSynchronize() in newer CUDA
}
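To try it out (assuming the file is saved as constvals.cu; printf from device code requires compute capability 2.0 or later):

nvcc -arch=sm_20 constvals.cu -o constvals
./constvals

The host call prints the hc_vals entries to stderr, and the kernel prints the same two values read from constant memory.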