Good question! You cannot use the __constant__ array as if it were a regular device pointer.
I will answer your question (after the line below), but first: this is a bad use of __constant__ , and this is not quite what you want. CUDA's persistent cache is optimized for uniform access to threads at the core. This means that all threads in the core have the same location at the same time. If each warp thread accesses a different permanent memory location, then the calls become serialized. Thus, your access pattern, where sequential threads will access sequential memory cells, will be 32 times slower than regular access. You should just use the device memory. If you need to write data once, but read it many times, then simply use device_vector: initialize it once, and then read it many times.
To do what you requested, you can use thrust::counting_iterator as an input to thrust::transform to create a range of indices in your __constant__ array. Then your operator() functor accepts the operand of the int index, not the operand of the float value, and searches in read-only memory.
(Note that this means that your functor is now only in __device__ code. You can easily overload the statement to accept a float and call it differently if you need portability.)
I modified your example to initialize the data and print the result to make sure it is correct.
#include <stdio.h> #include <stdlib.h> #include <thrust/device_vector.h> #include <thrust/host_vector.h> #include <thrust/iterator/counting_iterator.h> const int n = 32; __constant__ float dev_x[n]; //the array in question struct struct_max : public thrust::unary_function<float,float> { float C; struct_max(float _C) : C(_C) {} // only works as a device function __device__ float operator()(const int& i) const { // use index into constant array return fmax(dev_x[i],C); } }; void foo(const thrust::host_vector<float> &input_host_x, const float &x0) { thrust::device_vector<float> dev_sol(n); thrust::host_vector<float> host_sol(n); thrust::device_ptr<float> dev_ptr = thrust::device_pointer_cast(dev_x); thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(n), dev_sol.begin(), struct_max(x0)); host_sol = dev_sol; //this line crashes for (int i = 0; i < n; i++) printf("%f\n", host_sol[i]); } int main() { thrust::host_vector<float> x(n); //magic happens populate x for (int i = 0; i < n; i++) x[i] = rand() / (float)RAND_MAX; cudaMemcpyToSymbol(dev_x,x.data(),n*sizeof(float)); foo(x, 0.5); return(0); }
source share