Due to some performance issues with the Thrust libraries (see this page for more information), I am planning to refactor a CUDA application to use CUB instead of Thrust, specifically to replace the calls to thrust::sort_by_key and thrust::inclusive_scan. In my application I need to sort 3 arrays by key. Here is how I did it with Thrust:
thrust::sort_by_key(key_iter, key_iter + numKeys, indices);
thrust::gather_wrapper(indices, indices + numKeys,
    thrust::make_zip_iterator(thrust::make_tuple(values1Ptr, values2Ptr, values3Ptr)),
    thrust::make_zip_iterator(thrust::make_tuple(valuesOut1Ptr, valuesOut2Ptr, valuesOut3Ptr))
);
Where
key_iter is a thrust::device_ptr that points to the keys I want to sort by
indices points to a sequence (from 0 to numKeys-1) in device memory
values{1,2,3}Ptr are device_ptrs to the values I want to sort
valuesOut{1,2,3}Ptr are device_ptrs to the sorted values
Using the CUB SortPairs function, I can sort a single value buffer, but not all 3 in one shot. The problem is that I don't see any CUB gather-like utilities. Suggestions?
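One way to obtain a gather map from CUB is to mirror the Thrust code above: pass an index sequence as the value array of cub::DeviceRadixSort::SortPairs, so that the sorted values are the original positions of the sorted keys. This is only a minimal sketch. The helper name sorted_permutation, the unsigned int key type and the round trip through host memory are assumptions made for illustration, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not part of CUB): sorts the keys on the device and
// returns, for each sorted position, the original index of that key.
// Assumes unsigned int keys and at least one key.
std::vector<unsigned int> sorted_permutation(const std::vector<unsigned int>& h_keys)
{
    const int n = static_cast<int>(h_keys.size());
    const size_t bytes = n * sizeof(unsigned int);

    // Index sequence 0 .. n-1, used as the "values" of the pair sort.
    std::vector<unsigned int> h_idx(n);
    for (int i = 0; i < n; ++i) h_idx[i] = static_cast<unsigned int>(i);

    unsigned int *d_keys_in, *d_keys_out, *d_idx_in, *d_idx_out;
    cudaMalloc(&d_keys_in, bytes);
    cudaMalloc(&d_keys_out, bytes);
    cudaMalloc(&d_idx_in, bytes);
    cudaMalloc(&d_idx_out, bytes);
    cudaMemcpy(d_keys_in, h_keys.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx_in, h_idx.data(), bytes, cudaMemcpyHostToDevice);

    // First call only queries the temporary storage size.
    void* d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in, d_idx_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the sort.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_idx_in, d_idx_out, n);

    std::vector<unsigned int> h_perm(n);
    cudaMemcpy(h_perm.data(), d_idx_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_temp);
    cudaFree(d_keys_in);
    cudaFree(d_keys_out);
    cudaFree(d_idx_in);
    cudaFree(d_idx_out);
    return h_perm;
}
```

In a real application the permutation would stay in device memory (d_idx_out here) and be used directly as the map for a gather over the three value arrays.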
EDIT:
I suppose I could implement my own naive gather kernel, but is there a better way to do this than:
template <typename Index, typename Value>
__global__ void gather_kernel(const unsigned int N, const Index * map, const Value * src, Value * dst)
{
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
    {
        dst[i] = src[map[i]];
    }
}
The non-coalesced loads and stores make me cringe, but that is probably unavoidable without a known structure in the map.
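Since all three value arrays share the same map, one possible variation is to gather them in a single launch so that each map entry is read only once. The sketch below is an illustration rather than a recommended implementation. The names gather3_kernel, to_device and gather3, the float value type and the host wrapper are all hypothetical, it assumes a non-empty map, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Gathers three source arrays through the same map in one pass.
// The source and destination arrays must not overlap (__restrict__).
template <typename Index, typename Value>
__global__ void gather3_kernel(const unsigned int N,
                               const Index* __restrict__ map,
                               const Value* __restrict__ src1,
                               const Value* __restrict__ src2,
                               const Value* __restrict__ src3,
                               Value* __restrict__ dst1,
                               Value* __restrict__ dst2,
                               Value* __restrict__ dst3)
{
    const unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) {
        const Index j = map[i];  // consecutive threads read consecutive entries
        dst1[i] = src1[j];       // reads from src are scattered,
        dst2[i] = src2[j];       // writes to dst are consecutive
        dst3[i] = src3[j];
    }
}

// Hypothetical helper: copies a host vector into a new device buffer.
template <typename T>
T* to_device(const std::vector<T>& h)
{
    T* d = nullptr;
    cudaMalloc(&d, h.size() * sizeof(T));
    cudaMemcpy(d, h.data(), h.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Hypothetical host wrapper that only demonstrates a launch.
// Returns the three gathered arrays; assumes map is not empty.
std::vector<std::vector<float>> gather3(const std::vector<unsigned int>& map,
                                        const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        const std::vector<float>& c)
{
    const unsigned int n = static_cast<unsigned int>(map.size());
    unsigned int* d_map = to_device(map);
    float* d_src[3] = { to_device(a), to_device(b), to_device(c) };
    float* d_dst[3];
    for (float*& d : d_dst) cudaMalloc(&d, n * sizeof(float));

    const unsigned int block = 256;
    const unsigned int grid = (n + block - 1) / block;
    gather3_kernel<<<grid, block>>>(n, d_map,
                                    d_src[0], d_src[1], d_src[2],
                                    d_dst[0], d_dst[1], d_dst[2]);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<std::vector<float>> out(3, std::vector<float>(n));
    for (int k = 0; k < 3; ++k) {
        cudaMemcpy(out[k].data(), d_dst[k], n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_src[k]);
        cudaFree(d_dst[k]);
    }
    cudaFree(d_map);
    return out;
}
```

Note that in this form only the reads from the source arrays are scattered. The map reads and the destination writes are indexed by i, so consecutive threads touch consecutive addresses.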