CUDA Fast Custom Comparison Operator

I am evaluating CUDA and am currently using the Thrust library to sort numbers.

I would like to create my own comparator for thrust :: sort, but it slows down dramatically! I created my own less implementation by simply copying the code from functional.h . However, it seems that it compiles in some other way and works very slowly.

  • default comper: thrust :: less () - 94 ms
  • my own comparator: less () - 906 ms

I am using Visual Studio 2010. What should I do to get the same performance as in option 1?

Full code:

#include <stdio.h> #include <cuda.h> #include <thrust/host_vector.h> #include <thrust/device_vector.h> #include <thrust/generate.h> #include <thrust/sort.h> int myRand() { static int counter = 0; if ( counter++ % 10000 == 0 ) srand(time(NULL)+counter); return (rand()<<16) | rand(); } template<typename T> struct less : public thrust::binary_function<T,T,bool> { __host__ __device__ bool operator()(const T &lhs, const T &rhs) const { return lhs < rhs; } }; int main() { thrust::host_vector<int> h_vec(10 * 1000 * 1000); thrust::generate(h_vec.begin(), h_vec.end(), myRand); thrust::device_vector<int> d_vec = h_vec; int clc = clock(); thrust::sort(d_vec.begin(), d_vec.end(), less<int>()); printf("%dms\n", (clock()-clc) * 1000 / CLOCKS_PER_SEC); return 0; } 
+4
source share
1 answer

The reason you see performance differences is because Thrust implements sorting with different algorithms depending on the arguments provided by thrust::sort .

In case 1. Thrust can prove that sorting can be implemented in linear time using radix sorting. This is because the data sort type is a built-in numeric type ( int ), and the comparison function is built-in less than an operation. Thrust recognizes that thrust::less<int> will give an equivalent result as x < y .

In case 2. Thrust knows nothing about your custom less<int> and should use a more conservative sort-based algorithm that has different asymptotic complexity, although in truth, your less<int> equivalent to thrust::less<int> .

In general, custom comparison operators cannot be used with stricter, faster sorts that control the binary representation of data, such as radix sort. In these cases, Thrust returns to a more general, but slower sorting.

+6
source

Source: https://habr.com/ru/post/1265083/


All Articles