CUDA C++ Templating of Kernel Parameter

I am trying to templatize a CUDA kernel based on a boolean variable (as shown here: Should I combine two similar kernels with an if statement, at the risk of performance loss?), but I keep getting a compiler error saying that my function is not a template. I suspect I am just missing something obvious, which makes this pretty frustrating.

The following does not work:

util.cuh

    #include "kernels.cuh"
    //Utility functions

kernels.cuh

    #ifndef KERNELS
    #define KERNELS
    template<bool approx> __global__ void kernel(...params...);
    #endif

kernels.cu

    template<bool approx>
    __global__ void kernel(...params...)
    {
        if(approx)
        {
            //Approximate calculation
        }
        else
        {
            //Exact calculation
        }
    }

    template __global__ void kernel<false>(...params...); //Error occurs here

main.cu

    #include "kernels.cuh"
    kernel<false><<<dimGrid,dimBlock>>>(...params...);

The following works:

util.cuh

    #include "kernels.cuh"
    //Utility functions

kernels.cuh

    #ifndef KERNELS
    #define KERNELS
    template<bool approx> __global__ void kernel(...params...);

    template<bool approx>
    __global__ void kernel(...params...)
    {
        if(approx)
        {
            //Approximate calculation
        }
        else
        {
            //Exact calculation
        }
    }
    #endif

main.cu

    #include "kernels.cuh"
    kernel<false><<<dimGrid,dimBlock>>>(...params...);

If I add the line

    template __global__ void kernel<false>(...params...);

at the end of kernels.cuh, it also works.

I get the following errors (both refer to the marked line above):

    kernel is not a template
    invalid explicit instantiation declaration

In case it matters, I compile all my .cu files in a single command, for example:

 nvcc -O3 -arch=sm_21 -I. main.cu kernels.cu -o program 
1 answer

All explicit specialization declarations must be visible at the point where the template is instantiated. Your explicit specialization declaration is visible only in the kernels.cu translation unit, but not in main.cu.

The following code works correctly (note the __global__ qualifier in the explicit instantiation statement).

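    #include<cuda.h>
    #include<cuda_runtime.h>
    #include<stdio.h>

    template<bool approx>
    __global__ void kernel()
    {
        if(approx)
        {
            printf("True branch\n");
        }
        else
        {
            printf("False branch\n");
        }
    }

    //Explicit instantiation, with the __global__ qualifier
    template __global__ void kernel<false>();

    int main(void)
    {
        kernel<false><<<1,1>>>();
        //Wait for the kernel to finish so that the device printf output is flushed
        //(replaces the Windows-only conio.h getch() call)
        cudaDeviceSynchronize();
        return 0;
    }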

EDIT

In C++, function templates are not compiled until an instantiation of the function is encountered. In this respect CUDA, which now fully supports templates, behaves exactly like C++.

To make a concrete example, when the compiler finds something like

    template<class T>
    __global__ void kernel(...params...)
    {
        ...
        T a;
        ...
    }

it just checks the syntax of the function, but does not generate object code. So, if you compile a file containing only a function template like the one above, you will get an "empty" object file. This is reasonable, since the compiler does not yet know what type to assign to a.
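A minimal sketch of this behavior, under stated assumptions: the names `Doubler`, `apply_kernel` and `run_doubler` are hypothetical and exist only for this illustration. The kernel body calls `T::apply`, which would be invalid for a type such as `float`, yet the file compiles because the template is only ever instantiated with a type that provides `apply`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper type, used only for this illustration.
struct Doubler {
    __device__ static int apply(int x) { return 2 * x; }
};

// The body depends on T, so it is fully checked (and device code is
// generated) only when the template is instantiated with a concrete T.
template <class T>
__global__ void apply_kernel(int *out, int in)
{
    *out = T::apply(in);
}

// Host helper: launches apply_kernel<Doubler> and copies the result back.
int run_doubler(int in)
{
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    apply_kernel<Doubler><<<1, 1>>>(d_out, in);  // instantiation happens here
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    // apply_kernel<float> is never requested, so the invalid expression
    // float::apply(in) is never compiled.
    printf("%d\n", run_doubler(21));
    return 0;
}
```

Adding a launch such as `apply_kernel<float><<<1, 1>>>(...)` would be expected to turn the latent problem into a compile error, because only then does the compiler have to generate code for that `T`.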

The compiler generates object code only when it encounters an instantiation of the function template. This is how compilation of function templates works, and this behavior introduces a restriction for multi-file projects: the implementation (definition) of a function template must be in the same file as its declaration. Thus, you cannot keep the interface in a header file, kernels.cuh, separate from the implementation in kernels.cu, which is the main reason why the first version of your code does not compile. Accordingly, you must include both the interface and the implementation in any file that uses the templates, namely: you must include both kernels.cuh and kernels.cu.
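The layout described above can be sketched in a single file, with comments marking which part would go into kernels.cuh and which into main.cu. The parameters `out`, `in` and `n`, the host helper `reciprocal`, and the reciprocal computation itself are assumptions that stand in for the elided `...params...` and kernel body of the question; this is an illustration, not the asker's actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// ---- would go into kernels.cuh: declaration AND definition together ----
#ifndef KERNELS
#define KERNELS

template <bool approx>
__global__ void kernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (approx) {
        out[i] = __fdividef(1.0f, in[i]);  // fast, approximate division intrinsic
    } else {
        out[i] = 1.0f / in[i];             // exact calculation
    }
}

#endif

// ---- would go into main.cu, after #include "kernels.cuh" ----

// Hypothetical host helper: runs kernel<approx> on a single value.
template <bool approx>
float reciprocal(float x)
{
    float *d_in = nullptr;
    float *d_out = nullptr;
    float result = 0.0f;
    cudaMalloc(&d_in, sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, &x, sizeof(float), cudaMemcpyHostToDevice);
    // The definition is visible here, so the compiler can instantiate it.
    kernel<approx><<<1, 1>>>(d_out, d_in, 1);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    printf("exact:  %f\n", reciprocal<false>(4.0f));
    printf("approx: %f\n", reciprocal<true>(4.0f));
    return 0;
}
```

Because `approx` is a compile-time constant, each instantiation contains only one of the two branches, so no explicit instantiation statement is needed when the definition is visible at the launch site.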

Since no code is generated without an instantiation, compilers tolerate the same template file, containing both declarations and definitions, being included more than once in a project without producing linker errors.

There are several tutorials on using templates in C++. An Idiot's Guide to C++ Templates - Part 1, despite its irritating title, will give you a step-by-step introduction to the topic.


Source: https://habr.com/ru/post/957695/

