CUDA: Why is it impossible to define static global member functions?

When compiling the code below with nvcc (CUDA 5.0), the error "illegal combination of memory qualifiers" appears; it seems to be impossible to declare a __global__ kernel inside a class.

    class A
    {
    public:
        __global__ static void kernel();
    };

    __global__ void A::kernel()
    {}

I can understand this restriction for non-static members, but why does the error still occur when the kernel is declared static? Calling such a member would be no different from calling a function declared in a namespace (A in this case):

 A::kernel <<< 1, 1 >>> (); 

Is there a reason I am missing why this has not been implemented yet?

EDIT: Judging by the answers and comments, I was not clear enough about my question. I am not asking why the error appears; obviously it appears because the feature is not implemented. I am asking why it was not implemented. So far I have not been able to come up with a reason that would rule this feature out. I realize I may have overlooked a special case that complicates matters, hence the question.

I believe this is a reasonable feature to expect, for the following reasons:

  • A static function has no this pointer. So even if the kernel is called on an object that lives on the host, there is no conflict in accessing its data, because that data is inaccessible in the first place (data of which object?).
  • You could argue that if a class has static data that lives on the host, this data should in principle be accessible from the static kernel. However, static data members are not supported either, so again there is no conflict.
  • Calling a static kernel on a host object (A a; a.staticKernel<<<...,...>>>();) would be completely equivalent to calling it without an object at all (A::staticKernel<<<...,...>>>();), just as we are used to in regular C++.
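For comparison, the call shape argued for in the list above can be approximated under nvcc today by keeping the kernel a free function and exposing only a static host member that launches it. This is a minimal sketch of that common workaround, not something taken from the question: the names fillKernel and A::fill, the int buffer, and the launch configuration are all hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical free kernel: nvcc accepts __global__ on free functions.
__global__ void fillKernel(int *out, int value)
{
    out[threadIdx.x] = value;
}

class A
{
public:
    // Ordinary static host function (hypothetical name) that hides the launch.
    // Like any static member, it has no this pointer.
    static cudaError_t fill(int *devOut, int n, int value)
    {
        fillKernel<<<1, n>>>(devOut, value);
        return cudaGetLastError();  // reports launch-configuration errors
    }
};

int main()
{
    const int n = 4;
    int host[n] = {0};
    int *dev = NULL;

    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess) return 1;

    // Called through the class name, exactly like a static member kernel would be.
    if (A::fill(dev, n, 7) != cudaSuccess) return 1;

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(dev);

    for (int i = 0; i < n; ++i) printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

The wrapper costs one extra function per kernel and moves the launch configuration out of the caller's hands, which is part of why a true static member kernel would be more convenient.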

What am I missing?

1 answer

Fortunately, four years after this question was asked, clang 4.0 can compile CUDA code. Consider this example:

    class A
    {
    public:
        __global__ static void kernel();
    };

    __global__ void A::kernel()
    {}

    int main()
    {
        A::kernel<<<1, 1>>>();
    }

When I try to compile it with clang 4.0, I get the following error:

    test.cu:7:1: error: kernel function 'kernel' must be a free function or static member function
    __global__ void A::kernel()
    ^
    /usr/local/cuda/include/host_defines.h:191:9: note: expanded from macro '__global__'
            __location__(global)
            ^
    /usr/local/cuda/include/host_defines.h:88:9: note: expanded from macro '__location__'
            __annotate__(a)
            ^
    /usr/local/cuda/include/host_defines.h:86:9: note: expanded from macro '__annotate__'
            __attribute__((a))
            ^
    test.cu:7:20: error: __host__ function 'kernel' cannot overload __global__ function 'kernel'
    __global__ void A::kernel()
                       ^
    test.cu:4:28: note: previous declaration is here
        __global__ static void kernel();
                               ^
    2 errors generated.

To resolve these errors, I moved the kernel definition into the class declaration:

    class A
    {
    public:
        __global__ static void kernel()
        {
            // implementation would go here
        }
    };

With that change, clang 4.0 compiles the code successfully, and the resulting program runs without errors. So this is clearly not a limitation of the CUDA language itself, but of its de facto standard compiler, nvcc. Incidentally, nvcc has many similar unjustified restrictions that clang does not have.
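To make the in-class variant easier to try, here is a minimal sketch of a complete program built around it. It assumes clang 4.0 or later in CUDA mode, as described above; nvcc is expected to reject it. The int *out parameter and the value 42 are illustrative additions so that the host can observe that the kernel actually ran.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

class A
{
public:
    // Static member kernel defined inside the class body.
    // The out parameter is an assumption added for this sketch.
    __global__ static void kernel(int *out)
    {
        *out = 42;
    }
};

int main()
{
    int *dev = NULL;
    int host = 0;

    if (cudaMalloc((void **)&dev, sizeof(int)) != cudaSuccess) return 1;

    // Launched through the class name, with no object involved.
    A::kernel<<<1, 1>>>(dev);
    if (cudaGetLastError() != cudaSuccess) return 1;

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&host, dev, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(dev);

    printf("%d\n", host);
    return 0;
}
```

Checking the value copied back to the host is a slightly stronger test than "runs without errors", since a kernel that silently fails to launch would leave host at 0.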


Source: https://habr.com/ru/post/952895/

