Nvcc: combining extern and __constant__

I would like to organize my CUDA code into separate object files that get linked at the end of compilation, as in C++. To do this, I would like to declare an extern pointer to __constant__ memory in a header file and put the definition in one of the .cu files, again following the usual C++ pattern. But it seems that when I do this, nvcc ignores the "extern" and treats every declaration as a definition. Is there any way around this?

To be more specific regarding code and errors, I have this in the header file:

 extern __device__ void* device_function_table[]; 

and then in the .cu file:

 void* __device__ device_function_table[200]; 

which gives this error when compiling:

 (path).cu:40: error: redefinition of 'void* device_function_table [200]'
 (path).hh:29: error: 'void* device_function_table [200]' previously declared here

My current solution is to use Makefile magic to concatenate all my .cu files into what is effectively one big translation unit, while keeping some semblance of file organization. But this is already slowing compilation down noticeably, since a change to any of my classes means recompiling all of them, and I expect to add several more classes.

Edit: I see that I wrote __constant__ in the text and __device__ in the example; the question applies to both.

+4
3 answers

From the CUDA C Programming Guide, Version 4.0, Section D.2.1.1:

The __device__ , __shared__ and __constant__ qualifiers are not allowed on:

  • class, struct, and union data members,
  • formal parameters,
  • local variables within a function that executes on the host.

__shared__ and __constant__ variables have implied static storage.

__device__ and __constant__ variables are only allowed at file scope.

__device__ , __shared__ and __constant__ variables cannot be defined as external using the extern keyword. The only exception is for dynamically allocated __shared__ variables, as described in Section B.2.3.
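For context, here is a minimal sketch of the one exception mentioned above (dynamically allocated __shared__ memory, Section B.2.3); the kernel and variable names are just illustrative:

 // extern is permitted here because the array's size is supplied at launch time
 extern __shared__ float shared_buf[];

 __global__ void scale_kernel(const float* in, float* out)
 {
     shared_buf[threadIdx.x] = in[threadIdx.x] * 2.0f;
     __syncthreads();
     out[threadIdx.x] = shared_buf[threadIdx.x];
 }

 // launched with the shared-memory size as the third configuration argument, e.g.:
 // scale_kernel<<<grid, block, block.x * sizeof(float)>>>(d_in, d_out);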

+3

Starting with CUDA 5.0, it is possible to have extern device data if separate compilation and linking are enabled. This blog post explains it: http://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/

With that enabled, you use it exactly as in the original post, and it just works.
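For illustration, a minimal sketch of the layout from the question (file names are hypothetical); it only links correctly when relocatable device code is enabled:

 // table.hh
 extern __device__ void* device_function_table[];

 // table.cu
 #include "table.hh"
 __device__ void* device_function_table[200];

 // main.cu -- any other translation unit can now reference the same symbol
 #include <cstdio>
 #include "table.hh"
 __global__ void dump_first_entry()
 {
     printf("%p\n", device_function_table[0]);
 }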

+3

To make a long story short, with a recent CUDA toolkit (I'm on v8) and a compute capability of at least 2.0, in Visual Studio go to Project Properties → CUDA C/C++ → Common, find "Generate Relocatable Device Code" in the list, and set it to "Yes (-rdc=true)".

For the command line, this page suggests the -dc compiler option.
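As a rough command-line sketch (file names are hypothetical): -dc compiles each file with relocatable device code, and linking the objects through nvcc then performs the device link step:

 nvcc -dc table.cu -o table.o
 nvcc -dc main.cu -o main.o
 nvcc table.o main.o -o app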

+1

Source: https://habr.com/ru/post/1379018/

