First, make sure CLion is set to treat .cu and .cuh files as C++ via the File Types settings menu.
CLion cannot parse CUDA language extensions, but it provides a preprocessor macro, __JETBRAINS_IDE__, that is defined only while CLion itself parses the code. You can use this to implement almost full CUDA support yourself.
Most of the problem is that keywords such as __host__ or __device__ throw off CLion's parser, which then fails at things it would otherwise know how to do:
In this example, CLion did not understand Dtype because the CUDA stuff confused its parsing.
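Since the original screenshot is not reproduced here, the following sketch shows the kind of code that trips the parser up; the kernel name and the Dtype template parameter are illustrative, not taken from the original example:

// Illustrative only: the __global__ qualifier derails CLion's parser, so it
// stops resolving the Dtype template parameter inside the function body.
template <typename Dtype>
__global__ void scale_kernel(Dtype* data, Dtype factor, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}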
The simplest fix is to use preprocessor macros to make CLion ignore the new keywords, which repairs the worst of the breakage:
#ifdef __JETBRAINS_IDE__
#define __host__
#define __device__
#define __shared__
#define __constant__
#define __global__
#endif
This fixes the above example:

However, CUDA intrinsics such as __syncthreads and __popc will still not be indexed, and neither will CUDA built-ins like threadIdx. One option is to provide endless preprocessor macros (or even struct definitions) for them yourself, as sketched below, but that is ugly and sacrifices type safety.
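For illustration, that manual workaround might look like the following; every name here is a hand-written stand-in for the indexer only, and real code would need many more of them:

#ifdef __JETBRAINS_IDE__
// Fake declarations so the indexer can resolve the symbols; these are never
// seen by the real compiler and make no attempt to be accurate.
struct __cuda_fake_uint3 { unsigned int x, y, z; };
extern __cuda_fake_uint3 threadIdx;
extern __cuda_fake_uint3 blockIdx;
extern __cuda_fake_uint3 blockDim;
inline void __syncthreads() {}
inline int __popc(unsigned int v) { return 0; }
#endif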
If you use Clang's CUDA frontend, you can do better. Clang implements the implicitly defined CUDA built-ins by declaring them in headers, which it then includes when compiling your code. These provide definitions of things like threadIdx. By pretending to be the CUDA compiler's preprocessor and including device_functions.h, we can get __popc and friends to work too:
#ifdef __JETBRAINS_IDE__
#define __host__
#define __device__
#define __shared__
#define __constant__
#define __global__
// Pretend to be the CUDA compiler so that device_functions.h declares
// __popc, __syncthreads, and friends for the indexer.
#define __CUDACC__
#include <device_functions.h>
#endif // __JETBRAINS_IDE__
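For the built-in variables such as threadIdx, you can go one step further and include the headers Clang normally injects itself. The header names below come from Clang's bundled CUDA wrappers and may vary with your Clang version, so treat this as a sketch:

#ifdef __JETBRAINS_IDE__
// Headers that Clang includes implicitly when it compiles CUDA; pulling them
// in here (indexer only!) gives CLion declarations for threadIdx, blockIdx,
// blockDim, and the device intrinsics.
#include <__clang_cuda_builtin_vars.h>
#include <__clang_cuda_intrinsics.h>
#endif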
This will give you near-perfect indexing of almost all CUDA code. CLion even handles the <<<...>>> launch syntax fairly gracefully: it puts a small red squiggle under one character at each end of the launch block, but otherwise treats it as a function call, which is completely fine:

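As an illustration (reusing the made-up kernel from the sketch above, with made-up launch parameters), a launch like this indexes as an ordinary call, with only the <<< and >>> tokens flagged:

// Illustrative launch: CLion underlines one character at each end of the
// <<<...>>> block but otherwise treats this like a normal function call.
scale_kernel<float><<<grid_size, block_size>>>(d_data, 2.0f, n);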