How can I get the nvcc CUDA compiler to optimize more?

With a C or C++ compiler, passing the -O3 switch makes execution faster. Is there an equivalent for CUDA?

I compile my code with the command nvcc filename.cu and then run it with ./a.out.

+4
2 answers

Warning: compiling with nvcc -O3 filename.cu passes the -O3 option to the host code only.

To optimize the CUDA kernel code, you must pass the optimization flags to the PTX compiler, for example:

nvcc -Xptxas -O3,-v filename.cu

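This requests optimization level 3 for the CUDA code (which is the default), while -v requests verbose compilation, which reports very useful information to consider for further optimization (more on this below).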
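To see what the verbose output looks like, the command above can be tried on a small file such as the sketch below. The file name saxpy.cu, the kernel, and the helper run_saxpy are illustrative assumptions, not part of the original question. With -Xptxas -v, ptxas prints per-kernel resource usage such as the number of registers and any spill loads and stores.

```cuda
// saxpy.cu (hypothetical file name), compile with:
//   nvcc -Xptxas -O3,-v saxpy.cu
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// y[i] = a * x[i] + y[i] for every i < n
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical host helper: fills x with 1 and y with 2, runs the kernel,
// and returns y[0]. CUDA error checking is omitted for brevity.
float run_saxpy(int n, float a)
{
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, a, x, y);
    cudaDeviceSynchronize();

    float result = y[0];
    cudaFree(x);
    cudaFree(y);
    return result;
}

int main()
{
    float r = run_saxpy(1 << 20, 3.0f);
    assert(r == 5.0f);  // 3 * 1 + 2
    printf("y[0] = %f\n", r);
    return 0;
}
```

The register count reported for saxpy is the number to watch when experimenting with the register-related options discussed further down.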

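Another speed-oriented flag available in nvcc is -use_fast_math, which uses intrinsics at the cost of floating-point precision (see the options for steering GPU code generation).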
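The precision trade-off can be observed directly, because the intrinsics that -use_fast_math substitutes are also callable by name. The sketch below is an illustration only: the kernel sin_both and the helper max_abs_diff are hypothetical names. It compares sinf with the __sinf intrinsic on inputs in [0, 1). If the file itself is compiled with -use_fast_math, both calls should map to the intrinsic and the difference should be zero.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Computes the sine of each input twice: once with the standard
// single-precision function and once with the faster intrinsic.
__global__ void sin_both(int n, const float *x, float *accurate, float *fast)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sinf(x[i]);
        fast[i]     = __sinf(x[i]);
    }
}

// Hypothetical host helper: returns the largest absolute difference
// between the two results. CUDA error checking is omitted for brevity.
float max_abs_diff(int n)
{
    float *x, *accurate, *fast;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&accurate, n * sizeof(float));
    cudaMallocManaged(&fast, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = (float)i / (float)n;  // [0, 1)

    sin_both<<<(n + 255) / 256, 256>>>(n, x, accurate, fast);
    cudaDeviceSynchronize();

    float worst = 0.0f;
    for (int i = 0; i < n; ++i)
        worst = fmaxf(worst, fabsf(accurate[i] - fast[i]));

    cudaFree(x);
    cudaFree(accurate);
    cudaFree(fast);
    return worst;
}

int main()
{
    float d = max_abs_diff(1 << 16);
    assert(d < 1e-3f);  // small for inputs in [0, 1)
    printf("max |sinf - __sinf| = %g\n", d);
    return 0;
}
```

Whether the lost precision matters depends on the application, so it is worth measuring on representative inputs before enabling the flag globally.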

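In my experience, however, such automatic compiler optimizations generally do not bring large gains. The best performance is achieved through explicit optimizations in the code itself, for example: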

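  • Instruction-Level Parallelism (ILP): let each CUDA thread work on several elements. This keeps the pipelines loaded and maximizes throughput. For example, to process the elements of an NxN tile you can apply ILP at level 2 by launching an NxM block of threads (where M = N/2) and having each threadIdx.y loop over 2 different rows of elements.
  • Register spilling control: keep the number of registers used per kernel under control and experiment with the -maxrregcount=N option. The fewer registers a kernel needs, the more blocks can run concurrently (until register spilling takes over).
  • Loop unrolling: try adding #pragma unroll N before any independent loop inside your CUDA kernel. N can be 2, 3, 4. The best results come from a good balance between register pressure and the degree of unrolling. This is, after all, another form of the ILP technique.
  • Data packing: sometimes related buffers, say float A[N],B[N], can be merged into a single float2 AB[N] buffer. This means fewer operations for the load/store units and better use of the bus.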
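Several of the points above can be combined in one small kernel. The sketch below is an illustration under assumptions, not the answerer's code: the tile size TILE = 16, the kernel sum_pairs, and the helper count_mismatches are hypothetical. Each thread handles 2 rows of the tile (ILP), the 2-iteration loop carries #pragma unroll 2, and the two input buffers are packed into one float2 buffer.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 16;            // N in the text (assumed value)
constexpr int ROWS_PER_THREAD = 2;  // each thread handles 2 rows

// One block of TILE x (TILE / 2) threads processes a TILE x TILE tile.
// ab[i] packs the two inputs A[i] and B[i]; out[i] = A[i] + B[i].
__global__ void sum_pairs(const float2 *ab, float *out)
{
    int col = threadIdx.x;
    #pragma unroll 2
    for (int k = 0; k < ROWS_PER_THREAD; ++k) {
        int row = threadIdx.y + k * blockDim.y;  // two independent rows
        int idx = row * TILE + col;
        float2 v = ab[idx];                      // one 8-byte load
        out[idx] = v.x + v.y;
    }
}

// Hypothetical host helper: returns how many outputs are wrong.
// CUDA error checking is omitted for brevity.
int count_mismatches()
{
    const int n = TILE * TILE;
    float2 *ab;
    float *out;
    cudaMallocManaged(&ab, n * sizeof(float2));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        ab[i] = make_float2((float)i, 2.0f * i);
        out[i] = -1.0f;
    }

    dim3 block(TILE, TILE / ROWS_PER_THREAD);    // N x M with M = N / 2
    sum_pairs<<<1, block>>>(ab, out);
    cudaDeviceSynchronize();

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] != 3.0f * i) ++bad;

    cudaFree(ab);
    cudaFree(out);
    return bad;
}

int main()
{
    int bad = count_mismatches();
    assert(bad == 0);
    printf("mismatches: %d\n", bad);
    return 0;
}
```

Compiling this with -Xptxas -v, with and without -maxrregcount=N, shows how the register count and any spilling change for sum_pairs. Whether the unrolled, packed variant is actually faster has to be measured on the target GPU.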

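And of course, always check that your accesses to global memory are coalesced and that you avoid bank conflicts in shared memory. Use the NVIDIA Visual Profiler to get deeper insight into such issues.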
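As a rough illustration of what coalescing means, the two kernels below copy the same array. The names copy_coalesced, copy_strided, and both_copies_correct are hypothetical. In the first kernel neighbouring threads touch neighbouring addresses. In the second, neighbouring threads touch addresses that are cols elements apart, which typically costs more memory transactions. Both produce the same result, so a profiler run is what reveals the difference.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Thread i touches element i: neighbouring threads, neighbouring addresses.
__global__ void copy_coalesced(const float *in, float *out, int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) out[i] = in[i];
}

// Thread i touches element j, walking a row-major matrix column by column:
// neighbouring threads access addresses that are `cols` elements apart.
__global__ void copy_strided(const float *in, float *out, int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols) {
        int j = (i % rows) * cols + (i / rows);
        out[j] = in[j];
    }
}

// Hypothetical host helper: runs both kernels and checks that each one
// copied every element. CUDA error checking is omitted for brevity.
bool both_copies_correct(int rows, int cols)
{
    const int n = rows * cols;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    bool ok = true;
    for (int pass = 0; pass < 2; ++pass) {
        for (int i = 0; i < n; ++i) out[i] = -1.0f;
        if (pass == 0)
            copy_coalesced<<<(n + 255) / 256, 256>>>(in, out, rows, cols);
        else
            copy_strided<<<(n + 255) / 256, 256>>>(in, out, rows, cols);
        cudaDeviceSynchronize();
        for (int i = 0; i < n; ++i)
            if (out[i] != in[i]) ok = false;
    }

    cudaFree(in);
    cudaFree(out);
    return ok;
}

int main()
{
    bool ok = both_copies_correct(512, 512);
    assert(ok);
    printf("both copies correct: %s\n", ok ? "yes" : "no");
    return 0;
}
```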

+7

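nvcc offers many of the same options as typical C/C++ CPU compilers. See the nvcc documentation; you can also get command-line help with nvcc --help (or nvcc --help | less, to page through it).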

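The default optimization level is already -O3 (unless you specify -G for debugging, which disables most optimizations). You can specify -O0, -O1 and so on, but that will not make the code faster.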

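Also keep in mind that, as noted in the other answer, the host code is optimized by the CPU compiler, while the device code is optimized separately by ptxas.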

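In addition, with nvcc -o foo filename.cu the executable is named foo instead of a.out, if you prefer. This works the same way as with C/C++ compilers.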

+6

Source: https://habr.com/ru/post/1676023/

