How can I get the nvcc CUDA compiler to optimize more?

Question

When using a C or C++ compiler, passing the -O3 switch makes the program run faster. Is there something equivalent in CUDA?

I am compiling my code with the command nvcc filename.cu and then running ./a.out.


user12290 Apr 30 '17 at 13:10

2 answers

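nvcc is not just a C/C++ compiler for the CPU: it is a compiler driver that handles both host and device code, so its options differ. For the full list, run nvcc --help (the output is long, so you may want to page through it with nvcc --help | less).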

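Device code is already optimized by default: ptxas uses -O3 unless you pass -G, which generates debuggable device code and turns device optimizations off. nvcc also accepts -O0, -O1, and so on, but these set the optimization level of the host code.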

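Keep in mind that, unlike with CPU compilers, much of the optimization of GPU code happens in the final compilation step performed by ptxas.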

Also, if you compile with nvcc -o foo filename.cu, the executable is named foo instead of a.out, just as with C/C++ compilers.


Luca Ferraro · Accepted Answer · 2017-05-08T11:02:27+0000

Warning: compiling with nvcc -O3 filename.cu will pass the -O3 option only to the host code.

To optimize the CUDA kernel code, you must pass the optimization flags to the PTX compiler, for example:

nvcc -Xptxas -O3,-v filename.cu

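requests optimization level 3 for the CUDA code (this is already the default), while -v asks for verbose output, which reports information that is very useful for further optimization (for example, the number of registers and the amount of shared memory each kernel uses).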
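As an illustration of what -v reports, here is a minimal sketch of a block-level sum reduction that declares a static shared-memory buffer. The names block_sum, gpu_sum_of_ones and THREADS are hypothetical. Compiling this file with nvcc -Xptxas -O3,-v should make ptxas print, for block_sum, its register count and its shared memory use (256 floats, i.e. 1024 bytes); the register count in particular depends on the target architecture and CUDA version.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // hypothetical block size; must be a power of two for this loop

// Block-level sum reduction in a statically sized shared-memory buffer.
// ptxas -v should list this buffer as 1024 bytes of shared memory (smem).
__global__ void block_sum(const float* in, float* partial, int n) {
    __shared__ float buf[THREADS];
    int i = blockIdx.x * THREADS + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // pad out-of-range slots with 0
    __syncthreads();
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();  // outside the if, so every thread reaches it
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = buf[0];
}

// Hypothetical host helper: sums n copies of 1.0f on the GPU (n > 0).
// Returns -1.0f if a CUDA call fails.
float gpu_sum_of_ones(int n) {
    int blocks = (n + THREADS - 1) / THREADS;
    std::vector<float> h_in(n, 1.0f), h_partial(blocks);
    float *d_in = nullptr, *d_partial = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&d_partial, blocks * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in); cudaFree(d_partial);  // cudaFree(nullptr) does nothing
        return -1.0f;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<blocks, THREADS>>>(d_in, d_partial, n);
    cudaError_t err = cudaMemcpy(h_partial.data(), d_partial,
                                 blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_partial);
    if (err != cudaSuccess) return -1.0f;

    float total = 0.0f;
    for (float p : h_partial) total += p;  // finish the reduction on the host
    return total;
}
```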

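Another speed-oriented nvcc flag is -use_fast_math, which replaces standard math functions with faster hardware intrinsics at the expense of floating-point precision (see the "Options for Steering GPU Code Generation" section of the nvcc documentation).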
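To see the precision trade-off, the following sketch evaluates the standard sinf and the intrinsic __sinf on the same inputs. The kernel compare_sin and the helper max_sin_difference are hypothetical names. According to the CUDA Programming Guide, -use_fast_math makes calls such as sinf compile to their intrinsic counterparts, so when this file is built with that flag the two results should become identical; without it, they differ slightly.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Evaluate the accurate library sinf and the fast hardware intrinsic __sinf.
// Under -use_fast_math, nvcc maps sinf to __sinf too, so both outputs match.
__global__ void compare_sin(const float* x, float* precise, float* fast, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        precise[i] = sinf(x[i]);
        fast[i] = __sinf(x[i]);
    }
}

// Hypothetical host helper: returns max |sinf(x) - __sinf(x)| over n points
// spread across [-pi, pi), or -1.0f if a CUDA call fails.
float max_sin_difference(int n) {
    const float pi = 3.14159265f;
    std::vector<float> h_x(n), h_precise(n), h_fast(n);
    for (int i = 0; i < n; ++i) h_x[i] = -pi + 2.0f * pi * i / n;

    float *d_x = nullptr, *d_precise = nullptr, *d_fast = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_x, bytes) != cudaSuccess ||
        cudaMalloc(&d_precise, bytes) != cudaSuccess ||
        cudaMalloc(&d_fast, bytes) != cudaSuccess) {
        cudaFree(d_x); cudaFree(d_precise); cudaFree(d_fast);  // no-op on nullptr
        return -1.0f;
    }
    cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice);
    const int threads = 256;
    compare_sin<<<(n + threads - 1) / threads, threads>>>(d_x, d_precise, d_fast, n);
    cudaError_t err = cudaMemcpy(h_precise.data(), d_precise, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_fast.data(), d_fast, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_precise); cudaFree(d_fast);
    if (err != cudaSuccess) return -1.0f;

    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = std::fmax(max_diff, std::fabs(h_precise[i] - h_fast[i]));
    return max_diff;
}
```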

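In my experience, however, such automatic compiler options generally do not give large speedups. The best performance is achieved through explicit code optimizations; NVIDIA's CUDA documentation describes many of these techniques.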
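One commonly used explicit optimization is instruction-level parallelism (ILP): each thread processes several independent elements, so more memory operations can be in flight at once. The sketch below applies this to a simple scaling kernel. The names scale_ilp, run_scale_ilp and ITEMS_PER_THREAD are hypothetical, and the value 4 is an arbitrary starting point you would tune by measurement.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int ITEMS_PER_THREAD = 4;  // hypothetical tuning knob

// __restrict__ promises that in and out do not alias, which lets the compiler
// issue the unrolled loads before the stores.
__global__ void scale_ilp(const float* __restrict__ in, float* __restrict__ out,
                          float a, int n) {
    // Each block covers blockDim.x * ITEMS_PER_THREAD consecutive elements.
    int base = blockIdx.x * blockDim.x * ITEMS_PER_THREAD + threadIdx.x;
#pragma unroll
    for (int k = 0; k < ITEMS_PER_THREAD; ++k) {
        // Consecutive threads touch consecutive addresses (coalesced access),
        // and the iterations are independent of each other.
        int i = base + k * blockDim.x;
        if (i < n) out[i] = a * in[i];
    }
}

// Hypothetical host helper: scales 0, 1, ..., n-1 by a on the GPU and checks
// the result. Returns false on a CUDA error or a wrong value.
bool run_scale_ilp(int n, float a) {
    const int threads = 256;
    const int per_block = threads * ITEMS_PER_THREAD;
    const int blocks = (n + per_block - 1) / per_block;
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess ||
        cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in); cudaFree(d_out);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    scale_ilp<<<blocks, threads>>>(d_in, d_out, a, n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != a * h_in[i]) return false;  // one rounded multiply on both sides
    return true;
}
```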