The first thing to understand about PTX is that it is only an intermediate representation of the code that runs on the GPU: an assembly language for a virtual machine. PTX is compiled to the target machine code either by ptxas at compile time or by the driver at runtime. So when you look at PTX, you are looking at what the compiler emits, not at what the GPU actually runs. You can also write your own PTX code, either from scratch (PTX is the only JIT compilation model supported in CUDA) or in inline-assembler sections inside CUDA C code (the latter is officially supported as of CUDA 4.0, but was "unofficially" supported for much longer than that). Every CUDA toolkit release ships with a complete guide to the PTX language, so it is fully documented. The Ocelot project used that documentation to implement its own PTX cross-compiler, which allows CUDA code to run on other hardware: originally x86 processors, and more recently AMD GPUs.
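As an illustration of the inline-assembler route, here is a minimal sketch (the kernel and the operation are hypothetical examples, but the asm() constraint syntax follows NVIDIA's inline PTX documentation):

    // Adds 1 to each input element, doing the add via inline PTX.
    __global__ void add_one(int *out, const int *in)
    {
        int x = in[threadIdx.x];
        int y;
        // add.s32 is the PTX 32-bit integer add; "=r" binds y as an
        // output register and "r" binds x as an input register.
        asm("add.s32 %0, %1, 1;" : "=r"(y) : "r"(x));
        out[threadIdx.x] = y;
    }

Compiling this and dumping the PTX is a quick way to see how the inline fragment gets woven into the compiler's own output.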
If you want to see what the GPU actually runs (as opposed to what the compiler emits), NVIDIA now provides a binary disassembly tool called cuobjdump that can display the actual machine code segments in code compiled for Fermi GPUs. There is also an older, unofficial tool called decuda that worked on G80 and G90 GPUs.
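For example, assuming you have a compiled binary a.out (the file name here is just a placeholder), you can dump the native machine code with:

    cuobjdump --dump-sass a.out

The same tool can also dump the PTX embedded in the binary via --dump-ptx, which makes it easy to compare the two representations side by side.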
Having said that, you can learn a lot from reading PTX output, in particular about how the compiler applies optimizations and which instructions it emits to implement certain C constructs. Every version of the NVIDIA CUDA toolkit comes with a manual for nvcc and a manual for PTX. There is plenty of information in both documents for learning how to compile CUDA C/C++ kernel code to PTX, and for understanding what the PTX instructions do.
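To inspect the PTX for your own kernels, you can ask nvcc to stop at the PTX stage (again, the file names are just examples):

    nvcc -ptx kernel.cu -o kernel.ptx

The resulting kernel.ptx is plain text, so you can read it directly and match it against the instruction descriptions in the PTX manual.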