The first thing to understand about PTX is that it is only an intermediate representation of the code that runs on the GPU: an assembly language for a virtual machine. PTX is compiled to the target machine code either by ptxas at compile time or by the driver at runtime. So when you look at PTX, you are looking at what the compiler emits, not at what the GPU actually runs. You can also write your own PTX code, either from scratch (PTX is the only JIT compilation model supported in CUDA) or in inline-assembler sections inside CUDA C code (the latter is officially supported as of CUDA 4.0, but was "unofficially" supported for much longer than that). Every CUDA toolkit release ships with a complete guide to the PTX language, so it is fully documented. The Ocelot project used that documentation to implement its own PTX cross-compiler, which allows CUDA code to run on other hardware: originally x86 processors, and more recently AMD GPUs.
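As an illustration of the inline-assembler route, here is a minimal sketch (the kernel and the operation are hypothetical examples, but the asm() constraint syntax follows NVIDIA's inline PTX documentation):

    // Adds 1 to each input element, doing the add via inline PTX.
    __global__ void add_one(int *out, const int *in)
    {
        int x = in[threadIdx.x];
        int y;
        // add.s32 is the PTX 32-bit integer add; "=r" binds y as an
        // output register and "r" binds x as an input register.
        asm("add.s32 %0, %1, 1;" : "=r"(y) : "r"(x));
        out[threadIdx.x] = y;
    }

Compiling this and dumping the PTX is a quick way to see how the inline fragment gets woven into the compiler's own output.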
If you want to see what the GPU actually runs (as opposed to what the compiler emits), NVIDIA now provides a binary disassembly tool called cuobjdump that can display the actual machine code segments in code compiled for Fermi GPUs. There is also an older, unofficial tool called decuda that worked on G80 and G90 GPUs.
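For example, assuming you have a compiled binary a.out (the file name here is just a placeholder), you can dump the native machine code with:

    cuobjdump --dump-sass a.out

The same tool can also dump the PTX embedded in the binary via --dump-ptx, which makes it easy to compare the two representations side by side.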
Having said that, you can learn a lot from reading PTX output, in particular about how the compiler applies optimizations and which instructions it emits to implement certain C constructs. Every version of the NVIDIA CUDA toolkit comes with a manual for nvcc and a manual for PTX. There is plenty of information in both documents for learning how to compile CUDA C/C++ kernel code to PTX, and for understanding what the PTX instructions do.
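To inspect the PTX for your own kernels, you can ask nvcc to stop at the PTX stage (again, the file names are just examples):

    nvcc -ptx kernel.cu -o kernel.ptx

The resulting kernel.ptx is plain text, so you can read it directly and match it against the instruction descriptions in the PTX manual.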