Miscellaneous and inter-thread communication instructions in CUDA

I'm playing with the NVIDIA profiler (nvprof), and there are two specific metrics that I don't understand:

inst_inter_thread_communication: Number of inter-thread communication instructions executed by non-predicated threads

inst_misc: Number of miscellaneous instructions executed by non-predicated threads

I'm just wondering which instructions count as inter-thread communication instructions and which count as miscellaneous.

Link: http://docs.nvidia.com/cuda/profiler-users-guide/#metrics-reference

1 answer

The SASS instructions that fall into these two categories are as follows:

inst_inter_thread_communication

  • SHFL
  • VOTE
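As a rough illustration of where these instructions come from, here is a minimal sketch of a kernel that uses the warp shuffle and warp vote intrinsics. The kernel name `warp_comm_demo` and the host helper `run_warp_comm_demo` are made up for this example, and it assumes CUDA 9 or later (for the `_sync` intrinsics). The compiler would typically lower `__shfl_down_sync` to SHFL and `__ballot_sync` to VOTE, but the exact SASS depends on the architecture and toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single warp sums its lane IDs with shuffles
// and records which lanes are even with a ballot.
__global__ void warp_comm_demo(int *sum_out, unsigned *ballot_out)
{
    const unsigned full_mask = 0xffffffffu;
    int lane = threadIdx.x & 31;
    int value = lane;

    // Tree reduction across the warp; each step is expected to become a SHFL.
    for (int offset = 16; offset > 0; offset >>= 1)
        value += __shfl_down_sync(full_mask, value, offset);

    // Warp-wide vote; expected to become a VOTE instruction.
    unsigned even_lanes = __ballot_sync(full_mask, (lane & 1) == 0);

    if (lane == 0) {
        *sum_out = value;
        *ballot_out = even_lanes;
    }
}

// Hypothetical host helper: launches one warp and copies the results back.
bool run_warp_comm_demo(int *h_sum, unsigned *h_ballot)
{
    int *d_sum = nullptr;
    unsigned *d_ballot = nullptr;
    if (cudaMalloc(&d_sum, sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_ballot, sizeof(unsigned)) != cudaSuccess) {
        cudaFree(d_sum);
        return false;
    }

    warp_comm_demo<<<1, 32>>>(d_sum, d_ballot);

    bool ok =
        cudaMemcpy(h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess &&
        cudaMemcpy(h_ballot, d_ballot, sizeof(unsigned), cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_sum);
    cudaFree(d_ballot);
    return ok;
}

int main()
{
    int sum = 0;
    unsigned ballot = 0;
    if (!run_warp_comm_demo(&sum, &ballot)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("sum = %d, ballot = 0x%08x\n", sum, ballot);
    return 0;
}
```

To check what the compiler actually generated, you can dump the SASS with `cuobjdump -sass` on the compiled binary and look for SHFL and VOTE, then compare against the `inst_inter_thread_communication` value that nvprof reports for the kernel.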

inst_misc

  • NOP
  • S2R, B2R, R2B, P2R
  • LEPC
  • CSET[P], PSET[P]
  • MOV
  • SEL
  • PRMT
  • BAR, DEPBAR (Maxwell only)
  • There are a few rare undocumented instructions that also count toward this category.
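For a feel of how ordinary CUDA code ends up in this category, here is a small sketch that is likely, though not guaranteed, to produce a few of the instructions above. The kernel name `misc_demo` and the host helper `run_misc_demo` are hypothetical. Reading `threadIdx` and `blockIdx` usually appears as S2R, `__byte_perm` usually maps to PRMT, and a simple conditional may become SEL; the compiler is free to pick other encodings.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: byte-reverses odd inputs and passes even inputs through.
__global__ void misc_demo(const unsigned *in, unsigned *out, int n)
{
    // Special-register reads (threadIdx, blockIdx) typically show up as S2R.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    unsigned v = in[i];

    // Selector 0x0123 reverses the four bytes of v; usually compiled to PRMT.
    unsigned swapped = __byte_perm(v, 0u, 0x0123u);

    // A conditional select like this may be compiled to SEL.
    out[i] = (v & 1u) ? swapped : v;
}

// Hypothetical host helper: runs the kernel over n values from h_in into h_out.
bool run_misc_demo(const unsigned *h_in, unsigned *h_out, int n)
{
    unsigned *d_in = nullptr;
    unsigned *d_out = nullptr;
    size_t bytes = n * sizeof(unsigned);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    bool ok = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        misc_demo<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}

int main()
{
    const unsigned in[2] = {0x11223345u, 0x11223344u};
    unsigned out[2] = {0u, 0u};
    if (!run_misc_demo(in, out, 2)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("0x%08x 0x%08x\n", out[0], out[1]);
    return 0;
}
```

Whether a given line really lands in `inst_misc` is best confirmed by dumping the SASS with `cuobjdump -sass` and checking which of the listed opcodes appear.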

The Instruction Set Reference section of the CUDA Binary Utilities document contains a brief description of each SASS instruction. Many SASS instructions correspond closely to a PTX instruction, so you can also look at the PTX ISA manual.


Source: https://habr.com/ru/post/1201840/

