Currently, only the Tesla V100 and Titan V have tensor cores. Both GPUs have 5120 CUDA cores, and each core can perform up to one single-precision multiply-accumulate operation (for example, in fp32: x += y * z) per GPU clock cycle (for example, the Tesla V100 PCIe runs at 1.38 GHz).
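As a rough check on these figures, the theoretical fp32 peak is the product of core count, clock, and FLOPs per multiply-accumulate. This is a sketch under stated assumptions: every core retires one fused multiply-add every cycle, and one multiply-add counts as two FLOPs (a common convention that the text does not state). The constant and function names are illustrative only.

```python
# Back-of-envelope peak fp32 throughput for a GPU where every CUDA core
# retires one fused multiply-add (x += y * z) per clock cycle.
# Assumption: one multiply-add is counted as 2 floating-point operations.

CUDA_CORES = 5120      # Tesla V100 / Titan V, from the text
CLOCK_HZ = 1.38e9      # Tesla V100 PCIe clock, from the text
FLOPS_PER_FMA = 2      # one multiply + one add


def peak_fp32_flops(cores: int, clock_hz: float, flops_per_fma: int = FLOPS_PER_FMA) -> float:
    """Theoretical peak FLOP/s if all cores issue one multiply-add every cycle."""
    return cores * flops_per_fma * clock_hz


if __name__ == "__main__":
    peak = peak_fp32_flops(CUDA_CORES, CLOCK_HZ)
    print(f"{peak / 1e12:.2f} TFLOPS")  # 14.13 TFLOPS
```

Real workloads rarely reach this number; it is an upper bound set by the core count and clock alone.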
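Each tensor core operates on small 4x4 matrices and can perform one matrix multiply-accumulate operation per GPU clock cycle. It multiplies two 4x4 fp16 matrices and adds the product, a 4x4 fp32 matrix, to an accumulator that is also a 4x4 fp32 matrix.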
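A tensor core's basic step can be written as D = A × B + C on 4x4 tiles, with A and B in fp16 and C and D in fp32. The pure-Python model below is a hypothetical sketch of that arithmetic only. It rounds the inputs with the standard `struct` half-precision format (`"e"`) and assumes the additions happen in order k = 0..3. It is not how the hardware actually schedules work and not an NVIDIA API. The helper names `to_fp16`, `to_fp32` and `mma_4x4` are illustrative.

```python
import struct

TILE = 4  # tensor cores in the text work on 4x4 matrices


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_4x4(a, b, c):
    """Model one tensor-core step, D = A x B + C, on 4x4 lists of floats.

    A and B are rounded to fp16; C and D are kept in fp32. The product of
    two fp16 values is exact in fp32, so only the additions are rounded.
    The summation order (k = 0..3) is an assumption, not the hardware order.
    Returns (D, number_of_multiply_adds).
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[to_fp32(v) for v in row] for row in c]
    fmas = 0
    for i in range(TILE):
        for j in range(TILE):
            acc = d[i][j]
            for k in range(TILE):
                # fp16 * fp16 product, added into the fp32 accumulator
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
                fmas += 1
            d[i][j] = acc
    return d, fmas


if __name__ == "__main__":
    ones = [[1.0] * TILE for _ in range(TILE)]
    zeros = [[0.0] * TILE for _ in range(TILE)]
    d, fmas = mma_4x4(ones, ones, zeros)
    print(d[0], fmas)  # [4.0, 4.0, 4.0, 4.0] 64
```

The returned count makes the arithmetic explicit: one 4x4 by 4x4 matrix multiply-accumulate is 4 × 4 × 4 = 64 scalar multiply-adds. This is why a single tensor-core operation per clock does far more work than a single CUDA core's one multiply-add per clock.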
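This is called mixed precision because the input matrices are fp16, while the multiplication result and the accumulator are fp32.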
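One reason to keep the accumulator in fp32 is that long fp16 sums lose precision quickly. The toy example below assumes simple sequential summation with rounding after every addition, which is an illustrative model and not the hardware's behavior. Under that assumption, adding 1.0 to an fp16 running total stalls at 2048: fp16 has an 11-bit significand, so the gap between representable values at 2048 is 2, and 2048 + 1 rounds back to 2048 under round-half-to-even. The helper names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (fp32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rounder):
    """Sum values, rounding the running total with `rounder` after every addition."""
    total = 0.0
    for v in values:
        total = rounder(total + v)
    return total


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(accumulate(ones, to_fp16))  # 2048.0: fp16 total stops growing at 2048
    print(accumulate(ones, to_fp32))  # 4096.0: fp32 total is exact
```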
A more accurate name would probably be simply "4x4 matrix cores", but NVIDIA's marketing team chose "tensor cores".