Which is faster in CUDA: constant memory or texture memory?

I know that both reside in off-chip DRAM and are cached.

But which one is faster to access? Or under what circumstances is one faster than the other?

3 answers

Texture memory is optimized for 2D spatial locality (which is where it gets its name). You can think of constant memory as benefiting from temporal locality.

The advantages of texture memory over constant memory can be summarized as follows:

  • Spatial locality
  • Addressing calculations are performed outside the kernel, by dedicated hardware
  • Packed data can be broadcast to separate variables in a single operation
  • 8-bit and 16-bit integer data can be automatically converted to floating-point values between 0 and 1.0
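To make the last two points concrete, here is a minimal sketch, assuming the texture-object API (CUDA 5.0 and later) and a 1D CUDA array of `unsigned char`. The kernel `fetch_unit_floats`, the helper `read_as_unit_floats` and the `CHECK` macro are hypothetical names chosen for illustration. The kernel itself contains no bounds clamping and no integer-to-float conversion; both are configured once in the texture descriptor and carried out by the texture unit.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort (via assert) on any CUDA runtime error.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Each thread fetches the texel at index (first + i), addressed at the texel centre
// in unnormalized coordinates. The texture unit does the rest in hardware:
//  - cudaAddressModeClamp maps out-of-range indices to the nearest edge texel;
//  - cudaReadModeNormalizedFloat returns an unsigned 8-bit texel v as v / 255.0f.
__global__ void fetch_unit_floats(cudaTextureObject_t tex, int first, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1D<float>(tex, first + i + 0.5f);
}

// Hypothetical helper: uploads 8-bit data to a 1D CUDA array, then reads n values
// starting at texel index `first` (which may be negative or past the end).
std::vector<float> read_as_unit_floats(const std::vector<unsigned char>& data, int first, int n) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &desc, data.size(), 0));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, data.data(), data.size(), data.size(), 1,
                              cudaMemcpyHostToDevice));

    cudaResourceDesc res{};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td{};
    td.addressMode[0] = cudaAddressModeClamp;   // boundary handling by the texture unit
    td.filterMode = cudaFilterModePoint;        // no interpolation: return the texel itself
    td.readMode = cudaReadModeNormalizedFloat;  // 8-bit integer -> float in [0.0, 1.0]
    td.normalizedCoords = 0;                    // coordinates are in texels, not [0, 1)
    cudaTextureObject_t tex;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* d_out;
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    fetch_unit_floats<<<(n + 127) / 128, 128>>>(tex, first, d_out, n);
    CHECK(cudaGetLastError());
    std::vector<float> out(n);
    CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_out));
    return out;
}
```

For example, `read_as_unit_floats({0, 51, 255}, -1, 5)` should return approximately `{0.0, 0.0, 0.2, 1.0, 1.0}`: the first and last reads fall outside the array and are clamped to the edge texels, and 51 / 255 = 0.2.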


---

Constant memory is optimized for broadcast, that is, for the case where all threads in a warp read the same memory location. If they read different locations, it still works, but each distinct location referenced by the warp costs additional time. When a read is broadcast to all threads, constant memory is MUCH faster than texture memory.
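The two access patterns can be sketched as follows, assuming a `__constant__` table of 256 floats. The kernel and helper names (`scale_uniform`, `scale_divergent`, `scale`) and the `CHECK` macro are hypothetical. In `scale_uniform` every thread of a warp reads the same address, which is the broadcast case; in `scale_divergent` the threads of a warp read different addresses, so those reads are serialized. Both produce correct results; only the speed differs, and by how much depends on the GPU.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort (via assert) on any CUDA runtime error.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

constexpr int kCoeffs = 256;
__constant__ float c_coeffs[kCoeffs];  // placed in constant memory, read through the constant cache

// Broadcast: all threads of a warp read the SAME element c_coeffs[k].
__global__ void scale_uniform(const float* in, float* out, int n, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_coeffs[k];
}

// Divergent: threads of a warp read DIFFERENT elements, so the constant
// cache serves them one distinct address at a time.
__global__ void scale_divergent(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_coeffs[i % kCoeffs];
}

// Hypothetical helper: uploads the table, runs one of the kernels, returns the result.
std::vector<float> scale(const std::vector<float>& in, const std::vector<float>& coeffs,
                         bool uniform, int k) {
    assert(coeffs.size() == static_cast<size_t>(kCoeffs));
    CHECK(cudaMemcpyToSymbol(c_coeffs, coeffs.data(), kCoeffs * sizeof(float)));

    int n = static_cast<int>(in.size());
    float *d_in, *d_out;
    CHECK(cudaMalloc(&d_in, n * sizeof(float)));
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    CHECK(cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    int blocks = (n + 255) / 256;
    if (uniform) scale_uniform<<<blocks, 256>>>(d_in, d_out, n, k);
    else         scale_divergent<<<blocks, 256>>>(d_in, d_out, n);
    CHECK(cudaGetLastError());

    std::vector<float> out(n);
    CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return out;
}
```

Timing the two kernels with a profiler on your own GPU is the practical way to see the broadcast advantage described above.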

Texture memory has high latency, even for cache hits. You can think of it as a bandwidth aggregator: if reads can be served from the texture cache, the GPU does not have to go to off-chip memory for them. For 2D and 3D textures, addressing has 2D and 3D locality, so the cache fills 2D and 3D blocks of memory instead of rows.
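A typical access pattern that benefits from 2D locality is a stencil, where each thread reads its left, right, upper and lower neighbours. The sketch below assumes a `float` image stored in a 2D CUDA array (whose memory layout is opaque and optimized for texture fetching) and read with `tex2D`; the names `neighbour_avg`, `neighbour_average` and `CHECK` are hypothetical. Threads in a 16×16 block touch texels that are close in both x and y, which is the case a 2D-blocked cache is meant to serve.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort (via assert) on any CUDA runtime error.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Average of the 4 neighbours of (x, y). Neighbouring threads read neighbouring
// texels in BOTH dimensions, so vertical neighbours can also hit the texture cache.
__global__ void neighbour_avg(cudaTextureObject_t tex, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float fx = x + 0.5f, fy = y + 0.5f;  // texel centres in unnormalized coordinates
    out[y * w + x] = 0.25f * (tex2D<float>(tex, fx - 1, fy) + tex2D<float>(tex, fx + 1, fy) +
                              tex2D<float>(tex, fx, fy - 1) + tex2D<float>(tex, fx, fy + 1));
}

// Hypothetical helper: copies a w x h row-major image into a CUDA array and runs the stencil.
std::vector<float> neighbour_average(const std::vector<float>& img, int w, int h) {
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &desc, w, h));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, img.data(), w * sizeof(float), w * sizeof(float), h,
                              cudaMemcpyHostToDevice));

    cudaResourceDesc res{};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td{};
    td.addressMode[0] = cudaAddressModeClamp;  // reads past the left/right edge return the edge texel
    td.addressMode[1] = cudaAddressModeClamp;  // same for top/bottom
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* d_out;
    CHECK(cudaMalloc(&d_out, w * h * sizeof(float)));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    neighbour_avg<<<grid, block>>>(tex, d_out, w, h);
    CHECK(cudaGetLastError());
    std::vector<float> out(w * h);
    CHECK(cudaMemcpy(out.data(), d_out, w * h * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_out));
    return out;
}
```

Note that the kernel needs no bounds checks on its neighbour reads: the clamp address mode handles the image borders.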

Finally, the texture pipeline can perform “bonus” calculations: handling boundary conditions (“texture addressing”) and converting 8- and 16-bit integer values to normalized floats are examples of operations that can be performed “for free” (they are part of the reason why texture reads have high latency).
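Besides clamping, the texture unit can apply periodic (“wrap”) boundary conditions, which CUDA supports only with normalized coordinates. The following sketch assumes a 1D `float` CUDA array; `fetch_wrapped`, `read_wrapped` and `CHECK` are hypothetical names. The kernel computes a coordinate that may lie outside [0, 1), and the hardware keeps only its fractional part, so no modulo arithmetic appears in the kernel.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort (via assert) on any CUDA runtime error.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Thread i reads texel index (first + i) via its normalized centre coordinate,
// which may be negative or >= 1; cudaAddressModeWrap folds it back into [0, 1).
__global__ void fetch_wrapped(cudaTextureObject_t tex, int first, int width, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1D<float>(tex, (first + i + 0.5f) / width);
}

// Hypothetical helper: reads n values starting at index `first` with periodic boundaries.
std::vector<float> read_wrapped(const std::vector<float>& data, int first, int n) {
    int width = static_cast<int>(data.size());
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &desc, width, 0));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, data.data(), width * sizeof(float), width * sizeof(float),
                              1, cudaMemcpyHostToDevice));

    cudaResourceDesc res{};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td{};
    td.addressMode[0] = cudaAddressModeWrap;  // periodic boundary, done by the texture unit
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 1;                  // wrap mode requires normalized coordinates
    cudaTextureObject_t tex;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* d_out;
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    fetch_wrapped<<<(n + 127) / 128, 128>>>(tex, first, width, d_out, n);
    CHECK(cudaGetLastError());
    std::vector<float> out(n);
    CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_out));
    return out;
}
```

For instance, `read_wrapped({10, 20, 30, 40}, -2, 8)` should yield `{30, 40, 10, 20, 30, 40, 10, 20}`: indices -2 and -1 wrap to the end of the array, and 4 and 5 wrap to the start.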

---

In my experience, texture memory access is as fast as constant memory access. But texture memory is much larger than constant memory, so if you need to store a large chunk of data, I recommend using texture memory instead. In addition, if you need interpolation, a texture fetch is the optimal choice.
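Constant memory is limited to 64 KB per device, whereas textures can be backed by much larger arrays. For the interpolation point, the sketch below assumes a 1D `float` table and `cudaFilterModeLinear`; `sample_linear`, `interpolate` and `CHECK` are hypothetical names. With unnormalized coordinates the centre of texel i is at i + 0.5, so a fetch at x blends `table[floor(x - 0.5)]` with the next texel using weight `frac(x - 0.5)`. The hardware stores that weight in low-precision fixed point (8 fractional bits), which is usually fine for lookup tables but not for results that must be exact.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sketch-level error handling: abort (via assert) on any CUDA runtime error.
#define CHECK(call) do { cudaError_t err_ = (call); assert(err_ == cudaSuccess); (void)err_; } while (0)

// Thread i samples the table at x = start + i * step; the texture unit
// linearly interpolates between the two nearest texels.
__global__ void sample_linear(cudaTextureObject_t tex, float start, float step, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1D<float>(tex, start + i * step);
}

// Hypothetical helper: samples `table` at n evenly spaced positions.
std::vector<float> interpolate(const std::vector<float>& table, float start, float step, int n) {
    size_t bytes = table.size() * sizeof(float);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &desc, table.size(), 0));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, table.data(), bytes, bytes, 1, cudaMemcpyHostToDevice));

    cudaResourceDesc res{};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td{};
    td.addressMode[0] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;    // hardware linear interpolation
    td.readMode = cudaReadModeElementType;   // float texels, float result
    td.normalizedCoords = 0;
    cudaTextureObject_t tex;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* d_out;
    CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    sample_linear<<<(n + 127) / 128, 128>>>(tex, start, step, d_out, n);
    CHECK(cudaGetLastError());
    std::vector<float> out(n);
    CHECK(cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_out));
    return out;
}
```

With the table `{0, 10, 20, 30}`, `interpolate(table, 1.0f, 0.25f, 5)` should give approximately `{5, 7.5, 10, 12.5, 15}`.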

Constant memory, on the other hand, is optimized in hardware for the case where all threads of a warp read the same location. If threads read from multiple locations, the accesses are serialized.


Source: https://habr.com/ru/post/920471/

