Constant (read-only) memory:
Kernel arguments and constants are stored here.
Slow, but cached (8 KB cache).
Constant memory is optimized for broadcast: it is fastest when all threads read the same address.
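As a rough illustration of the broadcast case, here is a minimal sketch in which every thread reads the same coefficients from a `__constant__` array. The names `coeffs`, `poly_kernel` and `run_constant_memory_demo` are made up for this example and the sizes are arbitrary.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical coefficient table living in constant memory.
__constant__ float coeffs[4];

// Each thread evaluates c0 + c1*x + c2*x^2 + c3*x^3 for its own x.
// All threads read the same coeffs[k] at the same time, which is the
// broadcast pattern the constant cache is built for.
__global__ void poly_kernel(const float *x, float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        y[i] = ((coeffs[3] * v + coeffs[2]) * v + coeffs[1]) * v + coeffs[0];
    }
}

// Returns true if every GPU result matches a CPU reference.
bool run_constant_memory_demo() {
    const int n = 256;
    const float h_coeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> h_x(n), h_y(n);
    for (int i = 0; i < n; ++i) h_x[i] = static_cast<float>(i % 8);

    float *d_x = nullptr, *d_y = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_x);
        return false;
    }

    // Constant memory is written from the host with cudaMemcpyToSymbol.
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    poly_kernel<<<(n + 127) / 128, 128>>>(d_x, d_y, n);

    cudaError_t err = cudaMemcpy(h_y.data(), d_y, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    if (err != cudaSuccess) return false;

    // Inputs are small integers, so the float results are exact.
    for (int i = 0; i < n; ++i) {
        float v = h_x[i];
        float ref = ((4.0f * v + 3.0f) * v + 2.0f) * v + 1.0f;
        if (h_y[i] != ref) return false;
    }
    return true;
}
```

Calling `run_constant_memory_demo()` from `main` and compiling with `nvcc` should be enough to try it. If the threads of a warp indexed `coeffs` with different values of `k`, the reads would no longer be a single broadcast.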
Texture memory:
Cache optimized for 2D spatial access pattern
Reads come with extras such as addressing modes and interpolation at no additional cost.
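To make the addressing modes and interpolation concrete, here is a small sketch that samples halfway between two texels and lets the hardware interpolate. It assumes the texture object API (compute capability 3.0 or newer), which is newer than the 1.0/2.0 hardware discussed here; `sample_kernel` and `sample_between_texels` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Reads one interpolated sample from a 2D float texture.
__global__ void sample_kernel(cudaTextureObject_t tex, float x, float y,
                              float *out) {
    *out = tex2D<float>(tex, x, y);
}

// Builds a 2x2 texture whose rows are {0, 1} and samples between the
// two texels of the first row. Returns -1.0f on a CUDA error.
float sample_between_texels() {
    const int width = 2, height = 2;
    const float h_data[width * height] = {0.0f, 1.0f, 0.0f, 1.0f};

    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &fmt, width, height) != cudaSuccess) return -1.0f;
    cudaMemcpy2DToArray(arr, 0, 0, h_data, width * sizeof(float),
                        width * sizeof(float), height,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;  // out-of-range x is clamped
    td.addressMode[1] = cudaAddressModeClamp;  // out-of-range y is clamped
    td.filterMode = cudaFilterModeLinear;      // hardware interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;                   // coordinates in texel units

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFreeArray(arr);
        return -1.0f;
    }

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        cudaDestroyTextureObject(tex);
        cudaFreeArray(arr);
        return -1.0f;
    }

    // Texel centers sit at 0.5 and 1.5, so x = 1.0 is halfway between them.
    sample_kernel<<<1, 1>>>(tex, 1.0f, 0.5f, d_out);

    float h_out = -1.0f;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    return err == cudaSuccess ? h_out : -1.0f;
}
```

The kernel itself contains no interpolation or bounds-checking code; both come from the texture descriptor. The returned value should land roughly midway between the two texel values, with the limited precision of the hardware filter.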
Global memory:
Slow and uncached (compute capability 1.0), cached (2.0).
Fast access requires sequential, aligned 16-byte reads and writes (coalesced reads/writes).
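The sequential, aligned access pattern (usually called coalescing) can be contrasted with a scattered one in a short sketch. Both kernels below copy the same `float4` (16-byte) elements and differ only in how a thread index maps to an element. The kernel names, the helper `run_copy_demo` and the stride of 33 are illustrative choices, not taken from the source.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Thread i touches element i: neighbouring threads touch neighbouring,
// 16-byte-aligned elements, so accesses can be coalesced.
__global__ void copy_coalesced(const float4 *in, float4 *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Thread i touches element (i * stride) % n: neighbouring threads touch
// elements far apart, so a warp's accesses spread over many segments.
__global__ void copy_scattered(const float4 *in, float4 *out, int n,
                               int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = static_cast<int>((static_cast<long long>(i) * stride) % n);
        out[j] = in[j];
    }
}

// Runs one of the two kernels and returns true if the copy is correct.
bool run_copy_demo(bool scattered) {
    const int n = 1 << 16;
    const int stride = 33;  // odd, so coprime with n: each element copied once
    const size_t bytes = n * sizeof(float4);

    std::vector<float4> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i)
        h_in[i] = make_float4(i, i + 0.25f, i + 0.5f, i + 0.75f);

    float4 *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (scattered)
        copy_scattered<<<blocks, threads>>>(d_in, d_out, n, stride);
    else
        copy_coalesced<<<blocks, threads>>>(d_in, d_out, n);

    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_out[i].x != h_in[i].x || h_out[i].y != h_in[i].y ||
            h_out[i].z != h_in[i].z || h_out[i].w != h_in[i].w)
            return false;
    }
    return true;
}
```

Both variants produce the same output; only the memory traffic differs. Timing the two launches with `cudaEvent` timers or a profiler is one way to see how much the access pattern matters on a given device.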
Source: http://www.cvg.ethz.ch/teaching/2011spring/gpgpu/cuda_memory.pdf