How do GPUs handle random access?

I have read several guides on implementing a ray tracer in OpenGL 4.3 compute shaders, and it made me think about something that has puzzled me for a while. How exactly do GPUs handle the huge amount of random-access reading needed to implement something like this? Does each stream processor get its own copy of the data? It seems the system would be completely overloaded with memory accesses, but that is just my own, probably incorrect, intuition.

1 answer

Streaming Multiprocessors (SMs) do have caches, but they are relatively small and will not help with truly random access.

Instead, one of the core ideas behind GPUs is to hide memory-access latency: each SM is assigned many more threads to execute than it has cores. On each cycle it schedules one of the threads that is not blocked on a memory access; when the data a thread needs is not in the SM's cache, that thread stalls until the data arrives, and another thread is picked to run in its place.
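To make this concrete, here is a minimal GLSL 4.3 compute shader sketch of the kind of kernel the question has in mind. It is not from the original answer: the node layout, buffer bindings, and the toy traversal rule are made up purely for illustration. Each invocation chases pointers through a buffer, so most reads miss the cache, and the SM stays busy only because it has many other resident threads to switch to:

    #version 430
    layout(local_size_x = 64) in;

    // Hypothetical flattened node layout, purely for illustration.
    struct Node { vec4 boundsMin; vec4 boundsMax; ivec4 links; };

    layout(std430, binding = 0) readonly  buffer Nodes { Node nodes[]; };
    layout(std430, binding = 1) writeonly buffer Hits  { int  hits[];  };

    void main() {
        uint ray  = gl_GlobalInvocationID.x;
        int  cur  = 0;
        int  last = 0;
        // Each iteration loads a node whose address depends on the previous
        // load, i.e. an essentially random access pattern. While this thread's
        // warp waits for a fetch, the SM simply runs other resident warps, so
        // the latency is hidden as long as enough threads are in flight.
        for (int step = 0; step < 64 && cur >= 0; ++step) {
            Node n = nodes[cur];
            last = cur;
            cur  = ((ray & 1u) == 0u) ? n.links.x : n.links.y; // toy traversal choice
        }
        hits[ray] = last;
    }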

Note that the working assumption is that you are doing heavy computation. If all you do is a small amount of computation on a lot of data (e.g. just adding up a lot of 32-bit floats), the bottleneck is very likely to be memory bus bandwidth, and most of the time your threads will be stalled waiting for their data to arrive.
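As a rough illustration (a minimal sketch, with assumed buffer names and bindings that are not from the answer), a kernel like this does one addition per three 4-byte memory transactions, so no amount of thread switching can hide the fact that the bus is saturated:

    #version 430
    layout(local_size_x = 256) in;

    layout(std430, binding = 0) readonly  buffer A { float a[]; };
    layout(std430, binding = 1) readonly  buffer B { float b[]; };
    layout(std430, binding = 2) writeonly buffer C { float c[]; };

    void main() {
        uint i = gl_GlobalInvocationID.x;
        // One add per twelve bytes of memory traffic: the memory bus, not
        // the ALUs, is the limit, so most threads sit stalled waiting for data.
        c[i] = a[i] + b[i];
    }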

In practice, though, you usually perform a substantial amount of computation per piece of data fetched. For example, you read input normals and material parameters and then run a sizable lighting calculation on them. While some threads are doing that arithmetic, others are waiting for their data to arrive.
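A shading kernel along these lines (again only a hedged sketch; the buffer layouts, uniforms, and lighting model are assumptions, not code from the answer) reads a normal and a material per pixel and then runs a loop of lighting math on them, giving the scheduler plenty of arithmetic to interleave with the outstanding reads:

    #version 430
    layout(local_size_x = 16, local_size_y = 16) in;

    layout(std430, binding = 0) readonly buffer Normals   { vec4 normals[];   };
    layout(std430, binding = 1) readonly buffer Materials { vec4 materials[]; }; // rgb = albedo, a = shininess
    layout(rgba16f, binding = 0) writeonly uniform image2D outImage;

    uniform int  width;          // assumed: image width in pixels
    uniform vec3 lightDirs[8];   // assumed: a few directional lights
    uniform vec3 lightColors[8];

    void main() {
        ivec2 p = ivec2(gl_GlobalInvocationID.xy);
        uint  i = uint(p.y * width + p.x);

        // Two buffer reads per pixel...
        vec3 n = normalize(normals[i].xyz);
        vec4 m = materials[i];

        // ...followed by a comparatively large amount of ALU work, which gives
        // the scheduler other warps to run while those reads are in flight.
        vec3 color = vec3(0.0);
        for (int l = 0; l < 8; ++l) {
            float ndotl = max(dot(n, -lightDirs[l]), 0.0);
            vec3  h     = normalize(-lightDirs[l] + vec3(0.0, 0.0, 1.0));
            float spec  = pow(max(dot(n, h), 0.0), m.a);
            color += lightColors[l] * (m.rgb * ndotl + spec);
        }
        imageStore(outImage, p, vec4(color, 1.0));
    }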


Source: https://habr.com/ru/post/1262130/
