Ray tracing through Compute Shader vs Screen Quad

I've recently been looking up ray tracing through OpenGL tutorials. Most tutorials prefer compute shaders. I wonder why they don't just render to a texture, and then display that texture on a full-screen quad.

What are the advantages and disadvantages of the compute shader method over the screen quad method?

2 answers

Short answer: because compute shaders give you more effective tools to perform complex computations.

Long answer:

Perhaps the biggest advantage they provide (in the case of tracing) is the ability to control exactly how work is executed on the GPU. This matters when you're tracing a complex scene. If your scene is trivial (e.g., a Cornell box), then the difference is negligible. Trace some spheres in your fragment shader all day long; check out http://shadertoy.com/ to witness the madness that can be achieved with modern GPUs and fragment shaders.

But. If your scene and shading are sufficiently complex, you need to control how the work is done. Rendering and tracing in a fragment shader will, at best, cause your application to freeze while the driver cries, changes its legal name, and moves to the other side of the world... and, at worst, the driver will crash. Many drivers will abort if a single operation takes too long (which almost never happens under standard usage, but will happen very quickly once you start trying to trace 1M-poly scenes).

So you're doing too much work in the fragment shader... what's the next logical step? Well, limit the workload. Draw smaller quads to control how much of the screen you're tracing at once. Or use glScissor. Make the workload smaller and smaller until your driver can handle it.
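The idea above can be sketched as follows. This is a minimal illustrative sketch (not real OpenGL code): it splits the screen into tiles so that each GPU submission stays small enough to avoid the driver's watchdog timeout. The function name and tile size are hypothetical.

```python
# Hypothetical sketch: split a full-screen trace into smaller tiles so that
# no single draw call (or glScissor region) exceeds the driver's time limit.

def make_tiles(width, height, tile_size):
    """Yield (x, y, w, h) rectangles covering a width x height screen."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y,
                   min(tile_size, width - x),   # clamp edge tiles
                   min(tile_size, height - y))

# Each tile would be traced with its own draw call, keeping every individual
# GPU submission short enough that the driver never aborts it.
tiles = list(make_tiles(1920, 1080, 256))
```

Shrinking `tile_size` is exactly the "make the workload smaller and smaller" knob described above.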

Guess what we just invented? Compute shader work groups! Work groups are the compute shader mechanism for controlling job size, and they are a far better abstraction for doing so than fragment-level hackery (when we're dealing with this kind of complex task). Now we can control very precisely how many rays we dispatch, and we can do it without being tightly coupled to screen space. For a simple tracer, this adds unnecessary complexity. For a "real" one, it means we can easily perform sub-pixel raycasting on a jittered grid for AA, a huge number of raycasts per pixel for path tracing if we so desire, etc.

Other features of compute shaders that are useful for performant, industrial-strength ray tracers:

  • Shared memory between thread groups (allows, for example, packet tracing, where an entire packet of spatially-coherent rays is traced at the same time in order to exploit memory coherence and the ability to communicate with neighboring rays)
  • Scatter writes allow compute shaders to write to arbitrary image locations (note: an image and a texture differ in subtle ways, but the advantage remains relevant); you no longer have to trace directly from a known pixel location.
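The scatter-write point is worth illustrating, since it is exactly what a fragment shader cannot do (a fragment may only write to its own position). Below is a hedged, non-GPU sketch: each "thread" computes a result and stores it at a target location the ray itself chose. `scatter_trace` and its inputs are hypothetical.

```python
def scatter_trace(rays, image_w, image_h):
    """Scatter results into a flat image buffer: each 'ray' is a tuple
    (target_x, target_y, value), and the write location is arbitrary,
    not tied to any fixed fragment position."""
    image = [0.0] * (image_w * image_h)
    for tx, ty, value in rays:
        image[ty * image_w + tx] = value   # arbitrary write location
    return image
```

In GLSL compute, the equivalent would be an `imageStore` to a coordinate computed by the shader rather than dictated by rasterization.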

In general, the architecture of modern GPUs is designed to support this kind of workload more naturally through compute. Personally, I have written a real-time progressive path tracer using MLT, kd-tree acceleration, and a number of other computationally expensive techniques (PT is already very expensive). I tried to stay in a fragment shader / full-screen quad for as long as I could. Once my scene was complex enough to require an acceleration structure, my driver began to choke no matter what hackery I pulled. I re-implemented it in CUDA (not quite the same as compute, but leveraging the same fundamental GPU architecture advances), and all was well with the world.

If you really want to dig in, have a look at section 3.1 here: https://graphics.cg.uni-saarland.de/fileadmin/cguds/papers/2007/guenther_07_BVHonGPU/Guenter_et_al._-_Realtime_Ray_Tracing_on_GPU_with_BVH-based_Pack . Honestly, the best answer to this question would be an extensive discussion of GPU microarchitecture, and I'm not at all qualified to give it. Looking at modern GPU tracing papers like the one above will give you a sense of how deep the performance considerations go.

One final note: any advantage of compute over fragment in the context of raytracing a complex scene has nothing to do with rasterization / vertex shader overhead and the like. For a complex scene with complex shading, the bottlenecks lie entirely in the trace computations, which, as discussed, compute shaders have the tools to implement more efficiently.


I'd like to complement Josh Parnell's information.

One problem with both the fragment shader and the compute shader is that neither supports recursion.

A ray tracer is recursive by nature (yes, I know it is always possible to transform a recursive algorithm into a non-recursive one, but it is not always easy to do).
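As a small worked example of that transformation, here is a hedged sketch of a recursive radiance estimate rewritten as the iterative loop a shader would need. `scene_hit` is a hypothetical placeholder returning `(emitted_light, surface_reflectance, next_ray)` for a ray; real tracers carry vectors and colors, not scalars.

```python
def radiance_recursive(ray, scene_hit, depth):
    """Natural recursive form: emitted light plus attenuated bounce light."""
    if depth == 0:
        return 0.0
    emitted, reflectance, next_ray = scene_hit(ray)
    return emitted + reflectance * radiance_recursive(next_ray, scene_hit, depth - 1)

def radiance_iterative(ray, scene_hit, max_depth):
    """Same sum, expressed as a loop with an accumulated 'throughput' factor,
    which is the shape GPU shaders require."""
    result, throughput = 0.0, 1.0
    for _ in range(max_depth):
        emitted, reflectance, ray = scene_hit(ray)
        result += throughput * emitted
        throughput *= reflectance
    return result
```

The throughput accumulator replaces the call stack: it carries the product of reflectances that the recursion would otherwise apply on the way back up.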

So another way to see the problem could be as follows:

Instead of having "one thread" per pixel, one idea could be to have one thread per path (a path being a part of your ray, between two bounces).

Going this way, you dispatch your "bunch" of rays instead of your "pixel grid". Doing so simplifies the potential recursion of the ray tracer, and avoids divergence in complex materials:
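The one-thread-per-path idea can be sketched like this. This is an illustrative CPU-side model of the wavefront approach from the paper linked below, not GPU code: active path segments sit in a queue and are processed one bounce at a time, with each pass over the queue standing in for one kernel launch. `bounce` is a hypothetical stand-in for a real intersection-plus-shading kernel.

```python
from collections import deque

def wavefront_trace(initial_rays, bounce, max_bounces):
    """Process all paths bounce-by-bounce instead of recursing per pixel.
    bounce(ray) returns (contribution, next_ray_or_None)."""
    queue = deque((ray, 0) for ray in initial_rays)   # (ray, bounce count)
    results = []
    while queue:
        ray, depth = queue.popleft()
        contribution, next_ray = bounce(ray)
        results.append(contribution)
        if next_ray is not None and depth + 1 < max_bounces:
            queue.append((next_ray, depth + 1))       # path continues
    return results
```

Because every queue pass runs the same short kernel over a dense batch of live path segments, threads no longer diverge on paths that terminated early or hit very different materials.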

More information here: http://research.nvidia.com/publication/megakernels-considered-harmful-wavefront-path-tracing-gpus


Source: https://habr.com/ru/post/1268234/

