Modern GPUs still contain a lot of fixed-function hardware that is hidden from the compute APIs. This includes: blend stages, triangle rasterization, and lots of on-chip queues. The shaders, of course, map well to CUDA/OpenCL -- after all, shaders and compute languages run on the same part of the GPU: the general-purpose shader cores. Think of those units as lots of very wide SIMD processors (for example, the GTX 580 has 16 cores, each with a 32-wide SIMD unit.)
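As a rough back-of-the-envelope illustration (the GTX 580 figures are the ones quoted above; "lane" here is a generic term, not a vendor one), the total number of scalar lanes is just cores times SIMD width:

```python
# Sketch: total scalar SIMD lanes modeled as (cores) x (SIMD width per core).
# Figures for the GTX 580 are taken from the text above.

def total_simd_lanes(cores: int, simd_width: int) -> int:
    """Total scalar lanes executing in lockstep across all cores."""
    return cores * simd_width

# GTX 580: 16 cores, each a 32-wide SIMD unit.
print(total_simd_lanes(16, 32))  # -> 512
```

Those 512 lanes are what NVIDIA's marketing counts as "CUDA cores" for that chip.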
You get access to the texture units through shaders, so there is no need to reimplement texturing in "compute". If you did, your performance would most likely suffer, since you would not get access to the texture caches, which are optimized for spatial locality.
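One common way to exploit spatial locality is to tile texels with a Morton (Z-order) layout, so that 2D neighbors land at nearby linear addresses. A minimal sketch (actual GPU texture layouts are proprietary; this is only illustrative):

```python
# Sketch: Morton (Z-order) address interleaving, a classic scheme for
# tiling texels so 2D-neighboring texels sit at nearby linear addresses.
# Real GPU texture layouts are proprietary; this is purely illustrative.

def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# A 2x2 quad of texels maps to 4 consecutive addresses:
quad = [morton_encode(x, y) for y in (0, 1) for x in (0, 1)]
print(quad)  # -> [0, 1, 2, 3]
```

Compare that with a row-major 256-texel-wide texture, where the same quad would touch addresses 0, 1, 256, 257: two different cache lines instead of one.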
You should not underestimate the amount of work required for rasterization. It is a serious problem: if you throw the entire GPU at it, you get roughly 25% of the performance of the rasterization hardware (see "High-Performance Software Rasterization on GPUs"). That figure includes the blending costs, which are also usually handled by fixed-function units.
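To get a feel for what that hardware does, here is the core of a half-space (edge-function) rasterizer, the inner loop that fixed-function rasterizers implement in silicon. This is a bare sketch: a real software rasterizer adds bounding boxes, tile binning, fill-rule tie-breaking, and so on.

```python
# Minimal half-space (edge-function) triangle rasterizer sketch.
# Illustrative only: no bounding box, no top-left fill rule, no tiling.

def edge(ax, ay, bx, by, px, py):
    """Signed area test: >= 0 when p lies on the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Return the set of pixels whose centers a CCW triangle covers."""
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            if (edge(*v0, *v1, px, py) >= 0 and
                edge(*v1, *v2, px, py) >= 0 and
                edge(*v2, *v0, px, py) >= 0):
                covered.add((x, y))
    return covered

pixels = rasterize((0, 0), (8, 0), (0, 8), 8, 8)
print(len(pixels))  # -> 36
```

Even this naive version evaluates three edge functions per pixel per triangle; the dedicated hardware does the equivalent for many pixels per clock, which is why a shader-core reimplementation loses so much ground.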
Tessellation also has a fixed-function part that is hard to emulate efficiently, since it can amplify its input by up to 1:4096, and you certainly do not want to reserve that much memory up front.
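A quick calculation shows why pre-reserving for the worst case is impractical. The 1:4096 amplification factor comes from the text above; the patch count and vertex size below are made-up illustrative numbers:

```python
# Back-of-the-envelope sketch of worst-case tessellation output buffering.
# AMPLIFICATION is from the text; the other figures are hypothetical.

AMPLIFICATION = 4096          # worst-case output points per input patch
patches_in_flight = 10_000    # hypothetical draw call
bytes_per_vertex = 32         # hypothetical: position + normal + uv

worst_case = patches_in_flight * AMPLIFICATION * bytes_per_vertex
print(f"{worst_case / 2**30:.2f} GiB")  # ~1.22 GiB for one draw call
```

The fixed-function tessellator avoids this by streaming the amplified points directly into the pipeline's on-chip queues instead of materializing them in memory.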
Then there are further performance penalties because you have no access to framebuffer compression: there is again dedicated hardware for this, which is "hidden" from you in compute-only mode. Finally, since you have no on-chip queues, it is difficult to match the efficiency of the "graphics pipeline" (for instance, it can easily spill outputs from the vertex shaders depending on the load on the shader units; you cannot switch between shaders that flexibly.)
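A toy model of that queueing behavior, assuming a small fixed-capacity on-chip buffer between two pipeline stages that overflows into (slow) memory. Real GPU schedulers are vastly more involved; this only illustrates the spill-under-load idea:

```python
# Toy model: a fixed-capacity on-chip queue between pipeline stages
# that spills overflow to memory. Entirely illustrative.

from collections import deque

class StageQueue:
    def __init__(self, on_chip_capacity: int):
        self.on_chip = deque()
        self.spilled = deque()          # models off-chip memory
        self.capacity = on_chip_capacity

    def push(self, item):
        if len(self.on_chip) < self.capacity:
            self.on_chip.append(item)
        else:
            self.spilled.append(item)   # "spill" when the producer outruns the consumer

    def pop(self):
        item = self.on_chip.popleft()
        if self.spilled:                # refill from memory as space frees up
            self.on_chip.append(self.spilled.popleft())
        return item

q = StageQueue(on_chip_capacity=4)
for v in range(6):
    q.push(v)                           # 4 stay on chip, 2 spill
print(len(q.on_chip), len(q.spilled))   # -> 4 2
```

In compute mode you would have to build this buffering yourself out of global memory, paying the off-chip bandwidth cost on every hand-off between stages.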