Modern GPUs still contain a lot of fixed-function hardware that is hidden from the compute APIs. This includes: blend stages, triangle rasterization, and lots of on-chip queues. The shaders, of course, map well to CUDA/OpenCL -- after all, shaders and compute languages run on the same part of the GPU: the general-purpose shader cores. Think of those units as lots of very wide SIMD processors (for example, the GTX 580 has 16 cores, each with a 32-wide SIMD unit.)
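As a rough back-of-the-envelope illustration (the GTX 580 figures are the ones quoted above; "lane" here is a generic term, not a vendor one), the total number of scalar lanes is just cores times SIMD width:

```python
# Sketch: total scalar SIMD lanes modeled as (cores) x (SIMD width per core).
# Figures for the GTX 580 are taken from the text above.

def total_simd_lanes(cores: int, simd_width: int) -> int:
    """Total scalar lanes executing in lockstep across all cores."""
    return cores * simd_width

# GTX 580: 16 cores, each a 32-wide SIMD unit.
print(total_simd_lanes(16, 32))  # -> 512
```

Those 512 lanes are what NVIDIA's marketing counts as "CUDA cores" for that chip.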
You get access to the texture units through shaders, so there is no need to reimplement texturing in "compute". If you did, your performance would most likely suffer, since you would not get access to the texture caches, which are optimized for spatial locality.
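One common way to exploit spatial locality is to tile texels with a Morton (Z-order) layout, so that 2D neighbors land at nearby linear addresses. A minimal sketch (actual GPU texture layouts are proprietary; this is only illustrative):

```python
# Sketch: Morton (Z-order) address interleaving, a classic scheme for
# tiling texels so 2D-neighboring texels sit at nearby linear addresses.
# Real GPU texture layouts are proprietary; this is purely illustrative.

def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# A 2x2 quad of texels maps to 4 consecutive addresses:
quad = [morton_encode(x, y) for y in (0, 1) for x in (0, 1)]
print(quad)  # -> [0, 1, 2, 3]
```

Compare that with a row-major 256-texel-wide texture, where the same quad would touch addresses 0, 1, 256, 257: two different cache lines instead of one.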
You should not underestimate the amount of work required for rasterization. It is a serious problem: if you throw the entire GPU at it, you get roughly 25% of the performance of the rasterization hardware (see "High-Performance Software Rasterization on GPUs"). That figure includes the blending costs, which are also usually handled by fixed-function units.
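To get a feel for what that hardware does, here is the core of a half-space (edge-function) rasterizer, the inner loop that fixed-function rasterizers implement in silicon. This is a bare sketch: a real software rasterizer adds bounding boxes, tile binning, fill-rule tie-breaking, and so on.

```python
# Minimal half-space (edge-function) triangle rasterizer sketch.
# Illustrative only: no bounding box, no top-left fill rule, no tiling.

def edge(ax, ay, bx, by, px, py):
    """Signed area test: >= 0 when p lies on the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, width, height):
    """Return the set of pixels whose centers a CCW triangle covers."""
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            if (edge(*v0, *v1, px, py) >= 0 and
                edge(*v1, *v2, px, py) >= 0 and
                edge(*v2, *v0, px, py) >= 0):
                covered.add((x, y))
    return covered

pixels = rasterize((0, 0), (8, 0), (0, 8), 8, 8)
print(len(pixels))  # -> 36
```

Even this naive version evaluates three edge functions per pixel per triangle; the dedicated hardware does the equivalent for many pixels per clock, which is why a shader-core reimplementation loses so much ground.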
Tessellation also has a fixed-function part that is hard to emulate efficiently, since it can amplify its input by up to 1:4096, and you certainly do not want to reserve that much memory up front.
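A quick calculation shows why pre-reserving for the worst case is impractical. The 1:4096 amplification factor comes from the text above; the patch count and vertex size below are made-up illustrative numbers:

```python
# Back-of-the-envelope sketch of worst-case tessellation output buffering.
# AMPLIFICATION is from the text; the other figures are hypothetical.

AMPLIFICATION = 4096          # worst-case output points per input patch
patches_in_flight = 10_000    # hypothetical draw call
bytes_per_vertex = 32         # hypothetical: position + normal + uv

worst_case = patches_in_flight * AMPLIFICATION * bytes_per_vertex
print(f"{worst_case / 2**30:.2f} GiB")  # ~1.22 GiB for one draw call
```

The fixed-function tessellator avoids this by streaming the amplified points directly into the pipeline's on-chip queues instead of materializing them in memory.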
Then there are further performance penalties because you have no access to framebuffer compression: there is again dedicated hardware for this, which is "hidden" from you in compute-only mode. Finally, since you have no on-chip queues, it is difficult to match the efficiency of the "graphics pipeline" (for instance, it can easily spill outputs from the vertex shaders depending on the load on the shader units; you cannot switch between shaders that flexibly.)
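A toy model of that queueing behavior, assuming a small fixed-capacity on-chip buffer between two pipeline stages that overflows into (slow) memory. Real GPU schedulers are vastly more involved; this only illustrates the spill-under-load idea:

```python
# Toy model: a fixed-capacity on-chip queue between pipeline stages
# that spills overflow to memory. Entirely illustrative.

from collections import deque

class StageQueue:
    def __init__(self, on_chip_capacity: int):
        self.on_chip = deque()
        self.spilled = deque()          # models off-chip memory
        self.capacity = on_chip_capacity

    def push(self, item):
        if len(self.on_chip) < self.capacity:
            self.on_chip.append(item)
        else:
            self.spilled.append(item)   # "spill" when the producer outruns the consumer

    def pop(self):
        item = self.on_chip.popleft()
        if self.spilled:                # refill from memory as space frees up
            self.on_chip.append(self.spilled.popleft())
        return item

q = StageQueue(on_chip_capacity=4)
for v in range(6):
    q.push(v)                           # 4 stay on chip, 2 spill
print(len(q.on_chip), len(q.spilled))   # -> 4 2
```

In compute mode you would have to build this buffering yourself out of global memory, paying the off-chip bandwidth cost on every hand-off between stages.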