OpenCL for GPU vs FPGA

I recently read about OpenCL / CUDA for FPGAs and GPUs. As I understand it, FPGAs win in terms of power consumption. The explanation I found in an article was:

Reconfigurable devices can have power consumption far below the peak value, since only the configured parts of the chip are active.

Based on this, my question is: does this mean that if a CU [Compute Unit] is not executing any work-items, it still consumes energy? (And if so, why does it consume energy?)

2 answers

Yes, an idle circuit still consumes energy. It does not consume as much, but it still consumes some. The reason for this lies in how transistors work and how CMOS logic consumes energy.

Classically, CMOS logic (the kind used on all modern chips) consumes energy only when it switches state. This made it very low-power compared to the technologies that came before it, which consumed energy all the time. However, every time a clock edge occurs, some logic changes state, even if there is no useful work to do. The higher the clock frequency, the more energy is used. GPUs typically have high clock speeds so they can do a lot of work; FPGAs have low clock speeds. This is the first effect, but it can be mitigated by not clocking circuits that have no work to do (called "clock gating").
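
To make the first effect concrete, below is a small C sketch of the textbook dynamic-power estimate P ≈ α · C · V² · f (activity factor × switched capacitance × voltage squared × clock frequency). All of the numbers are invented, purely illustrative values, not measurements of any real GPU or FPGA.

```c
#include <stdio.h>

/* Classic CMOS dynamic (switching) power estimate: P = alpha * C * V^2 * f.
 * All values below are illustrative placeholders, not real device data. */
static double dynamic_power(double alpha, double c_farads, double v_volts, double f_hz)
{
    return alpha * c_farads * v_volts * v_volts * f_hz;
}

int main(void)
{
    /* Hypothetical high-clock (GPU-like) vs low-clock (FPGA-like) logic block. */
    double p_fast = dynamic_power(0.15, 2.0e-9, 1.0, 1.5e9); /* ~1.5 GHz */
    double p_slow = dynamic_power(0.15, 2.0e-9, 1.0, 2.0e8); /* ~200 MHz */

    /* Clock gating drives the activity factor of idle logic toward zero,
     * removing its switching power entirely. */
    double p_gated = dynamic_power(0.0, 2.0e-9, 1.0, 1.5e9);

    printf("High-clock block:       %.3f W\n", p_fast);
    printf("Low-clock block:        %.3f W\n", p_slow);
    printf("Clock-gated idle block: %.3f W\n", p_gated);
    return 0;
}
```

The only point of the sketch is that switching power scales linearly with clock frequency, which is why the same logic clocked at GPU-like speeds burns several times more dynamic power than at FPGA-like speeds.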

As transistors have become smaller and smaller, the amount of energy used per switch has gone down, but other effects (known as leakage) have become more significant. We are now at the point where leakage power is very significant, and it is multiplied by the number of gates you have in the design. Complex designs have high leakage power; simple designs have low leakage power (in very simplified terms). This is the second effect.
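
The second (leakage) effect can be sketched the same way: static power scales roughly with the number of gates present, whether or not they ever switch. Again, the gate counts and per-gate leakage current below are invented, illustrative values only.

```c
#include <stdio.h>

/* Very rough static-power estimate: P_leak ~ gate_count * I_leak_per_gate * V.
 * Gate counts and leakage currents are invented for illustration. */
static double leakage_power(double gate_count, double i_leak_amps, double v_volts)
{
    return gate_count * i_leak_amps * v_volts;
}

int main(void)
{
    /* A large general-purpose chip leaks across billions of gates... */
    double p_large = leakage_power(5.0e9, 5.0e-11, 1.0);

    /* ...while a small, specialized design contains far fewer gates to leak. */
    double p_small = leakage_power(2.0e7, 5.0e-11, 1.0);

    printf("Large chip leakage:   %.3f W\n", p_large);
    printf("Small design leakage: %.3f W\n", p_small);
    return 0;
}
```

With these made-up numbers the large chip leaks hundreds of milliwatts just by existing, which is the "multiplied by the number of gates" point above.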

Therefore, for a simple task, it may be more energy efficient to have a small, specialized, low-clock-speed FPGA design rather than a large, highly integrated, high-clock-speed, general-purpose CPU or GPU.


As always, it depends on the workload. For workloads that are well supported by GPU hardware (e.g., floating point, texture filtering), I doubt an FPGA can compete. That said, I have heard of image-processing workloads where FPGAs are competitive or better. This makes sense, because GPUs are not optimized for working with small integers. (For the same reason, GPUs are often uncompetitive against CPUs running SSE2-optimized image-processing code.)

In terms of energy consumption, suitable GPU workloads usually keep all of the execution units busy, so it is not an all-or-nothing situation.
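
To tie this back to the original question about compute units: on the GPU side, keeping every compute unit fed is largely a matter of launching enough work-groups. Here is a minimal OpenCL host-side sketch in C that queries the device's compute-unit count and derives a global work size from it; it assumes a single available platform with at least one GPU device, uses an arbitrary local size of 256 and 16 work-groups per compute unit as illustrative choices, and omits error handling for brevity.

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_uint num_cus = 0;

    /* Grab the first platform and its first GPU device (no error handling). */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* How many compute units does this device expose? */
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(num_cus), &num_cus, NULL);

    /* With a local work size of 256, making the global work size a multiple
     * of num_cus * 256 gives the scheduler enough work-groups to occupy
     * every compute unit rather than leaving some with no work-items. */
    size_t local_size  = 256;
    size_t global_size = (size_t)num_cus * local_size * 16; /* 16 groups/CU */

    printf("Compute units: %u, suggested global size: %zu\n",
           (unsigned)num_cus, global_size);
    return 0;
}
```

Whether those work-groups are doing useful work is a separate question; the point is simply that GPU-friendly workloads tend to occupy all compute units rather than leaving some idle.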


Source: https://habr.com/ru/post/1433152/
