Although some answers have already been given here, and this is an old thread, I just thought I would add this for posterity and whatnot:
The main reason CPUs and GPUs perform so differently on certain problems is how each allocates its chip area. A CPU spends most of its die on large caches, instruction decoders, peripherals, system control, and so on. Its cores are far more complex and run at much higher clock frequencies (which generates more heat per core that must be dissipated). A GPU, in contrast, devotes its die area to packing in as many floating-point ALUs as it possibly can.

The original purpose of the GPU was to multiply matrices as fast as possible, because that is the dominant computation in graphics rendering. Since matrix multiplication is an embarrassingly parallel task (each output value is computed completely independently of every other output value), and the code path for each of those computations is identical, die area can be saved by having many ALUs follow the instructions decoded by a single instruction decoder, since they all execute the same operations in lockstep. By contrast, each CPU core needs its own separate instruction decoder, because the cores do not run identical code, and this makes a CPU core take up far more die area than a GPU core.

Since the primary operations in matrix multiplication are floating-point multiplies and floating-point adds, GPUs are built so that each of these is a single-cycle operation, and they even provide a fused multiply-add (FMA) instruction that multiplies two numbers and adds the result to a third number in one cycle. This is much faster than a typical CPU, where a floating-point multiply is often a multi-cycle operation.
Again, the trade-off is that the chip area is dedicated to floating-point math hardware, so other kinds of instructions (such as control flow) are often much slower on a GPU core than on a CPU core, and sometimes simply do not exist on the GPU at all.
In addition, since GPU cores run at much lower clock speeds than typical CPU cores and do not contain nearly as much complex circuitry, they produce far less heat per core (and draw far less power per core). This allows more of them to be packed into the same space without overheating the chip, and it is also what lets a GPU with 1000 cores have roughly the same power and cooling requirements as a CPU with 4 or 8 cores.