I want to write a program for the GPU (preferably in OpenCL), and most of the computation consists of counting the number of 1 bits in a bitmap (packed as long or int).
On a modern CPU, I would simply use the native __popcnt instruction. I have read in several places that modern GPUs also implement this instruction in hardware, which would be a huge speedup for me (at least for 32-bit values; I am not sure about 64-bit).
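To make the operation concrete, here is a minimal CPU-side sketch in C of what I need to compute. The names `popcount32` and `bitmap_popcount` are just illustrative, and the bit-clearing loop is a portable stand-in for the hardware instruction, not what I would ship:

```c
#include <stddef.h>
#include <stdint.h>

/* Portable population count of one 32-bit word (Kernighan's method):
   each iteration clears the lowest set bit. On a CPU with hardware
   support, an intrinsic such as __popcnt would replace this loop. */
static unsigned popcount32(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Total number of set bits in a bitmap packed into 32-bit words. */
static uint64_t bitmap_popcount(const uint32_t *words, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += popcount32(words[i]);
    return total;
}
```

The GPU version would do the same per-word count in each work-item and then reduce the partial sums, so the per-word count is the part I would like to map to a single hardware instruction.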
However, I have not found anywhere how to access this instruction. So:
1) How do I find out which GPUs have this instruction? (I still need to buy my GPU, so it will be a modern high-end one, maybe from the Radeon HD 7000 or NVIDIA Kepler series.)
2) How do I call this instruction from OpenCL (or a similar GPU language)?