Typically, to measure how "effective" your program is, you would record the memory and GPU cycles it uses (average, min, max). The efficiency metrics would then be something like avg(mem) / total available memory over a period of time, and avg(GPU cycles) / max GPU cycles.
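As a rough sketch of what those avg/max ratios might look like in code (the sample values, capacities, and the `efficiency` helper are all invented for illustration):

```python
# Sketch: compute utilization ratios from hypothetical sampled counters.
# All numbers below are made up; real samples would come from a profiler
# or a tool such as nvidia-smi.

def efficiency(samples, capacity):
    """Average of the samples divided by the available capacity (0.0-1.0)."""
    return (sum(samples) / len(samples)) / capacity

# Hypothetical memory samples (MiB in use at each sampling interval).
mem_samples = [3100, 3400, 2900, 3600]
total_mem = 8192  # card's total memory in MiB (assumed)

# Hypothetical GPU busy percentages per sampling interval.
gpu_samples = [70, 85, 60, 90]
peak_gpu = 100  # theoretical maximum utilization

print(f"memory efficiency: {efficiency(mem_samples, total_mem):.2f}")
print(f"GPU efficiency:    {efficiency(gpu_samples, peak_gpu):.2f}")
```

The same ratio computation works for any counter you sample over time; only the capacity you divide by changes.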
Then I would compare these metrics against the same metrics from some GPU benchmark suites (which can be assumed to use the GPU fairly effectively), or against a few GPU programs of your choice. That is how I would approach it, though I have never actually tried it myself!
As for bottlenecks and "optimal" performance: those are probably NP-complete problems that no one can solve for you in general. Break out the good old profiler and debugger and start working through your code.