I am wondering if anyone can suggest a better approach to calculating the mean / standard deviation of a large number of relatively small but differently sized arrays in CUDA?
The parallel reduction example in the SDK works on a single very large array, whose size is conveniently a multiple of the number of threads per block, but my case is rather different:
Basically, I have a large number of objects, each of which contains two components, upper and lower, and each of these components has an x and a y coordinate, i.e.
upper.x, lower.x, upper.y, lower.y
Each of these arrays is approximately 800 elements long, but the length varies between objects (not within an object), for example:
Object1.lower.x = 1.1, 2.2, 3.3
Object1.lower.y = 4.4, 5.5, 6.6
Object1.upper.x = 7.7, 8.8, 9.9
Object1.upper.y = 1.1, 2.2, 3.3
Object2.lower.x = 1.0, 2.0, 3.0, 4.0, 5.0
Object2.lower.y = 6.0, 7.0, 8.0, 9.0, 10.0
Object2.upper.x = 11.0, 12.0, 13.0, 14.0, 15.0
Object2.upper.y = 16.0, 17.0, 18.0, 19.0, 20.0
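For concreteness, the layout described above could be sketched in plain C like this (the struct and field names are my own, purely illustrative):

```c
#include <stddef.h>

/* One object: four arrays that share the same per-object length n,
 * where n varies from object to object (~800 on average). */
typedef struct {
    size_t n;       /* number of elements in each of the four arrays */
    float *lower_x;
    float *lower_y;
    float *upper_x;
    float *upper_y;
} Object;
```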
I can do this serially in C: for each object, loop over its arrays and accumulate the sums needed for the mean and standard deviation. But how can this be done efficiently on the GPU? The arrays are small and differently sized, so the single-big-array reduction approach doesn't map onto them in an obvious way.