I have an MTLTexture containing 16-bit unsigned integers (MTLPixelFormatR16Uint). Values range from approximately 7,000 to 20,000, with 0 used as the "nodata" value, which is why it is skipped in the code below. I would like to find the minimum and maximum values so that I can rescale the data to 0-255. Ultimately I will base the minimum and maximum on a histogram of the data (it has some outliers), but for now I'm stuck on simply extracting the min/max.
I can read the data back from the GPU to the CPU and extract the min/max values there, but I would prefer to perform this task on the GPU.
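For reference, the CPU-side fallback and the 0-255 rescale can be sketched roughly as follows. This is only a minimal illustration in plain C++; the names `MinMax`, `findMinMax`, `scaleTo8Bit` and `kNoData` are hypothetical, and it assumes the texture contents have already been copied into a `std::vector<std::uint16_t>`.

```cpp
#include <cstdint>
#include <vector>

// Sentinel marking "nodata" pixels (0 in this dataset).
constexpr std::uint16_t kNoData = 0;

// Hypothetical result type: valid is false when no usable pixel was found.
struct MinMax {
    std::uint16_t min;
    std::uint16_t max;
    bool valid;
};

// Scans all pixels once, skipping nodata, and tracks the smallest and
// largest remaining values.
MinMax findMinMax(const std::vector<std::uint16_t>& pixels) {
    MinMax result{0, 0, false};
    for (std::uint16_t v : pixels) {
        if (v == kNoData) continue;
        if (!result.valid) {
            result.min = result.max = v;
            result.valid = true;
        } else {
            if (v < result.min) result.min = v;
            if (v > result.max) result.max = v;
        }
    }
    return result;
}

// Linearly maps v from [lo, hi] to [0, 255], clamping values outside the
// range. Nodata stays 0, so it shares the output value of the minimum.
std::uint8_t scaleTo8Bit(std::uint16_t v, std::uint16_t lo, std::uint16_t hi) {
    if (v == kNoData || hi <= lo || v <= lo) return 0;
    if (v >= hi) return 255;
    const std::uint32_t offset = static_cast<std::uint32_t>(v - lo);
    const std::uint32_t range = static_cast<std::uint32_t>(hi - lo);
    return static_cast<std::uint8_t>((offset * 255u) / range);
}
```

With a minimum of 7,000 and a maximum of 20,000, a value of 13,500 would map to 127 under this integer scaling.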
First attempt
The command encoder is dispatched with 16x16 threads per threadgroup; the number of threadgroups is based on the size of the texture (for example, width = textureWidth / 16, height = textureHeight / 16).
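One detail of this sizing worth spelling out: integer division truncates, so a texture whose dimensions are not multiples of 16 would leave its last partial row and column of tiles undispatched. The sketch below compares the truncating count with a rounded-up count; `GroupCount`, `groupsTruncated` and `groupsRoundedUp` are hypothetical helper names, not Metal API. With the rounded-up variant, the kernel would also need to skip grid positions that fall outside the texture.

```cpp
#include <cstdint>

// Hypothetical helper type holding the number of threadgroups per axis.
struct GroupCount {
    std::uint32_t width;
    std::uint32_t height;
};

// Threads per threadgroup along each axis (16x16 = 256 threads).
constexpr std::uint32_t kThreadsPerSide = 16;

// Truncating division, as in the dispatch described above:
// a partial tile at the right or bottom edge is not covered.
GroupCount groupsTruncated(std::uint32_t textureWidth, std::uint32_t textureHeight) {
    return {textureWidth / kThreadsPerSide, textureHeight / kThreadsPerSide};
}

// Ceiling division: every pixel is covered, at the cost of some threads
// landing outside the texture bounds.
GroupCount groupsRoundedUp(std::uint32_t textureWidth, std::uint32_t textureHeight) {
    return {(textureWidth + kThreadsPerSide - 1) / kThreadsPerSide,
            (textureHeight + kThreadsPerSide - 1) / kThreadsPerSide};
}
```

For a 1000x500 texture, the truncating version gives 62x31 threadgroups (covering 992x496 pixels), while the rounded-up version gives 63x32.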
    typedef struct {
        atomic_uint min;
        atomic_uint max;
    } BandMinMax;

    kernel void minMax(texture2d<ushort, access::read> band1 [[texture(0)]],
                       device BandMinMax &out [[buffer(0)]],
                       uint2 gid [[thread_position_in_grid]])
    {
        ushort value = band1.read(gid).r;

        if (value != 0) {
            uint currentMin = atomic_load_explicit(&out.min, memory_order_relaxed);
            uint currentMax = atomic_load_explicit(&out.max, memory_order_relaxed);

            if (value > currentMax) {
                atomic_store_explicit(&out.max, value, memory_order_relaxed);
            }
            if (value < currentMin) {
                atomic_store_explicit(&out.min, value, memory_order_relaxed);
            }
        }
    }
From this I get a minimum and a maximum value, but for the same dataset the min and max often come back with different values from run to run. I'm fairly certain this is the min and max from a single thread when multiple threads are running.
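A possible explanation, sketched on the CPU with `std::atomic` standing in for the Metal atomics: the load, the comparison and the store in the kernel are three separate steps, so two threads can both read the old value before either writes, and the later write wins even if it is smaller. The code below replays one such interleaving deterministically on a single thread. The function names are hypothetical, and the compare-exchange loop is only one common CPU-side remedy, not necessarily the right fix for the kernel; the Metal Shading Language also provides `atomic_fetch_min_explicit` and `atomic_fetch_max_explicit`, which perform the read-modify-write as a single atomic operation.

```cpp
#include <atomic>
#include <cstdint>

// Replays one possible interleaving of two threads running the
// load / compare / store sequence. Returns the final "max".
std::uint32_t replayLostUpdate() {
    std::atomic<std::uint32_t> sharedMax{0};
    const std::uint32_t valueA = 7000;   // pixel seen by thread A
    const std::uint32_t valueB = 20000;  // pixel seen by thread B

    // Both threads load the current max before either one stores.
    const std::uint32_t seenByA = sharedMax.load(std::memory_order_relaxed);
    const std::uint32_t seenByB = sharedMax.load(std::memory_order_relaxed);

    // B stores first; A then overwrites it, because A compared against
    // the stale value it loaded earlier.
    if (valueB > seenByB) sharedMax.store(valueB, std::memory_order_relaxed);
    if (valueA > seenByA) sharedMax.store(valueA, std::memory_order_relaxed);

    return sharedMax.load(std::memory_order_relaxed);
}

// Compare-exchange loop: the store only succeeds if the target still
// holds the value that was compared against, so no larger value is lost.
void atomicMax(std::atomic<std::uint32_t>& target, std::uint32_t value) {
    std::uint32_t current = target.load(std::memory_order_relaxed);
    while (value > current &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
        // On failure, current now holds the latest value; re-check and retry.
    }
}
```

In this replay the final maximum is 7,000 rather than 20,000, which matches the symptom of results that vary between runs depending on thread scheduling.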
Second attempt
Building on the previous attempt, this time I store the individual min/max values from each thread, all 256 (16x16).
    kernel void minMax(texture2d<ushort, access::read> band1 [[texture(0)]],
                       device BandMinMax *out [[buffer(0)]],
                       uint2 gid [[thread_position_in_grid]],
                       uint tid [[thread_index_in_threadgroup]])
    {
        ushort value = band1.read(gid).r;

        if (value != 0) {
            uint currentMin = atomic_load_explicit(&out[tid].min, memory_order_relaxed);
            uint currentMax = atomic_load_explicit(&out[tid].max, memory_order_relaxed);

            if (value > currentMax) {
                atomic_store_explicit(&out[tid].max, value, memory_order_relaxed);
            }
            if (value < currentMin) {
                atomic_store_explicit(&out[tid].min, value, memory_order_relaxed);
            }
        }
    }
This returns an array containing 256 sets of min/max values. From these I suppose I could find the lowest of the minimum values (and the highest of the maximum values), but this seems like a poor approach. I would appreciate a pointer in the right direction, thanks!