I am trying to write a general implementation of Gaussian blur using a compute shader.
It mostly works, but the output contains artifacts that change every frame, even when the scene is static. I have spent the last few hours trying to debug this. So far I have made sure that no limits are exceeded, unrolled all the loops, and replaced the uniforms with constants, but the artifacts persist.
I tested the code that shows the artifacts on three different machines / GPUs (2 NVIDIA, 1 Intel), and they all give the same results. Simulating the unrolled / constant version of the code on the CPU with simple C++ code, running the work groups forwards and backwards, does not reproduce these errors.
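A minimal sketch of what such a CPU-side simulation might look like for the reduced shader shown further down. It is an illustration, not the code that was actually run: the names `for_each_invocation`, `simulate_group` and `all_equal` are made up, and the sketch assumes that every `memoryBarrierShared()` can be modelled as a point where all invocations of the work group have finished the previous phase. It therefore models an idealised execution order and says nothing about how a GPU really schedules invocations.

```cpp
#include <array>
#include <initializer_list>

constexpr int kLocalSize = 16;   // local_size_x and local_size_y in the shader
constexpr int kColumns = 48;     // second dimension of the shared array
constexpr int kReadOffset = 17;  // the constant i in the shader

using SharedArray = std::array<std::array<float, kColumns>, kLocalSize>;
using Tile = std::array<std::array<float, kLocalSize>, kLocalSize>;

// Visits every invocation of one 16x16 work group, forwards or backwards.
template <typename Body>
void for_each_invocation(bool reverse, Body body) {
    const int count = kLocalSize * kLocalSize;
    for (int n = 0; n < count; ++n) {
        const int k = reverse ? count - 1 - n : n;
        body(k % kLocalSize, k / kLocalSize);  // (x, y) of gl_LocalInvocationID
    }
}

// Runs one work group phase by phase (ASSUMPTION: each barrier in the shader
// is treated as a full synchronisation point) and returns the stored values.
Tile simulate_group(bool reverse) {
    SharedArray hoz{};  // stands in for the shared array, zero-initialised here
    Tile out{};

    // Phases 1 and 2: every invocation writes its three elements.
    for (float value : {20000.0f, 0.5f}) {
        for_each_invocation(reverse, [&](int x, int y) {
            hoz[x][y] = value;
            hoz[x][y + 16] = value;
            hoz[x][y + 32] = value;
        });
    }

    // Phase 3: every invocation reads an element written by a neighbour.
    for_each_invocation(reverse, [&](int x, int y) {
        out[x][y] = hoz[x][y + kReadOffset];
    });
    return out;
}

// True if every value in the tile equals the expected one.
bool all_equal(const Tile& tile, float expected) {
    for (const auto& row : tile)
        for (float v : row)
            if (v != expected) return false;
    return true;
}
```

Under these assumptions `all_equal(simulate_group(false), 0.5f)` and `all_equal(simulate_group(true), 0.5f)` both hold, i.e. the idealised simulation never produces a value other than 0.5.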

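To narrow the problem down, I reduced the shared array from [96][96] to [16][48].
The shared array is 16x48 floats, which is 3072 bytes, roughly 10% of the shared memory limit.
The work group is 16x16, so each invocation writes 3 elements.
The output is displayed as HSV, with vals 0-1 mapped to hue 0-360.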
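The 3072-byte figure can be checked with a few lines of C++. This is only a sketch: the constant names are made up, `float` is assumed to be 4 bytes, and the limit used is the 32768-byte minimum that OpenGL 4.3 requires for `GL_MAX_COMPUTE_SHARED_MEMORY_SIZE`. The value on a given driver may be larger and can be queried with `glGetIntegerv`.

```cpp
#include <cstddef>

constexpr std::size_t kHozRows = 16;  // first dimension of the shared array
constexpr std::size_t kHozCols = 48;  // second dimension of the shared array

// Bytes used by "shared float hoz[16][48]" (ASSUMPTION: 4-byte float).
constexpr std::size_t kSharedBytes = kHozRows * kHozCols * sizeof(float);

// ASSUMPTION: minimum limit guaranteed by OpenGL 4.3, not a queried value.
constexpr std::size_t kAssumedLimitBytes = 32768;

// Fraction of the assumed shared memory limit that the array occupies.
constexpr double shared_fraction() {
    return static_cast<double>(kSharedBytes) / kAssumedLimitBytes;
}

static_assert(kSharedBytes == 3072, "16 * 48 floats should be 3072 bytes");
static_assert(kSharedBytes < kAssumedLimitBytes, "array fits in the assumed limit");
```

With these numbers `shared_fraction()` is 3072 / 32768, a little under 10%.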
#version 430
layout(local_size_x=16,local_size_y=16) in;
uniform layout (r32f) restrict writeonly image2D _imageOut;
shared float hoz[16][48];
void main()
{
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 20000.0f;
    memoryBarrierShared();
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 0.5f;
    memoryBarrierShared();
    const int i = 17;
    imageStore(_imageOut, ivec2(gl_GlobalInvocationID.xy), vec4(hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+i]));
    memoryBarrierShared();
}
8x8 .
glDispatchCompute(9, 9, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
, , 14

glDispatchCompute(512/16, 512/16, 1); // Full image is 512x512
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
The artifacts change every frame, at 60FPS (vsync).
