Compute Shader Shared Memory Contains Artifacts

I am trying to write a general Gaussian blur implementation using a compute shader.

It mostly works, but the output contains artifacts that change every frame, even when the scene is static. I have spent the last few hours trying to debug this. I have gone as far as checking that no array bounds are exceeded, unrolling all the loops, and replacing the uniforms with constants, but the artifacts persist.

I tested the code that shows the artifacts on three different machines / GPUs (2 NVIDIA, 1 Intel), and they all give the same results. Simulating the unrolled / constant version of the code in plain C++, with the work groups executed in both forward and reverse order, does not reproduce these errors.


[96] [96] [16] [48], .


16x48, 3072 , 10% .

1616 , 3

HSV, vals 0-1 0-360 (--), .

#version 430
//Execute in 16x16 sized thread blocks
layout(local_size_x=16,local_size_y=16) in;
uniform layout (r32f) restrict writeonly image2D _imageOut;
shared float hoz[16][48];
void main () 
{
    //Init shared memory with a big out of bounds value we can identify
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 20000.0f;
    //Sync shared memory
    memoryBarrierShared();
    //Write the values we want to actually read back
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 0.5f;
    //Sync shared memory
    memoryBarrierShared();
    //i=0,8,16 work
    //i=1-7,9-15,17 don't work (haven't bothered testing further)
    const int i = 17;
    imageStore(_imageOut, ivec2(gl_GlobalInvocationID.xy), vec4(hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+i]));
    //Sync shared memory (can't hurt)
    memoryBarrierShared();
}

8x8 .

glDispatchCompute(9, 9, 1);//A group count of 0 in any dimension dispatches no work
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

, , 14 enter image description here

glDispatchCompute(512/16, 512/16, 1);//Full image is 512x512
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

, , 60FPS (vsync) /.


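memoryBarrierShared() on its own is not enough here. It only orders an invocation's own shared memory accesses; it does not make the other invocations in the work group wait.

You need barrier() as well as memoryBarrierShared().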
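
To illustrate why an execution barrier matters for the shader in the question, here is a small CPU-side sketch in C++. It is not the asker's simulation and not how a GPU actually schedules work: it assumes one possible schedule in which each invocation of a 16x16 work group runs from start to finish before the next one begins, and compares it with a schedule where every invocation finishes its writes before any invocation reads, which is what barrier() guarantees. The names `writePhase`, `staleReadsWithoutBarrier` and `staleReadsWithBarrier` are hypothetical, and the 20000 sentinel stands in for "not yet written by its owning invocation".

```cpp
#include <array>
#include <cassert>

constexpr int kLocal = 16;  // local_size_x and local_size_y in the shader
constexpr int kCols  = 48;  // second dimension of the shared array `hoz`

using Shared = std::array<std::array<float, kCols>, kLocal>;

// Shared memory before any invocation has written its real values.
// 20000 is a sentinel meaning "owner has not written this cell yet".
Shared makeUnwritten() {
    Shared hoz{};
    for (auto& row : hoz) row.fill(20000.0f);
    return hoz;
}

// The three stores that invocation (x, y) performs before the read.
void writePhase(Shared& hoz, int x, int y) {
    hoz[x][y]      = 0.5f;
    hoz[x][y + 16] = 0.5f;
    hoz[x][y + 32] = 0.5f;
}

// Assumed schedule WITHOUT an execution barrier: every invocation runs
// its writes and its read back to back, one invocation after another.
// Returns how many invocations read a cell whose owner has not run yet.
int staleReadsWithoutBarrier(int i) {
    assert(i >= 0 && i <= 32);  // keeps y + i inside the 48 columns
    Shared hoz = makeUnwritten();
    int stale = 0;
    for (int x = 0; x < kLocal; ++x) {
        for (int y = 0; y < kLocal; ++y) {
            writePhase(hoz, x, y);
            if (hoz[x][y + i] != 0.5f) ++stale;
        }
    }
    return stale;
}

// Schedule WITH barrier(): all invocations complete the write phase
// before any invocation starts the read phase.
int staleReadsWithBarrier(int i) {
    assert(i >= 0 && i <= 32);
    Shared hoz = makeUnwritten();
    for (int x = 0; x < kLocal; ++x)
        for (int y = 0; y < kLocal; ++y)
            writePhase(hoz, x, y);
    int stale = 0;
    for (int x = 0; x < kLocal; ++x)
        for (int y = 0; y < kLocal; ++y)
            if (hoz[x][y + i] != 0.5f) ++stale;
    return stale;
}
```

Under this assumed schedule, `staleReadsWithoutBarrier(0)` and `staleReadsWithoutBarrier(16)` return 0, because each invocation only reads a cell it wrote itself, while `staleReadsWithoutBarrier(17)` returns 240 (15 of the 16 invocations in each row read a cell owned by an invocation that has not run yet). `staleReadsWithBarrier` returns 0 for every valid offset. On real hardware the interleaving differs from run to run, so which offsets fail and which pixels are affected will not match this sketch exactly; the point is only that reads of cells written by other invocations are safe once all writes are known to have completed.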


Source: https://habr.com/ru/post/1663900/

