CUDA kernel for add(a, b, c) using texture objects for a & b: does it work correctly for the "increment" operation add(a, b, a)?

Question

I want to implement a CUDA function 'add(a, b, c)' that adds two single-channel floating-point images 'a' and 'b' component-wise and stores the result in a floating-point image 'c', so c = a + b. The function will be implemented by binding the texture objects 'aTex' and 'bTex' to the linear images 'a' and 'b', and then reading 'a' and 'b' inside the kernel only through 'aTex' and 'bTex'. The sum is stored in 'c' via a plain write to global memory. What happens if I call the function to increment 'a' by 'b', that is, if I call 'add(a, b, a)'? The image 'a' is then used in the kernel in two places: its values are read through the texture object 'aTex', and the results are written back to 'a' through global memory. Can this use of the add function produce incorrect results?

cuda textures

user2454869 Oct 17 '14 at 9:33

1 answer

njuffa · Accepted Answer · 2014-10-17T20:16:16+0000

GPU texturing is not coherent. This means that a write to a location in the global memory underlying a texture may or may not be reflected by a subsequent texture access to that same location. In such a scenario there is therefore a read-after-write hazard.

If, however, the code writes to a location in the global memory underlying the texture, and that location is never subsequently read through the texture during the lifetime of the kernel, there is no read-after-write hazard and the code will behave as expected. The updated data in global memory can be accessed by a subsequent kernel in any way desired, including through the texture, since the texture cache is flushed at kernel launch.
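A minimal sketch of the safe pattern, under some assumptions that are not in the question: the images are treated as 1D linear buffers of 'n' floats (a pitched 2D image would need a different resource type), and the names 'addKernel', 'makeLinearTex' and 'incrementDemo' are hypothetical. Each thread reads element i of 'a' and 'b' through the textures exactly once and then writes element i of 'c', so no location is read through a texture after it may have been written, even when 'c' aliases 'a'. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// c[i] = a[i] + b[i]; a and b are read only through their texture objects.
// Thread i is the only thread that touches location i, and it reads before it writes.
__global__ void addKernel(cudaTextureObject_t aTex, cudaTextureObject_t bTex,
                          float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = tex1Dfetch<float>(aTex, i) + tex1Dfetch<float>(bTex, i);
    }
}

// Hypothetical helper: wraps a linear device buffer of n floats in a texture object.
static cudaTextureObject_t makeLinearTex(const float* devPtr, int n)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = const_cast<float*>(devPtr);
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// add(a, b, c) on device pointers; c is allowed to alias a.
void add(const float* a, const float* b, float* c, int n)
{
    cudaTextureObject_t aTex = makeLinearTex(a, n);
    cudaTextureObject_t bTex = makeLinearTex(b, n);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    addKernel<<<blocks, threads>>>(aTex, bTex, c, n);
    cudaDeviceSynchronize();
    cudaDestroyTextureObject(aTex);
    cudaDestroyTextureObject(bTex);
}

// Runs the increment add(a, b, a) and returns the number of wrong elements.
int incrementDemo(int n)
{
    std::vector<float> ha(n), hb(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = float(i);
        hb[i] = 2.0f * float(i);
    }

    float *da = nullptr, *db = nullptr;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMemcpy(da, ha.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    add(da, db, da, n);  // in-place: output aliases the memory behind aTex

    cudaMemcpy(ha.data(), da, n * sizeof(float), cudaMemcpyDeviceToHost);
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (ha[i] != 3.0f * float(i)) ++bad;
    }
    cudaFree(da);
    cudaFree(db);
    return bad;
}

int main()
{
    printf("mismatches: %d\n", incrementDemo(1 << 20));
    return 0;
}
```

If the reasoning above holds, the program should report no mismatches. The guarantee would no longer apply if a thread read a neighbouring element of 'a' through 'aTex' (for example in a stencil), because another thread might already have overwritten it.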

I have personally used this approach to speed up in-place operations with small strides, as the texture read path provided higher load throughput. An example is the BLAS-1 operation [D|S|Z|C]SCAL in CUBLAS, which scales each element of an array by a scalar.
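As an illustration only, an in-place SCAL-style kernel along these lines might look like the sketch below. It is not the CUBLAS implementation; the names 'scalKernel' and 'scalDemo', the single-precision type, and the sizes are assumptions made for the example. Loads go through the texture path, stores go to global memory, and each strided element is read once by the same thread that overwrites it. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// x[i * incx] *= alpha for i in [0, n), reading through the texture
// and writing to the same underlying global memory.
__global__ void scalKernel(cudaTextureObject_t xTex, float* x,
                           float alpha, int n, int incx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int idx = i * incx;
        x[idx] = alpha * tex1Dfetch<float>(xTex, idx);
    }
}

// Scales every incx-th element in place and returns the number of wrong elements.
int scalDemo(int n, int incx, float alpha)
{
    const int len = n * incx;
    std::vector<float> h(len), out(len);
    for (int i = 0; i < len; ++i) h[i] = float(i);

    float* d = nullptr;
    cudaMalloc(&d, len * sizeof(float));
    cudaMemcpy(d, h.data(), len * sizeof(float), cudaMemcpyHostToDevice);

    // Bind a texture object to the same linear buffer that the kernel writes.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = len * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t xTex = 0;
    cudaCreateTextureObject(&xTex, &res, &td, nullptr);

    scalKernel<<<(n + 255) / 256, 256>>>(xTex, d, alpha, n, incx);
    cudaDeviceSynchronize();
    cudaMemcpy(out.data(), d, len * sizeof(float), cudaMemcpyDeviceToHost);

    // Strided elements must be scaled; elements in between must be untouched.
    int bad = 0;
    for (int i = 0; i < len; ++i) {
        float expect = (i % incx == 0) ? alpha * h[i] : h[i];
        if (out[i] != expect) ++bad;
    }

    cudaDestroyTextureObject(xTex);
    cudaFree(d);
    return bad;
}

int main()
{
    printf("mismatches: %d\n", scalDemo(1 << 16, 2, 0.5f));
    return 0;
}
```

Whether the texture path actually gives higher load throughput depends on the GPU generation, so it is worth measuring against a plain global-memory load on the target hardware.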
