GPU texturing is not coherent. This means that a write to a particular location in the global memory underlying a texture may or may not be reflected by a subsequent texture access to the same location. In such a scenario there is therefore a read-after-write hazard.
If, however, the code writes to a particular location in the global memory underlying a texture, and that location is never subsequently read through the texture during the lifetime of the kernel, there is no read-after-write hazard and the code will behave as expected. The updated data in global memory can then be picked up by a subsequent kernel in any way desired, including through texture access, since the texture cache is flushed at kernel launch.
I have personally used this approach to speed up in-place operations with small strides, because the texture read path provided higher load performance. An example is the BLAS-1 operation [D|S|Z|C]SCAL in CUBLAS, which scales each element of an array by a scalar.
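To make the pattern concrete, here is a minimal sketch of an in-place single-precision SCAL-like kernel that reads through a texture object and writes through global memory. It is not the CUBLAS implementation; the names `scal_inplace` and `run_scal_demo`, the launch configuration, and the array size are all assumptions chosen for illustration. Each element is fetched through the texture exactly once and is never read through the texture again after being written within the same kernel, so the hazard described above does not arise. The second launch relies on the texture cache being flushed at kernel launch, as stated above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// In-place x[i] = alpha * x[i]. Reads go through the texture path, writes go
// through global memory. No location is read via the texture after it has
// been written within the same kernel, so there is no read-after-write hazard.
__global__ void scal_inplace(cudaTextureObject_t tex, float *x, float alpha, int n)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        x[i] = alpha * tex1Dfetch<float>(tex, i);
    }
}

// Hypothetical helper for this sketch: scales an array twice in place and
// returns the number of mismatching elements, or -1 on a CUDA error.
int run_scal_demo()
{
    const int n = 1 << 16;
    const float alpha = 2.0f;
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Bind the same linear global memory buffer to a texture object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    // Two separate launches: the second kernel reads, through the texture,
    // the values that the first kernel wrote through global memory.
    scal_inplace<<<64, 256>>>(tex, d, alpha, n);
    scal_inplace<<<64, 256>>>(tex, d, alpha, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    // After two scalings by 2, element i should hold exactly 4 * i.
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 4.0f * static_cast<float>(i)) ++bad;
    }
    return bad;
}

int main()
{
    int bad = run_scal_demo();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

By contrast, a kernel in which one thread writes `x[i]` and any thread later fetches index `i` through `tex` within the same launch would be exposed to the read-after-write hazard. Whether the texture path actually gives a speedup depends on the GPU generation, so it is worth measuring.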