Short answer
This problem will not go away on its own. It is a bandwidth issue on the CPU-to-GPU transfer: by stacking frames you have multiplied the amount of data that has to be sent over the bus.
Possible workaround
In essence, you are trying to feed the previous frames into your model. If that is the goal, there is another way to do it.
Suppose the batch is not a random selection of stacked images, but an ordinary batch in which all frames are consecutive in time.
In that case you send images with only three channels, and no image is sent more than once.
Once the [batch, height, width, channel] tensor has been loaded onto the GPU, compute
[ batch[1:], height, width, channel] - [ batch[:-1], height, width, channel]
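and assign the result to diffTensor, so that diffTensor[i] = frame[i+1] - frame[i]. Each sample is then built from the following slices, concatenated along the channel axis: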
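A minimal sketch of the difference step, with NumPy standing in for the GPU tensor library (`frame_diffs` is a hypothetical name, not from the question). The cast matters if frames arrive as `uint8`: there, subtraction wraps around modulo 256 instead of going negative.

```python
import numpy as np

def frame_diffs(batch: np.ndarray) -> np.ndarray:
    """Differences between consecutive frames of a time-ordered batch.

    batch has shape [batch, height, width, channel]; the result has one
    frame fewer: diff[i] = batch[i + 1] - batch[i].
    """
    # Cast to float so that uint8 pixels do not wrap around on subtraction.
    batch = batch.astype(np.float32)
    return batch[1:] - batch[:-1]

# Usage: 10 consecutive 4x4 RGB frames whose pixel values grow by 1 per frame.
frames = np.stack([np.full((4, 4, 3), t, dtype=np.uint8) for t in range(10)])
diff_tensor = frame_diffs(frames)
print(diff_tensor.shape)                     # (9, 4, 4, 3)
print(diff_tensor.min(), diff_tensor.max())  # 1.0 1.0
```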
origTensor [ batch[5:], height, width, channel]
diffTensor [ batch[4:], height, width, channel]
diffTensor [ batch[3:-1], height, width, channel]
diffTensor [ batch[2:-2], height, width, channel]
diffTensor [ batch[1:-3], height, width, channel]
diffTensor [ batch[0:-4], height, width, channel]
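The whole GPU-side construction can be sketched in NumPy as below. This is an illustration under assumptions: `stack_history` is a hypothetical helper, the history is assumed to be 5 differences (so each sample has 3 × 6 = 18 channels), and on the GPU the same slicing works on TensorFlow tensors, with `tf.concat(parts, axis=-1)` in place of `np.concatenate`.

```python
import numpy as np

def stack_history(batch: np.ndarray, num_diffs: int = 5) -> np.ndarray:
    """Turn one time-ordered [n, H, W, C] batch into
    [n - num_diffs, H, W, C * (num_diffs + 1)] samples."""
    batch = batch.astype(np.float32)   # avoid uint8 wrap-around in the subtraction
    n = batch.shape[0]
    diff = batch[1:] - batch[:-1]      # diff[i] = frame[i + 1] - frame[i], length n - 1
    parts = [batch[num_diffs:]]        # current frame t, for t = num_diffs .. n - 1
    for lag in range(1, num_diffs + 1):
        # diff[t - lag] = frame[t - lag + 1] - frame[t - lag] for every current frame t;
        # every slice has length n - num_diffs, so all parts line up.
        parts.append(diff[num_diffs - lag : n - lag])
    # Concatenate along the channel axis, like the original stacked-image input.
    return np.concatenate(parts, axis=-1)

# Usage: 100 consecutive 2x2 RGB frames.
frames = np.stack([np.full((2, 2, 3), t, dtype=np.uint8) for t in range(100)])
samples = stack_history(frames)
print(samples.shape)  # (95, 2, 2, 18)
```

Only the 100 three-channel frames cross the bus; the 18-channel samples are assembled where the data already lives.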
The first 5 frames of the batch have no full history, so they are "lost".
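What is the gain? Sending 100 images to the GPU gives you 95 samples (original frame + diffs). Previously, to get a batch of 100 samples you had to send 5 times as many images, i.e. 500. That is roughly a 5x reduction in transferred data.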
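A quick check of this arithmetic as a plain-Python sketch. `transfer_counts` is a hypothetical helper; the 5 images per pre-stacked sample is taken from the figures in the text. Image size is left out because it scales both approaches equally.

```python
def transfer_counts(batch_size: int, history: int,
                    images_per_stacked_sample: int) -> tuple[int, int, int]:
    """Return (images sent with GPU-side stacking, samples that yields,
    images sent with CPU-side stacking for batch_size samples)."""
    # GPU-side: each consecutive frame crosses the bus exactly once; the
    # first `history` frames have no full history, so they yield no sample.
    gpu_images = batch_size
    gpu_samples = batch_size - history
    # CPU-side: every pre-stacked sample carries all of its frames, so a frame
    # shared by several samples is transferred once per sample.
    cpu_images = batch_size * images_per_stacked_sample
    return gpu_images, gpu_samples, cpu_images

gpu_images, gpu_samples, cpu_images = transfer_counts(100, 5, 5)
print(gpu_images, gpu_samples, cpu_images)  # 100 95 500
print(f"x{cpu_images // gpu_images}")       # x5
```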