Is there really a timeout for kernels on NVIDIA GPUs?

While looking for answers to why my kernels produce strange error messages or just return "0" as results, I found an answer on SO mentioning that there is a 5 second timeout for kernels running on NVIDIA GPUs. I tried various search queries, but I could not find supporting sources or any additional information.

What do you know about this?

Could a time limit cause strange behavior for kernels with long runtimes?
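
For illustration, here is a stripped-down sketch of the kind of launch I mean (long_kernel and the iteration count are made up; the point is the error checking, without which the kernel can apparently be killed and the output buffer just stays zeroed):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical stand-in for my real kernel: a long busy loop.
    __global__ void long_kernel(float *out, long long iters)
    {
        float acc = 0.0f;
        for (long long i = 0; i < iters; ++i)  // can easily exceed 5 s
            acc += sinf((float)i);
        out[threadIdx.x] = acc;
    }

    int main()
    {
        float *d_out;
        cudaMalloc(&d_out, 256 * sizeof(float));
        long_kernel<<<1, 256>>>(d_out, 1LL << 36);

        // Without this check, a killed kernel fails silently and the
        // results read back as 0.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
            printf("kernel failed: %s\n", cudaGetErrorString(err));

        cudaFree(d_out);
        return 0;
    }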

Thanks!

2 answers

Further searching on Google turned this up in CUDA_Toolkit_Release_Notes_Linux.txt (Known Issues):

# Individual GPU program launches are limited to a run time of less than 5 seconds on a GPU with a display attached. Exceeding this time limit usually causes a launch failure reported through the CUDA driver or the CUDA runtime. GPUs without a display attached are not subject to the 5 second run time restriction. For this reason it is recommended that CUDA be run on a GPU that is NOT attached to a display and does not have the Windows desktop extended onto it. In this case, the system must contain at least one NVIDIA GPU that serves as the primary graphics adapter.

[update] It appears that the official name for this feature is “watchdog timer”.
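
For what it's worth, the CUDA runtime reports whether this watchdog applies to a given device; a minimal sketch using the standard cudaGetDeviceProperties call (nothing here beyond the stock runtime API):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        cudaGetDeviceCount(&count);

        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // kernelExecTimeoutEnabled is 1 when the watchdog timer
            // (the 5 s limit quoted above) applies to this GPU.
            printf("device %d (%s): watchdog %s\n",
                   dev, prop.name,
                   prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
        }
        return 0;
    }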

On Windows Vista or later, the WDDM driver stack will automatically reset the GPU after about two seconds unless you configure the TDR timeouts. (Windows cannot tell the difference between a GPU that is busy running a long kernel and a GPU that has hung.) Tesla cards in TCC mode do not fall under the usual restrictions on display adapters and can therefore run longer kernels.
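
A common workaround in that situation (a standard pattern, not something from this answer) is to split the work into many short launches so that no single kernel approaches the TDR limit; a rough sketch with a hypothetical step_kernel:

    #include <cuda_runtime.h>

    // Hypothetical kernel that advances the computation by a bounded chunk.
    __global__ void step_kernel(float *state, int n, int begin, int end)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            for (int s = begin; s < end; ++s)
                state[i] += 1e-6f * s;   // placeholder per-step work
    }

    int main()
    {
        const int n = 1 << 20, total_steps = 1 << 20, chunk = 1 << 12;
        float *d_state;
        cudaMalloc(&d_state, n * sizeof(float));
        cudaMemset(d_state, 0, n * sizeof(float));

        // Each launch covers only `chunk` steps, so no single kernel
        // runs long enough to trip the ~2 s WDDM / 5 s watchdog reset.
        for (int s = 0; s < total_steps; s += chunk) {
            int end = (s + chunk < total_steps) ? s + chunk : total_steps;
            step_kernel<<<(n + 255) / 256, 256>>>(d_state, n, s, end);
            cudaDeviceSynchronize();  // also lets the display driver run
        }

        cudaFree(d_state);
        return 0;
    }

(The TDR timeout itself can also be raised via the TdrDelay value under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers, but that changes behavior system-wide.)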

Source: https://habr.com/ru/post/1341300/

