Is there a (semi-)reliable way to distinguish an "Out Of Resources" caused by the TDR from an "Out Of Resources" caused by other problems?
1)
If you can read

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrDelay
ValueType : REG_DWORD
ValueData : Number of seconds to delay. 2 seconds is the default value.

through WMI, multiply it by

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrLimitCount
ValueType : REG_DWORD
ValueData : Number of TDRs before crashing. The default value is 5.

which you also read through WMI. With the defaults, the product is 10 seconds. You should also read

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrLimitTime
ValueType : REG_DWORD
ValueData : Number of seconds before crashing. 60 seconds is the default value.

which should come back as 60 seconds from WMI.
For this example computer, it takes 5 x 2 seconds plus 1 more delay, within the final 60-second limit, to break the limit. You can then check in the application whether the last stopwatch reading exceeded these limits. If it did, it is probably a TDR. On top of these, there is also a time limit for threads to exit the driver,

KeyPath   : HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
KeyValue  : TdrDdiDelay
ValueType : REG_DWORD
ValueData : Number of seconds to leave the driver. 5 seconds is the default value.

which is 5 seconds by default. An access to an invalid memory segment should fail faster than that. You could perhaps raise these TDR time limits through WMI to a few minutes, so that the program can compute without crashing because of starvation. But modifying the registry can be dangerous: if, for example, you set a TDR time limit to 1 second or some fraction of it, Windows may never boot without constant TDR crashes, so only reading these values should be safer.
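The stopwatch check described above could be sketched roughly as follows. This is only an illustration: `TdrHeuristic` and its methods are hypothetical names, the registry values are passed in rather than read (reading them through WMI is left out), and the rule "the kernel ran at least TdrDelay seconds before failing" is a simplifying assumption, not a guaranteed TDR detector.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: judges a failed kernel by how long it ran before failing. */
final class TdrHeuristic {
    final int tdrDelaySeconds;      // value of TdrDelay
    final int tdrLimitCount;        // value of TdrLimitCount
    final int tdrLimitTimeSeconds;  // value of TdrLimitTime

    TdrHeuristic(int tdrDelaySeconds, int tdrLimitCount, int tdrLimitTimeSeconds) {
        this.tdrDelaySeconds = tdrDelaySeconds;
        this.tdrLimitCount = tdrLimitCount;
        this.tdrLimitTimeSeconds = tdrLimitTimeSeconds;
    }

    /** The default values quoted above: 2 s delay, 5 TDRs, 60 s window. */
    static TdrHeuristic windowsDefaults() {
        return new TdrHeuristic(2, 5, 60);
    }

    /** TdrDelay x TdrLimitCount, i.e. 10 seconds with the defaults. */
    int accumulatedDelaySeconds() {
        return tdrDelaySeconds * tdrLimitCount;
    }

    /** Assumption: a failure after at least TdrDelay seconds makes a TDR plausible. */
    boolean tdrPlausible(long elapsedNanos) {
        return elapsedNanos >= TimeUnit.SECONDS.toNanos(tdrDelaySeconds);
    }

    public static void main(String[] args) {
        TdrHeuristic limits = windowsDefaults();
        long start = System.nanoTime();
        // ... enqueue the kernel and wait for it to finish or fail here (omitted) ...
        long elapsed = System.nanoTime() - start;
        System.out.println("TDR plausible: " + limits.tdrPlausible(elapsed));
    }
}
```

A failure that arrives well before the delay has elapsed points away from a TDR, which matches the remark that an invalid memory access should fail faster.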
2)
Divide the overall work into much smaller pieces. If the data is not separable, copy it once, and then enqueue the long-running kernel as n kernels over very short ranges, with some waiting between any two.
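Splitting the range could look roughly like this. It is a minimal sketch: `RangeSplitter` is a hypothetical helper, and it assumes each resulting pair is later passed as the global work offset and global work size of one `clEnqueueNDRangeKernel` call (the chunk size would normally be a multiple of the local work size).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts one long NDRange into many short ones. */
final class RangeSplitter {
    /** Returns {offset, size} pairs that cover [0, globalSize) in chunks of at most chunkSize. */
    static List<long[]> split(long globalSize, long chunkSize) {
        if (globalSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("globalSize must be >= 0 and chunkSize > 0");
        }
        List<long[]> chunks = new ArrayList<>();
        for (long offset = 0; offset < globalSize; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            chunks.add(new long[] { offset, Math.min(chunkSize, globalSize - offset) });
        }
        return chunks;
    }
}
```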
Then you can be sure that the TDR is ruled out. If this version works but the long-running kernel does not, it is a TDR fault. If it is the other way around, it is a memory fault. It looks like this:

short running x 1024 times
long running
long running <---- fail? TDR! because memory would crash short ver. too!
long running

Another try:

short running x 1024 times <---- fail? memory! because only 1ms per kernel
long running
long running
long running
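The decision rule from the two runs above can be written down as a small function. This is a sketch of the reasoning only; `FailureDiagnosis` and its `Cause` values are hypothetical names, and the result is a hint, not proof.

```java
/** Hypothetical helper: encodes the short-run versus long-run comparison. */
final class FailureDiagnosis {
    enum Cause { NONE, LIKELY_TDR, LIKELY_MEMORY }

    /**
     * @param shortRunsOk whether all of the many short kernels finished without error
     * @param longRunOk   whether the single long-running kernel finished without error
     */
    static Cause diagnose(boolean shortRunsOk, boolean longRunOk) {
        if (!shortRunsOk) {
            // Each short kernel runs far below the TDR limits, so the time limit is not the cause.
            return Cause.LIKELY_MEMORY;
        }
        if (!longRunOk) {
            // Same memory accesses succeeded in the short version, so the run time is the suspect.
            return Cause.LIKELY_TDR;
        }
        return Cause.NONE;
    }
}
```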
Alternatively, can I at least reliably (in Java / through the OpenCL API) determine that the GPU used for the computation also drives the display?
1)
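Use the OpenGL interoperability properties of both devices: querying clGetGLContextInfoKHR with CL_DEVICES_FOR_GL_CONTEXT_KHR (from the cl_khr_gl_sharing extension) gives a list of interoperable devices. Get the device's id so that you can exclude it if you do not want to use it.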
2)
Have another thread run some OpenGL or DirectX code that draws a static object, to keep one of the GPUs busy. Then test all GPUs at the same time from a different thread with some trivial OpenCL kernel. Test:
- OpenGL starts drawing something with a high triangle count @ 60 fps.
- start the devices for OpenCL compute, get the average number of kernel executions per second (keps)
- device 1: 30 keps
- device 2: 40 keps
- after a while, stop OpenGL and close its windows (if not already)
- device 1: 75 keps -----> highest percentage increase! --> display!!!
- device 2: 41 keps ----> not such a high increase, but it can
You should not copy any data between devices while doing this, so that the CPU / RAM does not become a bottleneck.
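Picking the device from the two measurements could be sketched as below. `DisplayGuess` is a hypothetical helper; it only compares the relative speed-up after the drawing stops and assumes the keps figures were measured elsewhere.

```java
/** Hypothetical helper: guesses the display GPU from keps measured with and without drawing. */
final class DisplayGuess {
    /**
     * @param kepsWhileDrawing kernel executions per second per device while OpenGL is drawing
     * @param kepsIdle         the same measurement after the drawing has stopped
     * @return index of the device with the highest percentage increase
     */
    static int likelyDisplayDevice(double[] kepsWhileDrawing, double[] kepsIdle) {
        if (kepsWhileDrawing.length == 0 || kepsWhileDrawing.length != kepsIdle.length) {
            throw new IllegalArgumentException("need one measurement pair per device");
        }
        int best = -1;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < kepsWhileDrawing.length; i++) {
            if (kepsWhileDrawing[i] <= 0) {
                throw new IllegalArgumentException("keps must be positive");
            }
            // Relative increase, e.g. 30 -> 75 keps is a gain of 1.5 (150%).
            double gain = (kepsIdle[i] - kepsWhileDrawing[i]) / kepsWhileDrawing[i];
            if (gain > bestGain) {
                bestGain = gain;
                best = i;
            }
        }
        return best;
    }
}
```

With the numbers from the test above, device 1 (index 0) gains 150% while device 2 gains 2.5%, so index 0 is returned.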
3)
If the data is separable, then you can use a divide-and-conquer algorithm to give any GPU its share of the work only when it is available, which leaves the display GPU more flexibility (this is a performance-aware solution and could be similar to the short-running version, but the scheduling is done across multiple GPUs).
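One way to hand out work only when a GPU is available is a shared chunk counter with one worker thread per device. The sketch below shows just that scheduling idea; `ChunkScheduler` and `ChunkRunner` are hypothetical, and the callback is assumed to enqueue one chunk on the given device and block until it finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical scheduler: each device thread claims the next chunk as soon as it is free. */
final class ChunkScheduler {
    /** Assumed to run one chunk on one device and return when that chunk is done. */
    interface ChunkRunner {
        void run(int device, int chunk);
    }

    /** Runs all chunks and returns how many chunks each device processed. */
    static int[] runAll(int deviceCount, int chunkCount, ChunkRunner runner) {
        AtomicInteger next = new AtomicInteger();
        int[] done = new int[deviceCount];
        Thread[] workers = new Thread[deviceCount];
        for (int d = 0; d < deviceCount; d++) {
            final int device = d;
            workers[d] = new Thread(() -> {
                int chunk;
                // A device that is busy (for example with the display) simply claims fewer chunks.
                while ((chunk = next.getAndIncrement()) < chunkCount) {
                    runner.run(device, chunk);
                    done[device]++; // written only by this device's own thread
                }
            });
            workers[d].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for devices", e);
            }
        }
        return done;
    }
}
```

Because chunks are claimed on demand, a slower or display-bound GPU ends up with a smaller share without any explicit load balancing.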
4)
I did not check this because I sold my second GPU, but you should try
CL_DEVICE_TYPE_DEFAULT
on your multi-GPU system to check whether it returns the display GPU or not. Shut down the PC, plug the monitor cable into the other card, try again. Shut down, swap the cards' slots, try again. Shut down, remove one of the cards so that only 1 GPU and 1 CPU remain, try again. If all of these return only the display GPU, then it is probably marking the display GPU as the default.