From the Nsight guide:
Use a separate instance of Visual Studio to debug the host part of the target application. If you want to debug the host part of your CUDA application while the CUDA debugger is attached, you need another instance of Visual Studio. Attaching the same instance of Visual Studio to debug both the host part and the device part of the target application will result in conflicting debuggers. The result is that the target application and CUDA debugger are blocked by the operations of its own debugger.
So, to debug the CUDA application, follow these steps:
- Open Visual Studio (VS instance 1), set a breakpoint in a CUDA kernel, and click "Start CUDA Debugging". This launches the application and stops where you set the breakpoint.
- Open another instance of Visual Studio (VS instance 2; for some reason you need to run it as administrator) and attach to the process you started in step 1.
- In VS instance 2, open the file containing the host (CPU) code you want to debug and set a breakpoint.
- In VS instance 1, continue execution (it should leave the current CUDA kernel). At this point, the CPU breakpoint set in VS instance 2 should be hit.
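To make the two breakpoint locations in the steps above concrete, here is a minimal sketch of a CUDA program. The names scaleKernel and scaleOnDevice are hypothetical and exist only for illustration; the comments mark where each Visual Studio instance would plausibly set its breakpoint, assuming a setup like the one described.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the CUDA debugger something to stop in.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;  // VS instance 1: device breakpoint here (step 1)
    }
}

// Hypothetical host helper: copies data to the GPU, runs the kernel, copies it back.
bool scaleOnDevice(float* host, int n, float factor)
{
    float* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        return false;
    }

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(dev, n, factor);
        // Check both the launch itself and the kernel's execution.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        // VS instance 2: host breakpoint here (step 3); it should be hit
        // after instance 1 continues past the kernel (step 4).
        ok = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dev);
    return ok;
}

int main()
{
    float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    if (!scaleOnDevice(values, 4, 2.0f)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }

    for (int i = 0; i < 4; ++i) {
        printf("%f\n", values[i]);
    }
    return 0;
}
```

Building with device debug information (the -G flag of nvcc, which the Debug configuration of a CUDA project in Visual Studio normally sets) is needed for the kernel breakpoint to bind.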
Extra credit: Debugging your CUDA application remotely. Steps:
- On the target machine, run msvsmon.exe (the remote debugger) as administrator. It can be found in C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\Remote Debugger\x64.
- On the host machine (the one from which you want to control debugging), open VS and make sure the Nsight debugger is pointed at the remote machine (Nsight User Properties > Launch > Connection Name). NOTE: The Nvidia Nsight Monitor must be running on the target machine for this to work.
- Follow steps 1 and 2 from the previous section (local debugging). In step 2, you will need to specify the remote machine in order to debug the host code (Debug > Attach to Process > Qualifier must be set to the target machine's name or IP address).
- Steps 3 and 4 of the local procedure then apply as well.
NOTE: Remote debugging of the host (CPU) code from VS does not seem to work as well as debugging CPU code locally. For example, when you inspect host variables, the values are not displayed as you would expect with local debugging.
STILL UNANSWERED: Can I start debugging the host code before hitting a CUDA breakpoint? It seems like a big limitation that you can only debug host code after your first CUDA kernel. What if you want to debug host code that runs before the first CUDA kernel?