cudaDeviceReset for multiple GPUs

I am currently working on a GPU server that has 4 Tesla T10 GPUs. Since I keep testing kernels and often have to kill processes with Ctrl-C, I have added a few lines to the end of a simple device query program. The code is below:

    #include <stdio.h>
    #include <cuda_runtime.h>

    // Print device properties (size_t fields are printed with %zu)
    void printDevProp(cudaDeviceProp devProp)
    {
        printf("Major revision number: %d\n", devProp.major);
        printf("Minor revision number: %d\n", devProp.minor);
        printf("Name: %s\n", devProp.name);
        printf("Total global memory: %zu\n", devProp.totalGlobalMem);
        printf("Total shared memory per block: %zu\n", devProp.sharedMemPerBlock);
        printf("Total registers per block: %d\n", devProp.regsPerBlock);
        printf("Warp size: %d\n", devProp.warpSize);
        printf("Maximum memory pitch: %zu\n", devProp.memPitch);
        printf("Maximum threads per block: %d\n", devProp.maxThreadsPerBlock);
        for (int i = 0; i < 3; ++i)
            printf("Maximum dimension %d of block: %d\n", i, devProp.maxThreadsDim[i]);
        for (int i = 0; i < 3; ++i)
            printf("Maximum dimension %d of grid: %d\n", i, devProp.maxGridSize[i]);
        printf("Clock rate: %d\n", devProp.clockRate);
        printf("Total constant memory: %zu\n", devProp.totalConstMem);
        printf("Texture alignment: %zu\n", devProp.textureAlignment);
        printf("Concurrent copy and execution: %s\n", (devProp.deviceOverlap ? "Yes" : "No"));
        printf("Number of multiprocessors: %d\n", devProp.multiProcessorCount);
        printf("Kernel execution timeout: %s\n", (devProp.kernelExecTimeoutEnabled ? "Yes" : "No"));
        return;
    }

    int main()
    {
        // Number of CUDA devices
        int devCount;
        cudaGetDeviceCount(&devCount);
        printf("CUDA Device Query...\n");
        printf("There are %d CUDA devices.\n", devCount);

        // Iterate through devices
        for (int i = 0; i < devCount; ++i) {
            // Get device properties
            printf("\nCUDA Device #%d\n", i);
            cudaDeviceProp devProp;
            cudaGetDeviceProperties(&devProp, i);
            printDevProp(devProp);
        }

        printf("\nPress any key to exit...");
        char c;
        scanf("%c", &c);

        // Added lines: select each device in turn and reset it
        for (int i = 0; i < devCount; i++) {
            cudaSetDevice(i);
            cudaDeviceReset();
        }

        return 0;
    }

My question concerns the for loop just before the end of main(), in which I select each device in turn and then call cudaDeviceReset. I have a strange feeling that this code, although it produces no errors, does not reset all the devices. Instead, the program seems to reset only the default device, i.e. device 0, each time. Can someone tell me what I should do to reset each of the 4 devices?

thanks

+6
3 answers

This may be too late, but if you write a signal handler function, you can free your memory and reset the devices cleanly:

    #include <csignal>
    #include <cstdlib>
    #include <iostream>
    #include <cuda_runtime.h>

    // State variables
    extern int no_sigint;
    int no_sigint = 1;
    extern int interrupts;
    int interrupts = 0;

    // User-defined function that frees dynamically allocated memory
    void free_mem();

    /* Catches signal interrupts from Ctrl+C.
       If 1 signal is detected, the simulation finishes the current frame
       and exits in a clean state.
       If Ctrl+C is pressed again, it terminates the application without
       completing writes to files or calculations, but deallocates all
       memory anyway. */
    void sigint_handler(int sig)
    {
        if (sig == SIGINT) {
            interrupts += 1;
            std::cout << std::endl
                      << "Aborting loop.. finishing frame." << std::endl;
            no_sigint = 0;

            if (interrupts >= 2) {
                std::cerr << std::endl
                          << "Multiple Interrupts issued: "
                          << "Clearing memory and Forcing immediate shutdown!"
                          << std::endl;

                // write a function to free dynamically allocated memory
                free_mem();

                int devCount;
                cudaGetDeviceCount(&devCount);
                for (int i = 0; i < devCount; ++i) {
                    cudaSetDevice(i);
                    cudaDeviceReset();
                }
                exit(9);
            }
        }
    }

....

    int main()
    {
        // .....
        for (int simulation_step = 1; simulation_step < SIM_STEPS && no_sigint; ++simulation_step) {
            // .... simulation code
        }
        free_mem();
        // ... cuda device resets
        return 0;
    }

If you use this code (you can even put the first fragment in an external header), it works. You get two levels of Ctrl+C control. The first press stops your simulation and exits normally, but the application finishes the current step first, so it shuts down gracefully with correct results. If you press Ctrl+C again, it closes the application immediately, freeing all the memory.
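For comparison, here is a minimal sketch of the "first press finishes the current step" half of that idea. It is a variation, not the code above: the handler only records the interrupt in a `volatile std::sig_atomic_t` flag, and the loop checks the flag between steps, so all cleanup runs outside the handler. The names `g_interrupts`, `on_sigint`, and `run_until_interrupted` are made up for illustration, and the step callback stands in for the real simulation work.

```cpp
#include <cassert>
#include <csignal>

// Interrupt counter written by the handler. sig_atomic_t is the integer type
// that may be safely written from a signal handler.
volatile std::sig_atomic_t g_interrupts = 0;

// Hypothetical handler: only records that Ctrl+C was pressed.
extern "C" void on_sigint(int) { g_interrupts = g_interrupts + 1; }

// Hypothetical driver: runs step(i) until max_steps is reached or an
// interrupt has been recorded, and returns the number of completed steps.
// The step that was running when the signal arrived is allowed to finish.
template <typename Step>
int run_until_interrupted(int max_steps, Step step) {
    assert(max_steps >= 0);
    std::signal(SIGINT, on_sigint);  // register the handler
    int completed = 0;
    while (completed < max_steps && g_interrupts == 0) {
        step(completed);  // one simulation step (kernel launches, copies, ...)
        ++completed;
    }
    // Back in normal (non-handler) context: this is where free_mem() and the
    // cudaSetDevice(i) / cudaDeviceReset() loop from the answer would go.
    return completed;
}
```

Calling `run_until_interrupted(SIM_STEPS, step)` from `main()` and doing the cleanup right after it returns gives the graceful first-press behaviour. The forced shutdown on a second press is deliberately left out of this sketch.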

+3

It looks like you can add a function to your GPU programs that catches the Ctrl+C (SIGINT) signal and calls cudaDeviceReset() for each device the program used.

An example code for calling a function when SIGINT is detected can be found here:

fooobar.com/questions/895565 / ...

It seems good practice to include such code for every GPU program you write, and I will do the same :-)

I don't have time to write a full, detailed answer, so please read the other answer and the comments.

+5

cudaDeviceReset is intended to destroy the resources associated with a given GPU context within the process in which it is run. One CUDA process cannot reset or otherwise affect the context of another process. Therefore, when your modified device query program calls cudaDeviceReset, it releases only the resources that it allocated itself, not those in use by any other process.
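Within a single process, one way to see whether each reset actually succeeds is to check the return codes, which the loop in the question ignores. The sketch below does that with the runtime calls already used in this thread plus `cudaGetErrorString`. The helper name `reset_all_devices` is made up for illustration. The `#else` branch holds stand-in stubs (an assumption, pretending there are 4 devices) so that the sketch still builds on a machine without the CUDA toolkit; in a real build only the `#include <cuda_runtime.h>` branch is used.

```cpp
#include <cstdio>

#if __has_include(<cuda_runtime.h>)
#include <cuda_runtime.h>
#else
// ASSUMPTION: stand-in stubs, present only so this sketch compiles without
// the CUDA toolkit. They pretend the machine has 4 devices and never fail.
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;
inline cudaError_t cudaGetDeviceCount(int* count) { *count = 4; return cudaSuccess; }
inline cudaError_t cudaSetDevice(int) { return cudaSuccess; }
inline cudaError_t cudaDeviceReset() { return cudaSuccess; }
inline const char* cudaGetErrorString(cudaError_t) { return "stub error"; }
#endif

// Hypothetical helper: resets every device visible to THIS process and
// returns how many resets succeeded. Contexts owned by other processes
// are not affected.
int reset_all_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 0;
    }

    int ok = 0;
    for (int i = 0; i < count; ++i) {
        err = cudaSetDevice(i);                 // make device i current
        if (err == cudaSuccess) {
            err = cudaDeviceReset();            // reset the current device
        }
        if (err == cudaSuccess) {
            ++ok;
        } else {
            std::fprintf(stderr, "device %d: %s\n", i, cudaGetErrorString(err));
        }
    }
    return ok;
}
```

If the returned count equals the device count, every reset in this process reported success. Memory held by other processes (for example, jobs that are still running) stays allocated either way.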

+1

Source: https://habr.com/ru/post/895562/

