Having settled on how to handle errors:
static void HandleError( cudaError_t err,
                         const char *file,
                         int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
                file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
Suppose we want to store our results in a d_results array of type double and size N, allocated up front in GPU memory, and then transfer the data from the device to the host like this:
double *d_results;
HANDLE_ERROR(cudaMalloc(&d_results, N * sizeof(double)));
.....
vector<double> results(N);
HANDLE_ERROR(cudaMemcpy(results.data(), d_results, N * sizeof(double),
                        cudaMemcpyDeviceToHost));
Now suppose the second line (the cudaMalloc) fails because there is not enough device memory to hold all the results at once. How do I perform the calculations and transfer the results to the host correctly? Do I have to process the data in batches? I would prefer to avoid manual batching. What is the standard approach to this situation in CUDA?