Yes, the GPU device must either be set explicitly or the runtime will fall back to the default one (usually device 0).
Keep in mind that as soon as the runtime starts using one device, all functions called in the same thread will be attached to this device.
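Something I find useful when starting a stream is to tear down whatever context the thread already has, then select the device before any allocation or copy:
cudaThreadExit();          // deprecated; cudaDeviceReset() is the current equivalent
cudaSetDevice(deviceId);   // every runtime call below is now bound to deviceId
cudaMalloc(...);
cudaMemcpy(...);
// etc.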
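To make the ordering concrete, here is a minimal sketch of the same idea with error checking. The helper name roundtrip_on_device and its parameters are my own invention for illustration, not something from the CUDA API, and I have left out the reset step because cudaDeviceReset() would also destroy any other allocations the process holds on the current device.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUDA): selects `deviceId`, then copies
// `input` to device memory on that device and back into `output`.
// Returns true on success, false on an invalid device id or any CUDA error.
bool roundtrip_on_device(int deviceId, const std::vector<float>& input,
                         std::vector<float>& output) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) return false;
    if (deviceId < 0 || deviceId >= deviceCount) return false;

    // Select the device first so the allocation and copies below bind to it.
    if (cudaSetDevice(deviceId) != cudaSuccess) return false;

    const std::size_t bytes = input.size() * sizeof(float);
    float* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return false;

    output.resize(input.size());
    // Host -> device, then device -> host; stop at the first failure.
    const bool ok =
        cudaMemcpy(d_buf, input.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(output.data(), d_buf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_buf);
    return ok;
}
```

Calling roundtrip_on_device(0, in, out) from a host thread should leave device 0 as that thread's current device, which you can confirm with cudaGetDevice().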