With CUDA 4.0 or later, cudaSetDevice(deviceId), followed by your draft code, should work.
Just keep in mind that you will need to create and work with separate vectors on each device (unless your devices support peer-to-peer memory access and the PCI-Express bandwidth is sufficient for your task).
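As a rough sketch of that pattern using only the plain CUDA runtime API: select each device with cudaSetDevice, give it its own buffer, and launch work on it. The kernel name `scale`, the buffer size, and the per-device layout are made up for illustration and are not taken from the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }

    const int n = 1 << 20;                      // assumed problem size
    std::vector<float> host(n, 1.0f);
    std::vector<float*> buffers(deviceCount, nullptr);

    // One separate allocation per device; each lives in that device's memory.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                     // later calls target this device
        cudaMalloc(&buffers[dev], n * sizeof(float));
        cudaMemcpy(buffers[dev], host.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(buffers[dev], n);  // asynchronous launch
    }

    // Wait for each device to finish, then release its buffer.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(buffers[dev]);
    }

    // Optional: check whether device 0 could read device 1's memory directly.
    if (deviceCount > 1) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);
        std::printf("device 0 can access device 1: %d\n", canAccess);
    }
    return 0;
}
```

If cudaDeviceCanAccessPeer reports 1, cudaDeviceEnablePeerAccess can be called so that one device dereferences the other's pointers directly; whether that is fast enough depends on the interconnect and the workload. Error checking is omitted here for brevity.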