DMA between CPU and GPU in TensorFlow

I run TensorFlow on an NVIDIA Jetson TX1 and run out of memory when I train a large network such as GoogLeNet.

The CPU and GPU on the TX1 do not have separate memories; they share a single physical memory. However, TensorFlow seems to allocate a separate memory region and copy data from the CPU side to the GPU side, so it requests twice the memory it actually needs.

In my opinion, this could be solved with DMA between the CPU and GPU. As far as I know, TensorFlow already uses DMA between GPUs (I'm not sure which layer handles this: TensorFlow itself or the GPU driver). Can I also use DMA between the CPU and GPU in TensorFlow? Or do you have any other suggestions?
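For context, here is a minimal sketch of how GPU-to-GPU peer (DMA) access is queried and enabled at the CUDA level; this is my illustration of the mechanism, not TensorFlow's actual code path:

```cuda
// Hedged sketch: querying and enabling GPU-to-GPU peer (DMA) access in CUDA.
// This is the driver/runtime-level mechanism; frameworks sit on top of it.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            // Ask the runtime whether device `a` can DMA into device `b`.
            cudaDeviceCanAccessPeer(&ok, a, b);
            printf("GPU %d -> GPU %d peer access: %s\n", a, b, ok ? "yes" : "no");
            if (ok) {
                cudaSetDevice(a);
                // After this call, copies between a and b bypass host memory.
                cudaDeviceEnablePeerAccess(b, 0);
            }
        }
    }
    return 0;
}
```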

EDIT: I just discovered that CUDA has a zero-copy feature, which is exactly what I want. Is it possible to use this feature in TensorFlow?
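To make the EDIT concrete, this is roughly what zero-copy looks like in plain CUDA: the host buffer is allocated as mapped pinned memory and the kernel works on it directly, with no `cudaMemcpy`. On an integrated-memory device like the TX1, this avoids the duplicate allocation entirely (kernel and sizes are made up for illustration):

```cuda
// Hedged sketch: CUDA zero-copy ("mapped pinned") memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    // Must be set before the CUDA context is created to allow mapped memory.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const int n = 1024;
    float *h_ptr = nullptr;
    // Page-locked host memory mapped into the GPU address space.
    cudaHostAlloc(&h_ptr, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_ptr[i] = 1.0f;

    float *d_ptr = nullptr;
    // Device-side alias of the same physical buffer: no copy is made.
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);

    scale<<<(n + 255) / 256, 256>>>(d_ptr, n);
    cudaDeviceSynchronize();

    // The kernel wrote straight through to host-visible memory.
    printf("h_ptr[0] = %f\n", h_ptr[0]);
    cudaFreeHost(h_ptr);
    return 0;
}
```

Whether TensorFlow's allocator can be told to use this path is exactly what I'm asking.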


Source: https://habr.com/ru/post/1264844/
