DenseNets tend to take up a lot of memory in TensorFlow, because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory usage can be reduced significantly by sharing allocations. This image from the paper and its PyTorch implementation illustrates the shared-memory approach:

How can this be implemented with TensorFlow? If it cannot be done from Python, how could it be implemented correctly as an Op with CPU and GPU support?
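For concreteness, below is a minimal sketch of the closest pure-Python workaround I know of: gradient checkpointing with tf.recompute_grad (available in TF 2.x). It trades extra compute for memory by recomputing each BN-ReLU-Conv composite during backprop instead of keeping its activations alive. The function names (make_composite_layer, dense_block) and hyperparameters are just illustrative; this is not the paper's shared pre-allocated storage for concat/BN outputs, only an approximation of its recomputation idea.

```python
import tensorflow as tf


def make_composite_layer(growth_rate):
    """One BN -> ReLU -> Conv composite function of a dense block."""
    bn = tf.keras.layers.BatchNormalization()
    conv = tf.keras.layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)

    def layer_fn(x):
        return conv(tf.nn.relu(bn(x)))

    # Recompute this layer's intermediate activations during backprop instead of
    # storing them, trading compute for memory. This mimics the recomputation
    # half of the paper's strategy, but not its shared buffers.
    return tf.recompute_grad(layer_fn)


def dense_block(x, num_layers=4, growth_rate=12):
    """DenseNet block: each layer sees the concat of all previous feature maps."""
    features = [x]
    for _ in range(num_layers):
        layer = make_composite_layer(growth_rate)
        new_features = layer(tf.concat(features, axis=-1))
        features.append(new_features)
    return tf.concat(features, axis=-1)


# Example: forward pass on a dummy batch (NHWC).
y = dense_block(tf.random.normal([8, 32, 32, 16]))
```

True sharing of a single pre-allocated buffer across the concat/BN outputs, as the paper and its PyTorch implementation do, seems to require control over allocation that Python-level TensorFlow does not expose, which is what the question and the feature request below are about.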
I have created a TensorFlow Feature Request for the necessary allocation functionality.