Efficient shared memory allocation in TensorFlow for recursive concatenation

DenseNets tend to take up a lot of memory in TensorFlow, because each concat operation is given its own separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory usage can be drastically reduced by sharing allocations. This image from the paper + pytorch implementation illustrates the shared-memory approach:

[Image: DenseNet shared-memory allocation strategy from the paper]
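For reference, the naive pattern I mean looks roughly like the sketch below (layer sizes and names are purely illustrative): each `tf.concat` writes into its own freshly allocated output buffer, so the pre-activation feature maps are duplicated at every layer and memory grows roughly quadratically with the depth of the block.

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=12):
    """Naive dense block: every tf.concat materializes a new, ever-larger buffer."""
    features = [x]
    for _ in range(num_layers):
        # Each iteration concatenates all previous feature maps into a fresh
        # allocation, so peak memory grows quadratically with block depth.
        concat = tf.concat(features, axis=-1)
        y = tf.keras.layers.BatchNormalization()(concat)
        y = tf.nn.relu(y)
        y = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")(y)
        features.append(y)
    return tf.concat(features, axis=-1)

# Example: a single block on a dummy batch of images.
out = dense_block(tf.zeros([8, 32, 32, 16]))
```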

How can this be implemented with TensorFlow? If it cannot be done from Python, how can it be implemented correctly as a custom Op with CPU and GPU support?
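The only partial workaround I am aware of is recomputation (gradient checkpointing) via `tf.recompute_grad`, sketched below under those assumptions. It drops the concat/BN/ReLU intermediates during the forward pass and rebuilds them during backprop, so it trades compute for memory; it is not the single shared concat buffer from the paper, which is what I am actually asking about.

```python
import tensorflow as tf

# Sketch of a partial workaround (not the paper's shared allocation):
# tf.recompute_grad discards the per-layer intermediates after the forward pass
# and recomputes them on the backward pass, lowering peak memory at extra compute cost.
def checkpointed_dense_block(x, num_layers=4, growth_rate=12):
    features = [x]
    for _ in range(num_layers):
        bn = tf.keras.layers.BatchNormalization()
        conv = tf.keras.layers.Conv2D(growth_rate, 3, padding="same")

        def layer_fn(*inputs, _bn=bn, _conv=conv):
            # The concat output and BN/ReLU activations live only inside this
            # function; they are recomputed during backprop instead of being kept.
            return _conv(tf.nn.relu(_bn(tf.concat(list(inputs), axis=-1))))

        features.append(tf.recompute_grad(layer_fn)(*features))
    return tf.concat(features, axis=-1)
```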

I have created a TensorFlow Feature Request for the required allocation functionality.

