DenseNets tend to take up a lot of memory in TensorFlow, because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory usage can be reduced significantly by sharing allocations. This image from the paper and its PyTorch implementation illustrates the shared-memory approach:

How can this be implemented with TensorFlow? If it cannot be done from Python, how could it be implemented correctly as an Op with CPU and GPU support?
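For concreteness, below is a minimal sketch of the closest pure-Python workaround I know of: gradient checkpointing with tf.recompute_grad (available in TF 2.x). It trades extra compute for memory by recomputing each BN-ReLU-Conv composite during backprop instead of keeping its activations alive. The function names (make_composite_layer, dense_block) and hyperparameters are just illustrative; this is not the paper's shared pre-allocated storage for concat/BN outputs, only an approximation of its recomputation idea.

```python
import tensorflow as tf


def make_composite_layer(growth_rate):
    """One BN -> ReLU -> Conv composite function of a dense block."""
    bn = tf.keras.layers.BatchNormalization()
    conv = tf.keras.layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)

    def layer_fn(x):
        return conv(tf.nn.relu(bn(x)))

    # Recompute this layer's intermediate activations during backprop instead of
    # storing them, trading compute for memory. This mimics the recomputation
    # half of the paper's strategy, but not its shared buffers.
    return tf.recompute_grad(layer_fn)


def dense_block(x, num_layers=4, growth_rate=12):
    """DenseNet block: each layer sees the concat of all previous feature maps."""
    features = [x]
    for _ in range(num_layers):
        layer = make_composite_layer(growth_rate)
        new_features = layer(tf.concat(features, axis=-1))
        features.append(new_features)
    return tf.concat(features, axis=-1)


# Example: forward pass on a dummy batch (NHWC).
y = dense_block(tf.random.normal([8, 32, 32, 16]))
```

True sharing of a single pre-allocated buffer across the concat/BN outputs, as the paper and its PyTorch implementation do, seems to require control over allocation that Python-level TensorFlow does not expose, which is what the question and the feature request below are about.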
I have created a TensorFlow Feature Request for the necessary allocation functionality.