I recently implemented a VGG-16 network on the CIFAR-10 dataset using both TensorFlow and PyTorch. Each image is 32 × 32 RGB.
I started with a batch size of 64, and noticed that PyTorch used much less GPU memory than TensorFlow. I then ran some experiments and produced the figure posted below.

After some research, I learned that TensorFlow uses the BFC (best-fit with coalescing) algorithm to manage memory. This explains why TensorFlow's memory usage decreases or increases in steps of 2048, 1024, ... MB, and why the memory usage sometimes does not increase when the batch size gets larger.
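To build intuition for those step changes, here is a minimal sketch of my own (a simplification for illustration, not TensorFlow's actual allocator code) of how a BFC-style allocator rounds requests up into power-of-two bins, so that two different request sizes can land in the same bin and report the same memory usage:

```python
# Simplified illustration of BFC-style binning (an assumption for intuition,
# NOT TensorFlow's real implementation): each request is rounded up to the
# nearest power-of-two bin, so measured memory grows in jumps, not linearly.
def bin_size(request_bytes, min_bin=256):
    size = min_bin
    while size < request_bytes:
        size *= 2
    return size

# Two quite different request sizes can fall into the same bin:
print(bin_size(300 * 1024**2) // 1024**2)  # a 300 MB request rounds up to 512 MB
print(bin_size(500 * 1024**2) // 1024**2)  # a 500 MB request also rounds up to 512 MB
```

Under this model, increasing the batch size only raises reported memory when some allocation crosses into the next bin, which matches the plateau behavior in the figure.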
But I'm still confused why the memory usage is lower at a batch size of 512 than at smaller batch sizes such as 384 or 448. The same happens between batch sizes 1024 and 1408, and between 2048 and 2688.
Here is my source code:
PyTorch: https://github.com/liupeng3425/tesorflow-vgg/blob/master/vgg-16-pytorch.py
Tensorflow: https://github.com/liupeng3425/tesorflow-vgg/blob/master/vgg-16.py
edit: I have two Titan Xp GPUs in my computer; OS: Linux Mint 18.2 64-bit.
I measure the GPU memory usage with `nvidia-smi`.
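For scripted measurements, one option (my own addition, not part of the original setup) is to ask nvidia-smi for per-process CSV output with `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv` and parse it; note the exact column header may vary slightly between driver versions:

```python
import csv
import io

# Parse the CSV produced by:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
# The sample string below is illustrative; real headers may differ by driver.
def used_memory_by_process(csv_text):
    rows = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return {row["process_name"]: row["used_gpu_memory [MiB]"] for row in rows}

sample = ("pid, process_name, used_gpu_memory [MiB]\n"
          "12345, /usr/bin/python3, 1563 MiB\n")
print(used_memory_by_process(sample))  # {'/usr/bin/python3': '1563 MiB'}
```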
My code runs on GPU1, which is defined in my code:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
And I'm sure that only one application uses GPU1.
The GPU memory usage can be read from the per-process list at the bottom of the nvidia-smi output. For example, as shown in the figure below, the process name is `/usr/bin/python3` and its GPU memory usage is 1563 MiB.
