I have a single GPU available for deployment, but I need to serve several models on it. I don't want the first deployed model to allocate the full GPU memory, because then I can't deploy the subsequent models. During training this can be controlled with the gpu_memory_fraction parameter. I use the following command to deploy a model:
tensorflow_model_server --port=9000 --model_name=<name of model> --model_base_path=<path where exported models are stored> &> <log file path>
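
For context, during training I limit the memory roughly like this (a minimal TF 1.x-style sketch; the exact tf.GPUOptions field is per_process_gpu_memory_fraction, and the 0.3 fraction is just an example value):

import tensorflow as tf

# Limit this process to ~30% of the GPU's memory instead of letting it grab all of it.
# (per_process_gpu_memory_fraction is the tf.GPUOptions field; 0.3 is illustrative.)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    # ... build the graph and run training as usual ...
    pass
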
Is there a flag I can set on tensorflow_model_server to control how much GPU memory it allocates?
Thanks.