GAE: What is the difference between <min-pending-latency> and <max-pending-latency>?
As far as I can tell from the docs, both settings do the same thing: start a new instance when a request has been in the pending queue longer than the setting allows.
<max-pending-latency>
The maximum amount of time that App Engine must allow a request to wait in the pending queue before starting a new instance to process it. Default: 30 ms.
- A low maximum means that App Engine will start new instances sooner for pending requests, improving performance but raising running costs.
- A high maximum means that users might wait longer for their requests to be served if there are pending requests and no idle instances to serve them, but your application will cost less to run.
<min-pending-latency>
The minimum amount of time that App Engine must allow a request to wait in the pending queue before starting a new instance to process it.
- A low minimum means that requests spend less time in the pending queue when all existing instances are active. This improves performance, but increases the cost of running your application.
- A high minimum means that requests remain pending longer if all existing instances are active. This lowers running costs, but increases the time that users must wait for their requests to be served.
Source: https://cloud.google.com/appengine/docs/java/config/appref
What is the difference between min and max then?
The piece of information you may be missing is that App Engine can create an instance at any time between the minimum pending latency and the maximum pending latency.
This means that an instance will never be created to serve a pending request before the minimum latency has elapsed, and one will always be created once the maximum latency has been reached.
I believe the best way to understand this is to look at the timeline of events when a request enters the pending queue:
- The request arrives at the application, but no instance is available to serve it, so it is placed in the pending queue.
- Before the minimum pending latency is reached: App Engine tries to find an available instance to serve the request and does not create a new instance.
- After the minimum pending latency is reached, and until the maximum pending latency is reached: App Engine keeps trying to find an available instance to serve the request, but may create a new instance.
- Once the maximum pending latency is reached: App Engine stops searching for an available instance to serve the request and creates a new instance.
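The timeline above can be sketched as a small decision function. This is only an illustrative model of the three phases, not App Engine's actual scheduler: the names `Action` and `scheduler_action`, the millisecond values, and the handling of the exact boundary instants are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for the three phases of the timeline."""
    WAIT_FOR_EXISTING = "look for an existing instance only"
    MAY_START_NEW = "keep looking, but a new instance may be started"
    START_NEW = "start a new instance"


def scheduler_action(pending_ms, min_pending_ms, max_pending_ms):
    """Return the phase for a request that has been pending for pending_ms.

    Assumption of this sketch: a request that has waited exactly
    min_pending_ms counts as inside the middle window, and one that has
    waited exactly max_pending_ms counts as past the maximum.
    """
    if min_pending_ms > max_pending_ms:
        raise ValueError("min pending latency must not exceed max pending latency")
    if pending_ms < min_pending_ms:
        # Too early: no new instance is created for this request yet.
        return Action.WAIT_FOR_EXISTING
    if pending_ms < max_pending_ms:
        # Inside the window: a new instance is allowed but not guaranteed.
        return Action.MAY_START_NEW
    # Past the maximum: a new instance is created.
    return Action.START_NEW


if __name__ == "__main__":
    # Hypothetical settings: min = 50 ms, max = 200 ms.
    for waited in (10, 50, 120, 200, 350):
        print(waited, "ms ->", scheduler_action(waited, 50, 200).value)
```

Seen this way, the two settings are the lower and upper edge of a window: raising the minimum delays the earliest point at which a new instance may appear, while lowering the maximum brings forward the point at which one must appear.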