I've noticed periodic but consecutive latency spikes from my app running on App Engine. At first I thought the network might be slow, but Appstats confirmed that it wasn't.
I was able to reproduce the latency spikes with both older and newer versions of the SDK. Currently I am using the following:
- App Engine SDK: 1.9.42
- Google Endpoints: 1.9.42
- Objectify: 5.1.13
- Appstats (for debugging network latency)
Usage of the app is pretty low; over the last 30 days I have generally been under 0.04 requests per second:
Most of the work is done by a single instance:
The latency of most operations is well under a second, but an alarming number of requests take 10-30 times longer.
So I figured it must just be network latency, but every example of a slow operation refuted this. Datastore and the network have always been incredibly reliable. Here's the anatomy of a slow request that takes more than 30 seconds:
Here is the anatomy of a normal request:
At a high level, my code is rather uninteresting: it is a simple API that makes a few network calls and saves/reads data from Cloud Datastore. The entire source can be found on GitHub here. The app runs on a single App Engine instance with automatic scaling, and warmup is enabled.
CPU usage over the past month does not seem to show anything interesting:
It is strange to see that even for fast operations a huge amount of time is spent on the CPU, even though the code simply creates a few objects, persists them, and returns JSON. I am wondering whether the CPU on my App Engine instance is being pegged by another app, which could cause performance to degrade periodically.
My appengine-web.xml config looks like this:
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>sauce-sync</application>
    <version>1</version>
    <threadsafe>true</threadsafe>
    <automatic-scaling>
        <min-idle-instances>1</min-idle-instances>
    </automatic-scaling>
</appengine-web-app>
And my web.xml looks like this:
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
        <servlet-name>SystemServiceServlet</servlet-name>
        <servlet-class>com.google.api.server.spi.SystemServiceServlet</servlet-class>
        <init-param>
            <param-name>services</param-name>
            <param-value>com.sauce.sync.SauceSyncEndpoint</param-value>
        </init-param>
    </servlet>
    <servlet-mapping>
        <servlet-name>SystemServiceServlet</servlet-name>
        <url-pattern>/_ah/spi/*</url-pattern>
    </servlet-mapping>

    <!--reaper-->
    <servlet>
        <servlet-name>reapercron</servlet-name>
        <servlet-class>com.sauce.sync.reaper.ReaperCronServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>reapercron</servlet-name>
        <url-pattern>/reapercron</url-pattern>
    </servlet-mapping>
    <servlet>
        <servlet-name>reaper</servlet-name>
        <servlet-class>com.sauce.sync.reaper.ReaperServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>reaper</servlet-name>
        <url-pattern>/reaper</url-pattern>
    </servlet-mapping>

    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
    </welcome-file-list>

    <filter>
        <filter-name>ObjectifyFilter</filter-name>
        <filter-class>com.googlecode.objectify.ObjectifyFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>ObjectifyFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
TL;DR: I'm completely stuck and not sure how to debug or fix this problem, and I'm starting to think that this is business as usual for small apps on App Engine.
I am thinking about turning off the resident instance for a while, in the hope that my app has just been running on some kind of second-tier hardware or next to an app that consumes a lot of resources. Has anyone encountered similar performance issues, or does anyone know of additional ways to profile an app?
EDIT:
I tried running on a single resident instance, and I also tried setting concurrent requests to 2-4 as suggested in this question, without any results. Logs and Appstats both confirm that an inordinate amount of time is spent waiting for my code to start running. Here is a request that takes 25 seconds before my first line of code is run; I am not sure what Endpoints / App Engine is doing during that time.
Again, the load is still low and this request is on a warmed-up instance.
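For reference, the concurrency experiment can be sketched roughly as the appengine-web.xml fragment below. This is only an illustration under assumptions: it uses the max-concurrent-requests element of automatic-scaling, and the value 4 is just one point from the 2-4 range I tried, not a recommendation.

```xml
<!-- Sketch only: limits concurrent requests per instance.
     The value 4 is an assumed example from the 2-4 range tried. -->
<automatic-scaling>
    <min-idle-instances>1</min-idle-instances>
    <max-concurrent-requests>4</max-concurrent-requests>
</automatic-scaling>
```

This setting made no measurable difference to the latency spikes in my case.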
EDIT 2:
It seems that, for some reason, App Engine + Endpoints does not play well with min-idle-instances. Reverting to the default App Engine configuration fixed my problem.
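A minimal sketch of what the reverted config could look like, assuming that "default" here simply means dropping the automatic-scaling block and leaving everything else unchanged:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: same app settings as before, with the automatic-scaling /
     min-idle-instances block removed so App Engine uses its default scaling. -->
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application>sauce-sync</application>
    <version>1</version>
    <threadsafe>true</threadsafe>
</appengine-web-app>
```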