Intermittent latency spikes on a low-traffic App Engine application

I noticed periodic, recurring bursts of latency in my application running on App Engine. At first I thought the network might be slow, but Appstats confirmed that it wasn't.

I was able to reproduce the latency bursts with both old and new versions of the SDK. Currently I am using the following:

  • App Engine SDK: 1.9.42
  • Google Endpoints: 1.9.42
  • Objectify: 5.1.13
  • Appstats (for debugging network latency)

Usage of the app is pretty low: over the last 30 days it has generally stayed under 0.04 requests per second:

requests per second

Most of the work is done by a single instance.

Latency for most operations is well under a second, but an alarming number of requests take 10-30 times longer.

Delay density distribution

5% of requests take 23 seconds or longer ...

So I assumed it had to be network latency, but every Appstats trace of a slow operation contradicted this. Datastore and the network have always been incredibly reliable. Here's the anatomy of a slow request that takes more than 30 seconds:

Appstats trace of a request taking 31 seconds

Here is the anatomy of a normal request (image not preserved).

At a high level, my code is rather uninteresting: it is a simple API that makes several network calls and saves / reads data from Cloud Datastore. The entire source can be found on GitHub. The application runs on a single resident App Engine instance with automatic scaling, and the instance is warmed up.

CPU usage over the past month does not seem to show anything interesting.

It is strange to see that even for fast operations a huge amount of time is spent on the CPU, although the code simply creates several objects, saves them and returns JSON. I am wondering whether another application on the same host is making my App Engine instance CPU-bound, which could lead to periodic performance degradation.

My appengine-web.xml config looks like this:

<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>sauce-sync</application>
  <version>1</version>
  <threadsafe>true</threadsafe>
  <automatic-scaling>
    <!-- always keep an instance up in order to keep startup time low-->
    <min-idle-instances>1</min-idle-instances>
  </automatic-scaling>
</appengine-web-app>

And my web.xml looks like this:

<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
  <servlet>
    <servlet-name>SystemServiceServlet</servlet-name>
    <servlet-class>com.google.api.server.spi.SystemServiceServlet</servlet-class>
    <init-param>
      <param-name>services</param-name>
      <param-value>com.sauce.sync.SauceSyncEndpoint</param-value>
    </init-param>
  </servlet>
  <servlet-mapping>
    <servlet-name>SystemServiceServlet</servlet-name>
    <url-pattern>/_ah/spi/*</url-pattern>
  </servlet-mapping>
  <!--reaper-->
  <servlet>
    <servlet-name>reapercron</servlet-name>
    <servlet-class>com.sauce.sync.reaper.ReaperCronServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>reapercron</servlet-name>
    <url-pattern>/reapercron</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>reaper</servlet-name>
    <servlet-class>com.sauce.sync.reaper.ReaperServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>reaper</servlet-name>
    <url-pattern>/reaper</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>index.html</welcome-file>
  </welcome-file-list>
  <filter>
    <filter-name>ObjectifyFilter</filter-name>
    <filter-class>com.googlecode.objectify.ObjectifyFilter</filter-class>
  </filter>
  <filter-mapping>
    <filter-name>ObjectifyFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
</web-app>

TL;DR: I'm completely stuck. I'm not sure how to debug or fix this problem, and I'm starting to think that this is business as usual for small applications on App Engine.

I have been thinking about disabling the resident instance for a while, in the hope that my application simply landed on subpar hardware or next to an application that consumes a lot of resources. Has anyone encountered similar performance issues, or is anyone aware of additional ways to profile the application?

EDIT:

I tried running on one resident instance, and I also tried limiting the instance to 2-4 concurrent requests, as suggested in another question, without any results. Logs and Appstats both confirm that an excessive amount of time is spent waiting before my code starts running. Here is a request that takes 25 seconds before my first line of code runs; I'm not sure what Endpoints / App Engine is doing during that time.

25 seconds before running my code

Again, the load is still low and this request hit a warmed-up instance.

EDIT 2:

It seems that, for some reason, App Engine + Endpoints do not play well with min-idle-instances. Reverting to the default App Engine configuration fixed my problem.
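
For illustration, reverting to the default configuration might look like the sketch below. It assumes that "default" simply means dropping the automatic-scaling block from the config shown earlier; the question does not show the final file, so treat this as a guess rather than the exact fix.

```xml
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <application>sauce-sync</application>
  <version>1</version>
  <threadsafe>true</threadsafe>
  <!-- Assumption: the automatic-scaling block with min-idle-instances is
       removed, so App Engine falls back to its default scaling settings. -->
</appengine-web-app>
```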

1 answer

I have no answer, but I can offer you some debugging tips.

Appstats may or may not be reporting correctly. Log messages, however, are timestamped. Log a message before and after each RPC operation. That should give you some insight.

30-second delays sound like warm-up requests, which should be clearly indicated in the logs. One thing I've found in the past is that setting any resident instances for low-traffic applications (counterintuitively) tends to route a lot of requests to cold instances. Use the default settings and set up a cron job to ping an endpoint once per minute.
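
The once-per-minute ping could be configured with a cron.xml along these lines. This is only a sketch: the /keepalive URL is a hypothetical path that does not appear in the question, and it assumes a lightweight servlet is mapped to that path in web.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <!-- Hypothetical URL: map any cheap servlet here so the request
         keeps an instance warm without doing real work. -->
    <url>/keepalive</url>
    <description>Ping the app once per minute to keep an instance warm</description>
    <schedule>every 1 minutes</schedule>
  </cron>
</cronentries>
```

The file goes in WEB-INF next to appengine-web.xml.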


Source: https://habr.com/ru/post/1011556/

