Why does the Java CPU profile (using VisualVM) show so many hits in a method that does nothing?

This is something I think I have seen before with other profiling tools in other environments, but in this case it is especially dramatic.

I take a CPU profile of the task, which runs for about 12 minutes, and it shows almost half the time spent in a method that literally does nothing: it has an empty body. What can cause this? I do not believe the method is called a ridiculous number of times, and it certainly should not account for half the execution time.

For what it's worth, the method in question is called startContent(), and it is used to notify a parsing event. The event is passed along a chain of filters (maybe a dozen of them), and the startContent() method of each filter does almost nothing except call startContent() on the next filter in the chain.
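
A minimal sketch of what such a chain looks like (my own simplified types, not Saxon's actual Receiver API):

// Simplified illustration: each filter's startContent() just forwards the
// event to the next filter; the end of the chain does nothing at all.
interface Receiver {
    void startContent();
}

class ForwardingFilter implements Receiver {
    private final Receiver next;   // next filter in the chain

    ForwardingFilter(Receiver next) {
        this.next = next;
    }

    @Override
    public void startContent() {
        // Does essentially nothing itself; just passes the event along.
        next.startContent();
    }
}

class Sink implements Receiver {
    @Override
    public void startContent() {
        // Literally empty: the end of the chain discards the event.
    }
}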

This is pure Java code and I am running it on a Mac.

Here is a screenshot of the CPU sampler output:

CPU sampler screenshot

and here is one sample showing the call stack:

CPU sampler sample call stack

(After a delay due to vacation) Here are some screenshots showing the output from the profiler. These figures are much more in line with what I would expect the profile to look like. The profiler output seems quite plausible, and the sampler output appears to be spurious.

CPU profiler output 1
CPU profiler output 2

As some of you will have guessed, the task in question is a run of the Saxon XML schema validator (on a 9Gb input file). The profile shows about half the time spent validating element content against simple types (which happens during endElement processing) and about half spent testing key constraints for uniqueness; the two profiler views highlight the activity associated with these two aspects of the task.
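
For orientation only, since the question does not show how the validator is invoked: a generic JAXP-style schema validation run looks roughly like the sketch below. An implementation such as Saxon-EE can be plugged in as the SchemaFactory, and the file names here are placeholders.

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;

public class ValidateBigFile {
    public static void main(String[] args) throws Exception {
        // Standard JAXP schema validation; the concrete SchemaFactory
        // implementation decides which validator engine actually runs.
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("schema.xsd"));
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("input.xml")));
        System.out.println("valid");
    }
}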

I am not able to share the data, as it comes from a client.

+6
4 answers

I have not used VisualVM, but I suspect the problem is most likely instrumentation overhead on such an empty method. Here is the relevant excerpt from the JProfiler documentation (which I have used extensively):

If the method call recording type is set to dynamic instrumentation, all methods of the profiled classes are instrumented. This creates some overhead, which is significant for methods with very short execution times. If such methods are called very frequently, the measured time of those methods will be far too high. Also, because of the instrumentation, the HotSpot compiler might be prevented from optimizing them. In extreme cases, such methods become the dominant hot spots, although this is not true of an uninstrumented run. An example is the method of an XML parser that reads the next character. This method returns very quickly but may be called millions of times in a short span of time.

Basically, the profiler injects its own timing code into each method, and for an empty method the profiler ends up measuring mostly its own overhead rather than anything the method actually does.
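
A conceptual sketch of why that dominates an empty method (this is not the actual code any profiler injects, just an illustration with hypothetical hook names):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping that instrumentation might add around each call.
final class Profiler {
    static final Map<String, Long> totals = new ConcurrentHashMap<>();

    static long enterMethod(String name) {
        return System.nanoTime();
    }

    static void exitMethod(String name, long start) {
        // For an empty body, the elapsed time is almost entirely this overhead.
        totals.merge(name, System.nanoTime() - start, Long::sum);
    }
}

class InstrumentedFilter {
    public void startContent() {
        long start = Profiler.enterMethod("startContent");
        try {
            // original body: empty, so the recorded time is nearly all bookkeeping
        } finally {
            Profiler.exitMethod("startContent", start);
        }
    }
}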

I recommend, if possible, telling VisualVM to stop instrumenting that method (or its class), if it supports that kind of filtering.

+7

There is a general assumption that using a profiler is much better (for finding performance problems, as opposed to measuring things) than, well, anything else, really; certainly better than the low-tech method of simply pausing the program at random.

That assumption is only conventional wisdom; it has no basis in theory or practice. There are numerous peer-reviewed scholarly papers about profiling, but none that I have read even consider the question, let alone justify it. It is a blind spot in academia, not a huge one, but it is there.

Now, to your question:

In the screenshot showing the call stack, what you have is the so-called hot path, which accounts for roughly 60% of the in-thread CPU time. Assuming the code with "saxon" in its name is what interests you, the path is:

net.sf.saxon.event.ReceivingContentHandler.startElement
net.sf.saxon.event.ProxyReceiver.startContent
net.sf.saxon.event.ProxyReceiver.startContent
net.sf.saxon.event.StartTagBuffer.startContent
net.sf.saxon.event.ProxyReceiver.startContent
com.saxonica.ee.validate.ValidationStack.startContent
com.saxonica.ee.validate.AttributeValidator.startContent
net.sf.saxon.event.TeeOutputter.startContent
net.sf.saxon.event.ProxyReceiver.startContent
net.sf.saxon.event.ProxyReceiver.startContent
net.sf.saxon.event.Sink.startContent

First of all, it seems to me this task must be doing I/O, or at least waiting for some other process to give it content. If so, you should be looking at wall-clock time, not CPU time.

Second, the problem could be at any of the call sites where one of these functions calls the next. If any such call is not strictly necessary and could be skipped or done less often, that would reduce the time by a significant fraction. My suspicion falls on StartTagBuffer and validate, but you know better.

There are other points that I could make, but these are the main ones.

ADDED after the question was edited: I am going to assume that you are looking for ways to optimize the code, not just ways to get numbers for their own sake.

It still looks like CPU time, not wall-clock time, because there is no I/O on the hot paths. Maybe that is fine in your case, but what it means is that out of your 12 minutes of wall-clock time, 11 minutes could be spent waiting on I/O, with only 1 minute on the CPU. If so, you could cut 30 seconds of fat out of the CPU part and only shorten the run by 30 seconds. That is why I prefer sampling on wall-clock time, so I get an overall perspective.
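
One rough way to get that overall perspective is to compare wall-clock time against the thread's CPU time using the standard ThreadMXBean; runTask() below is just a placeholder for the real workload.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class WallVsCpu {
    public static void main(String[] args) throws Exception {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart  = threads.getCurrentThreadCpuTime();

        runTask();  // stand-in for the actual validation run

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos  = threads.getCurrentThreadCpuTime() - cpuStart;

        // A large gap between the two suggests the thread spends most of its
        // time blocked (for example waiting on I/O) rather than computing.
        System.out.printf("wall: %.1f s, cpu: %.1f s%n",
                wallNanos / 1e9, cpuNanos / 1e9);
    }

    private static void runTask() throws InterruptedException {
        Thread.sleep(2000);  // placeholder: mostly waiting, little CPU
    }
}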

By looking only at hot paths, you may not get a true picture. For example, if the hot path says a function F is on it for, say, 40% of the time, that only means F costs at least 40%. It could be much more, because it may also be on other paths that are not as hot. So you could have a juicy opportunity for speedup that does not show up strongly in the particular path the profiler chose to show you, and so you do not pay much attention to it. In fact, a big time-taker may not show up at all, because on any particular hot path there is always something slightly bigger, such as new, or because it goes by multiple names, as with templated class constructors.

It does not show you line-level information. If you want to examine a supposedly expensive routine to find out why it is costly, you need to look at the lines inside it. There is a tendency, when looking at a whole routine, to say "it's just doing what it has to do." But if you look at a specific costly line of code, which more often than not is a method call, you can ask "is it really necessary to make this call? Maybe I already have that information." That is a far more specific question about what you could fix.

Can it show you raw stack samples? In my experience those are far more informative than any summary, such as a hot path, that a profiler can present. The thing to do is examine a sample and come to a full understanding of what the program was doing, and why, at that moment in time. Then repeat with several more samples. You will see things that do not need to be done, which you can fix to get substantial speedup. (If the code is already optimal, then it is good to know that too.) The point is that you are looking for problems, not measurements. Statistically it is very rough, but good enough, and no problem will escape.
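
A rough sketch of grabbing raw stack samples from inside the running application (in practice you could just pause it in a debugger, or run jstack against the process a few times); the class and thread names here are my own:

import java.util.Map;

public class StackSampler implements Runnable {
    @Override
    public void run() {
        try {
            for (int i = 0; i < 10; i++) {
                Thread.sleep(2000);
                Map<Thread, StackTraceElement[]> sample = Thread.getAllStackTraces();
                System.out.println("==== sample " + (i + 1));
                for (Map.Entry<Thread, StackTraceElement[]> e : sample.entrySet()) {
                    System.out.println("--- thread " + e.getKey().getName());
                    for (StackTraceElement frame : e.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }

    // Start it alongside the workload, for example:
    //   Thread t = new Thread(new StackSampler(), "stack-sampler");
    //   t.setDaemon(true);
    //   t.start();
}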

+2

I suspect that the Sink.startContent method actually is called a ridiculous number of times.

Your screenshot shows the Sampling tab, which usually gives realistic timings if you use a long sampling interval. If you use the Profiler tab instead, you will also get invocation counts. (You will also get less realistic timings, and your program will run very slowly, but you only need to run it for a few seconds to get an idea of the call counts.)

It is hard to predict which optimizations, and especially which inlining, HotSpot performs, and a sampler can only attribute the time of inlined methods to their call sites. I suspect that some calling code inside Saxon may, for whatever reason, be getting attributed to your empty callback. In that case you are simply paying the cost of XML processing, and the only option might be to switch to another parser.
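
A toy illustration of that effect (my own example, not Saxon code): a trivial forwarding method called a huge number of times, which HotSpot will very likely inline, so a sampler may attribute the loop's time to either frame depending on how inlined frames are reported.

public class InliningDemo {
    private static long counter;

    public static void main(String[] args) {
        for (long i = 0; i < 2_000_000_000L; i++) {
            process(i);
        }
        System.out.println(counter);
    }

    private static void process(long i) {
        doNothing();
        counter += i;    // the "real" work happens in the caller
    }

    private static void doNothing() {
        // empty, like startContent()
    }
}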

+1

I got a lot of useful information and suggestions from this thread, for which many thanks. However, I do not think the central question has been answered: why does the CPU sampler in VisualVM charge such an absurd amount of time to a method that does nothing, and that is not called any more often than many other methods?

For future investigations I will rely on the profiler rather than the sampler, now that I understand a little better how they differ.

I did not really learn much new about this particular task from the profiler, since it largely confirmed what I would have guessed; but that is useful in itself. It told me there is no magic bullet for speeding up this particular process, but it did set bounds on what might be achieved by some serious re-engineering. For example, a possible future improvement that seems to show some promise is generating bytecode validator code for each user-defined simple type in the schema.

0

Source: https://habr.com/ru/post/969814/

