The Java performance test changes when you execute code from a callback class. Problem with Java framework?

Question

The Java performance test changes when you execute code from a callback class. Problem with Java framework?

I tried to write a fairly general testing framework for profiling a group of functions, but ran into a problem that I could not put with my finger in the detailed description below.

The concept is pretty simple. I created an abstract class Test :

 public abstract class Test { private final String name; public Test(String name) { this.name = name; } public abstract void test(); }

Then I have a master test class with configuration information and loops to run the tests.

 public class MyTestClass { public static double staticMethod1(Quat4f q) { double angle; float dirW = qw; if (qw * qw + qy * qy > 0.0f) { if (qw > 0.f && qy < 0.0f) dirW *= -1.0f; angle = 2.0f * Math.acos(dirW); } else { angle = 0.0f; } return angle / 6.283f * 100.f; } public static double staticMethod2(Quat4f q) { AxisAngle4f axisAngle = new AxisAngle4f(); axisAngle.set(q); return axisAngle.angle / 6.283f * 100.f; } public static final void main(String[] args) { final Quat4f quaternion = new Quat4f(0, 0, 0, 1); Test[] tests = new Test[] { new Test("staticMethod1") { @Override public void test() { staticMethod1(quaternion); } }, new Test("staticMethod2") { @Override public void test() { staticMethod2(quaternion); } } }; long startTime = 0; int repeat = 10; //How many times to repeat each iteration. int[] tiers = new int[] { 1000, 10000, 100000, 1000000 }; long[][][] times = new long[tests.length][tiers.length][iterations]; for (int testIndex = 0; testIndex < tests.length; testIndex++) { for (int tierIndex = 0; tierIndex < tiers.length; tierIndex++) { for (int r = 0; r < repeat; r++) { startTime = System.nanoTime(); for (int i = 0; i < tiers[tierIndex]; i++) { tests[testIndex].run(); //run the test } times[testIndex][tierIndex][r] = System.nanoTime() - startTime; //Stash the execution time in the array. } } } } }

Looking at this code, you are probably wondering why the Test.run() method calls the static method. This is because initially I just inserted a hard-coded call to a static method inside a loop. Instead of calling tests[testIndex].run() I would call staticMethod1(quaternion) explicitly there. It also meant over-duplicating this loop code for each method, which led me to create an abstract base class so that I could essentially create an array of callbacks for testing.

Observations

So, after switching to using the Test callback class instead of calling the static function explicitly, I noticed two things:

The execution time of each call has increased significantly
The lead time was not as agreed.

Static Results

Here is the output table when I called static methods explicitly (all the time in nano seconds):

 1000 10000 100000 1000000 staticMethod1 315358 424208 1451141 14495864 125334 410525 1483797 14657896 125956 412079 1445543 14702681 150837 413012 1473845 14677179 126578 412080 1419419 14415313 126268 413323 1450830 14600361 125645 411457 1437147 14504261 126889 414257 1431548 14402563 129689 420476 1476954 14548734 848417 425453 1471046 14409715 staticMethod2 1528581 1031287 3137712 2180755 228899 303540 218947 2227406 228276 303228 218324 2232071 235118 301362 218324 2174224 226411 317534 218636 2180133 226410 302918 218946 2143434 253779 302606 219879 2116689 226099 349257 219880 2108913 240717 303228 218946 2150899 250358 303228 218946 2159607

Callback results

And here is the same code, only executed from my abstract base class Test :

 1000 10000 100000 1000000 staticMethod1 360453 454686 1985445 15699447 155191 449400 1639298 15048205 152391 449089 1576165 15128134 175095 451888 1537289 15300429 156746 466816 1600734 15190645 157989 464950 1641476 15483610 157368 452198 1559681 15316290 157990 460285 1572122 15402439 157367 527773 1538222 15078995 878274 454065 1548485 15077439 staticMethod2 1519562 1101263 1674130 8842756 274616 335883 1481309 8728930 285190 339616 1471046 8842135 291721 334950 1280089 8591155 294831 347391 1339491 13402065 332152 343970 1299683 10950426 300429 326553 1252100 7778814 285190 324999 1365615 8569385 297008 341792 1284133 7734030 283324 326554 1327984 11505256

Question Revisited

The only hypothesis that I was able to formulate was that perhaps this is somehow related to the concept of the Java framework. I'm really looking for someone to give a detailed analysis of why this will happen.

What could be the result of these completely different results?

I ran the tests again from my home computer, so I thought I would return their results for accuracy.

Callback (staticMethod1, staticMethod2)

 1000 10000 100000 1000000 staticMethod1 629020 688864 3016204 40796517 348542 589891 2662401 39673949 355447 611921 3559403 39613447 416936 617511 4022701 39335929 412660 635267 4290355 38862108 409702 751996 4055583 38823967 405426 761202 4063803 38998238 410030 760545 4024016 39131407 411346 656640 3877366 38737489 1794991 723060 4139759 38286028 staticMethod2 2219818 4198617 2526272 15647236 735555 1939011 2651879 14482251 761860 445542 2480238 12990096 734569 222607 2437822 14278058 734898 264366 2323394 23561771 743118 220633 2739672 15669266 746734 224909 5159080 12916113 781918 223593 2342794 14843616 789481 229512 2740658 13400784 865108 227210 5202155 22015033

Callback (staticMethod2, staticMethod1)

 1000 10000 100000 1000000 staticMethod2 2159974 1598690 4343951 4011522 755284 484013 4646131 3989491 779945 460667 390302 4114111 866752 469874 413318 3833963 911141 495193 433376 4024016 878918 468230 424827 4162118 892070 452447 431074 3830346 806579 419894 463298 4003301 814142 424169 424826 3961871 830253 417593 432718 3823112 staticMethod1 768437 632637 4596480 38401771 421539 655325 3811603 37536663 418579 657626 3917481 37377517 425813 648091 3887230 37347924 423512 653023 3800095 38334692 428772 570820 3810288 37640568 435020 581013 3795162 37543896 426800 578382 3805027 37390670 448830 567861 4004617 37502466 1883443 663874 3848101 38096961

Callback (staticMethod1, staticMethod2) with a single instance of AxisAngle4f

 1000 10000 100000 1000000 staticMethod1 693138 745420 4382752 26091003 405098 677355 3378227 41866476 390630 669793 4349213 42472807 430088 699057 4296931 27899147 385697 675711 4300549 42643790 382410 658941 4296603 32330563 393918 662888 2602557 42622417 380437 666833 2588747 32903026 393918 738515 2616367 26079823 1805843 679985 2570004 42191343 staticMethod2 444556 1640449 963422 8620168 463298 464942 946325 8545856 431732 474478 877931 8645487 452776 466915 870698 8761229 432718 449487 882534 8572490 443898 464613 876288 8482066 414633 538596 871684 8672121 408715 190054 876287 8626744 405427 96342 874643 8607016 436664 96343 847681 8543883

+4

java performance-testing stackframe

crush Aug 30 '13 at 19:37

source share

1 answer

yshavit · Accepted Answer · 2013-08-30T20:08:11+0000

These results are not very "dramatic." Average values for 1,000,000 runs:

staticMethod1
- Explicit: 14541456.7 ( 14.5 ns per iteration)
- callback: 15272563.3 ( 15.2 ns per iteration)
- slowdown: 0.7 ns per iteration
staticMethod2
- Explicit: 2167413.1 ( 2.2 ns per iteration)
- callback: 9494495.2 ( 9.5 ns per iteration)
- slowdown: ( 7 ns per iteration )

The fact that the slowdown between them turns off (almost exactly) by an order of magnitude is strange, but most importantly, the callback approach has slowed down by no more than 7 ns. It's not that much.

UPDATE

The other results you posted, with the testing order of method2 , then method1 , correspond to the following:

The JVM initially runs your code in interpreted mode, profiling it;
he sees that you always call the same nest of the Test class;
therefore, it compiles this call site into a monomorphic call site;
the call goes to another class. The monomorphic statement fails, resulting in an immediate recompilation of the megamorphic call to the site. There is no profiling in the JIT-compiled code, so there will be no repetition of this call site to find out that it can again be recompiled into a monomorphic site specialized for the new case.

The advantage of the speed of a monomorphic site is to avoid searching for a table of virtual functions and to enable the inlining method. Both of your methods are fairly short and good candidates for inlining.

You can also make sure that such recompilation occurs once, but only once, using --XX:+PrintCompilation in combination with println at the moment when you switch from one test to another.

Additional notes

You should avoid having your timings affected by such subtleties of the JVM. To do this, use the test(int iterations) method and push the innermost loop on it. This will allow sending the critical method only once at least 1000 iterations and becomes insignificant;
Distrust in measuring time less than 1 ms. System.nanoTime accuracy is nowhere near 1 ns: its granularity is usually about 1 μs, and to get good accuracy, you have to be much higher.

The Java performance test changes when you execute code from a callback class. Problem with Java framework?

Observations

Static Results

Callback results

Question Revisited

Callback (staticMethod1, staticMethod2)

Callback (staticMethod2, staticMethod1)

UPDATE

Additional notes

More articles: