What can I do in Java code to optimize for CPU caching?

When writing a Java program, can I influence how the CPU uses its cache to store my data? For example, if I have an array that is accessed a lot, does it help if it is small enough to fit into one cache line (usually 128 bytes on a 64-bit machine)? What if I keep a heavily used object within this limit, can I expect the memory it uses to be close together and to stay in the cache?

Background: I am building a compressed digital tree that is heavily inspired by Judy arrays, which are written in C. While I mainly borrow their node compression techniques, Judy has CPU cache optimization as a central design goal, and its node types, as well as the heuristics for switching between them, depend heavily on it. I was wondering whether I have any chance of getting those benefits in Java?

Edit: The general advice of the answers so far is not to try to micro-optimize machine-level details when you are as far from the machine as you are in Java. I completely agree, so I felt I should add some (hopefully) clarifying comments to better explain why I think this question still makes sense. They are below:

There are some things that are generally easier for computers to process because of the way they are built. I have seen Java code run much faster when working on compressed data (in memory), even though decompression cost additional CPU cycles. If the data were stored on disk it would be obvious why this is so, but of course the same principle applies in RAM.

Now, computer science has a lot to say about what those things are; for example, locality of reference is great in C, and I suppose it is still great in Java, perhaps even more so if it helps the optimizing runtime do smarter things. But how you achieve it may be quite different. In C, I can write code that manages a large chunk of memory itself and uses adjacent pointers for related data.

In Java, I cannot (and do not want to) know much about how memory will be managed by a particular runtime. So I have to take the optimization to a higher level of abstraction. My question is basically: how do I do that? For locality of reference, what does "close together" mean at the level of abstraction I am working on in Java? The same object? The same type? The same array?

In general, I do not think that layers of abstraction change the "laws of physics", metaphorically speaking. Doubling your array in size every time you run out of space is a good strategy in Java too, even though you no longer call malloc().
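
Below is a minimal sketch of that doubling strategy in Java; the class and field names are illustrative, and this is roughly what java.util.ArrayList does internally when it runs out of capacity.

    import java.util.Arrays;

    class GrowableIntBuffer {
        private int[] data = new int[16]; // initial capacity is an arbitrary choice
        private int size = 0;

        void add(int value) {
            if (size == data.length) {
                // Double the backing array instead of growing by one element,
                // so the amortized cost per add stays constant and the data
                // remains in one contiguous block.
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[size++] = value;
        }

        int get(int index) {
            return data[index];
        }
    }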

+41
java optimization caching
Sep 25 '09 at 16:26
5 answers

The key to good Java performance is writing idiomatic code, not trying to outsmart the JIT compiler. If you write your code trying to influence what it does at the level of individual native instructions, you are more likely to shoot yourself in the foot.

That does not mean that general principles, such as locality of reference, do not matter. They do, but I would consider the use of arrays and the like to be performance-aware, idiomatic code, not "tricky".

HotSpot and other runtime optimizers are extremely smart about how they optimize code for specific processors. (For an example, check out this discussion.) If I were an expert machine-level programmer, I would write machine code, not Java. And if I am not, it would be unwise to think I could optimize my code better than the experts.

Also, even if you know the best way to implement something for a particular CPU, the beauty of Java is write-once-run-anywhere. Clever tricks to "optimize" Java code tend to make optimization opportunities harder for the JIT to exploit. Straightforward code that sticks to common idioms is easier for an optimizer to recognize. So even if you get the best Java code for your test machine, that code may perform terribly on a different architecture, or at best fail to take advantage of improvements in future JITs.

If you need good performance, keep it simple. Teams of really smart people are working to make it fast.

+12
Sep 25 '09 at 18:02

If the data you are crunching is mostly or entirely made up of primitives (for example, in numerical problems), I would recommend the following.

Allocate a flat structure of fixed-size primitive arrays at initialization time, and make sure the data in it is periodically compacted/defragmented (into indices 0..n, where n is the smallest maximum index if you can count the elements) so that it can be iterated with a plain for loop. This is the only way to guarantee contiguous allocation in Java, and compaction further improves locality of reference. Compaction is beneficial because it reduces the need to iterate over unused elements, which reduces the number of conditionals: as the for loop iterates, termination occurs earlier, and less iteration means less movement through the heap and fewer chances to miss the cache. While compaction has an overhead of its own, it can be done only periodically (with respect to your main processing) if you choose.
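
A minimal sketch of such a compaction pass, assuming unused slots are marked with a sentinel value (here Float.NaN); the names are illustrative, not part of the original answer.

    // Shift live values toward index 0 so a later for loop can terminate early.
    static int compact(float[] data, int usedCount) {
        int write = 0;
        for (int read = 0; read < usedCount; read++) {
            if (!Float.isNaN(data[read])) { // skip dead/unused slots
                data[write++] = data[read];
            }
        }
        // 'write' is the new smallest maximum index.
        return write;
    }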

Even better, you can interleave the values in these pre-allocated arrays. For example, if you are representing spatial transforms for many thousands of entities in 2D space and processing the equations of motion for each of them, you might have a tight loop like

    int axIdx, ayIdx, vxIdx, vyIdx, xIdx, yIdx;
    // Acceleration, velocity, and displacement for each of x and y:
    // 6 elements per entity.
    for (axIdx = 0; axIdx < array.length; axIdx += 6) {
        ayIdx = axIdx + 1;
        vxIdx = axIdx + 2;
        vyIdx = axIdx + 3;
        xIdx  = axIdx + 4;
        yIdx  = axIdx + 5;

        // velocity1 = velocity0 + acceleration
        array[vxIdx] += array[axIdx];
        array[vyIdx] += array[ayIdx];

        // displacement1 = displacement0 + velocity
        array[xIdx] += array[vxIdx];
        array[yIdx] += array[vyIdx];
    }

This example glosses over issues such as rendering those entities using their associated (x, y) positions... rendering always requires non-primitives (i.e., references/pointers). If you need such object instances, you can no longer guarantee locality of reference and you will likely be jumping all over the heap. So if you can divide your code into sections where you do primitive-intensive processing as shown above, this approach will help you a lot. For games at least, AI, dynamic terrain, and physics can be some of the most processor-intensive aspects, and they are all numerical, so this approach can be very beneficial.

+9
Aug 21

If you cannot settle for an improvement of a few percent, use C, where you will get an improvement of 50-100%!

If you think that Java's ease of use makes it the better language to work in, then don't spoil it with dubious optimizations.

The good news is that Java does a lot of work under the covers to improve your code at runtime, but it will almost certainly not do the kind of optimization you are talking about.

If you decide to go with Java, just write your code as clearly as you can and don't bother with minor optimizations. (The major ones, such as using the right collections for the job and not allocating/freeing objects inside a loop, are still worth it.)
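
A small illustration of those "major" optimizations; the class and names are made up for the example, not taken from the answer.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class MajorNotMinor {
        // Right collection for the job: ArrayDeque is usually a better
        // queue/stack than LinkedList because it is backed by one array.
        private final Deque<String> pending = new ArrayDeque<>();

        String joinAll() {
            // Allocate the StringBuilder once, outside the loop,
            // instead of creating a new object on every iteration.
            StringBuilder sb = new StringBuilder();
            for (String s : pending) {
                sb.append(s).append('\n');
            }
            return sb.toString();
        }
    }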

+6
Sep 25 '09 at 17:29

As far as I know: none. You would pretty much have to write machine code to get that level of optimization. With assembly you are one step away, because you no longer control where things are stored. With a compiler you are two steps away, because you do not even control the details of the generated code. With Java you are three steps away, because the JVM interprets your code on the fly.

I don't know of any Java constructs that let you control things at that level of detail. In theory you could influence it indirectly by how you organize your program and data, but you are so far away that I don't see how you could do it reliably, or even know whether it was happening or not.

+3
Sep 25 '09 at 17:10

The advice so far is quite sound: in general, it is better not to try to outsmart the JIT. But as you say, some knowledge of the details is occasionally useful.

Regarding the memory layout of objects, Sun's JVM (now Oracle's) lays out an object's fields in memory by type (i.e., first doubles and longs, then ints and floats, then shorts and chars, after those bytes and booleans, and finally object references). You can get more details here.
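
If you want to see this layout on your own JVM, the OpenJDK JOL tool can print it; a minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath (the class and field names are illustrative):

    import org.openjdk.jol.info.ClassLayout;

    public class LayoutDemo {
        static class Mixed {
            byte    b;   // declared first, but typically placed after wider fields
            long    l;   // 8-byte fields tend to come first
            int     i;
            boolean f;
            Object  ref; // references usually come last
        }

        public static void main(String[] args) {
            // Prints field offsets, showing how fields are grouped by width.
            System.out.println(ClassLayout.parseClass(Mixed.class).toPrintable());
        }
    }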

Local variables are usually kept on the stack (both references and primitive values).

As Nick mentions, the best way to control memory layout in Java is to use primitive arrays. That way you can ensure the data is contiguous in memory. Be careful about the size of the arrays, though: the GC can have problems with very large arrays. The downside is that you have to manage the memory yourself.

On top of that, you can use the Flyweight pattern to get object-like usability while keeping the performance.
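
A minimal sketch of that idea, reusing the interleaved 6-floats-per-entity layout from the earlier answer; the names and layout are assumptions for illustration.

    // Flyweight-style view over a flat float array: one reusable accessor
    // object instead of thousands of small heap objects.
    class EntityView {
        private static final int STRIDE = 6; // ax, ay, vx, vy, x, y
        private final float[] data;
        private int base;                    // start index of the current entity

        EntityView(float[] data) { this.data = data; }

        EntityView at(int entityIndex) {     // repoint the same view, no allocation
            this.base = entityIndex * STRIDE;
            return this;
        }

        float x() { return data[base + 4]; }
        float y() { return data[base + 5]; }

        void move(float dx, float dy) {
            data[base + 4] += dx;
            data[base + 5] += dy;
        }
    }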

If you need a further performance boost, generating your own bytecode on the fly can help with some problems, as long as the generated code is executed often enough and the JVM's native code cache does not fill up (which would disable the JIT for all practical purposes).

+3
Jun 03 '13 at 22:22


