Java 8 HashMap high memory usage

I use a HashMap to store a Q-table for a reinforcement learning algorithm. My map should store 15,000,000 records. When I launched my algorithm, I saw that the memory used by the process exceeds 1,000,000K. By my calculation, I expected it to use no more than 530,000K. I wrote a small example and got the same high memory usage:

    public static void main(String[] args) {
        HashMap<Integer, Integer> map = new HashMap<>(16_000_000, 1);
        for (int i = 0; i < 15_000_000; i++) {
            map.put(i, i);
        }
    }

My memory calculation:

Each record is 32 bytes
Capacity is 15,000,000
The HashMap instance uses: 32 * SIZE + 4 * CAPACITY bytes = (15,000,000 * 32 + 15,000,000 * 4) / 1024 = 527,343.75K

Where am I mistaken in my memory calculations?

2 answers

Well, at best we assume a word size of 32 bits / 4 bytes (with CompressedOops and CompressedClassPointers). Then a map entry consists of two words of JVM overhead (klass pointer and mark word), plus key, value, hash code, and the next pointer: six words in total, in other words 24 bytes. So having 15,000,000 entries will consume 360 MB.

In addition, there is the array that holds the entries. HashMap uses capacities that are powers of two, so for 15,000,000 entries the array size is at least 16,777,216, consuming 64 MB.
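The power-of-two rounding is easy to check; below is a minimal sketch of the same computation HashMap performs internally when sizing its table (`nextPowerOfTwo` is my own helper, not a JDK method):

```java
public class CapacityCheck {
    // Round up to the next power of two, like HashMap does for its table size
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        int capacity = nextPowerOfTwo(15_000_000);
        System.out.println(capacity);                       // 16777216
        System.out.println(capacity * 4L / (1024 * 1024));  // 64 (MB of references)
    }
}
```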

Then you have 30,000,000 instances of Integer. The problem is that map.put(i, i) performs two boxing operations, and while the JVM is encouraged to reuse objects when boxing, it is not required to, and reuse won't happen in your simple program, which may well exit before the optimizer ever kicks in.

To be precise, the first 128 Integer instances are reused, because sharing is mandatory for values in the range -128 … +127, but the implementation does this by initializing the entire cache on first use, so the first 128 iterations don't create two instances each; the cache itself, however, consists of 256 instances, which is twice that number, so we still end up with about 30,000,000 Integer instances. An Integer instance consists of at least two JVM-specific words plus the actual int value, which would be 12 bytes, but due to the default 8-byte alignment the actual memory consumed is 16 bytes.
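The cache behaviour can be observed directly through Integer.valueOf, which is what autoboxing calls; a small sketch (assuming default JVM settings, where only -128 … 127 is cached):

```java
public class BoxingCacheDemo {
    public static void main(String[] args) {
        // Values in -128..127 come from the mandatory cache: same instance
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);      // true

        // Values outside that range are freshly allocated each time
        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c == d);      // false
    }
}
```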

Thus, the 30,000,000 created Integer instances consume another 480 MB.

That's a total of 360 MB + 64 MB + 480 MB, more than 900 MB, which makes a heap size of 1 GB entirely plausible.
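The three contributions above can be summed in a quick sketch (pure arithmetic, nothing measured):

```java
public class RoughEstimate {
    // Total bytes from the three contributions discussed above
    static long estimateBytes() {
        long entries  = 15_000_000L * 24; // 24-byte map entries
        long table    = 16_777_216L * 4;  // backing array of 4-byte references
        long integers = 30_000_000L * 16; // 16-byte boxed keys and values
        return entries + table + integers;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes()); // 907108864, i.e. "more than 900 MB"
    }
}
```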

But this is what profiling tools are for. After running your program, I got:

[Screenshot: used memory, sorted by size]

Note that this tool only reports the used size of the objects, i.e. 12 bytes for an Integer object, without the padding you notice when looking at the total memory allocated by the JVM.


I had the same requirement as you, so I decided to drop my thoughts here.

1) There is a great tool for this: jol (Java Object Layout).

2) Arrays are also objects, and every object in Java has two additional headers: mark word and klass pointer, usually 8 and 4 bytes respectively (the klass pointer size depends on compressed pointers, but let's not go into details).

3) The map's load factor matters here (since it affects when the internal array is resized). Here is an example:

    // requires org.openjdk.jol.info.GraphLayout
    HashMap<Integer, Integer> map = new HashMap<>(16, 1);
    for (int i = 0; i < 13; ++i) {
        map.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map).toFootprint());

    HashMap<Integer, Integer> map2 = new HashMap<>(16);
    for (int i = 0; i < 13; ++i) {
        map2.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map2).toFootprint());

The output differs (only the relevant lines shown):

         1        80        80   [Ljava.util.HashMap$Node;   // first case
         1       144       144   [Ljava.util.HashMap$Node;   // second case

Note how the size is larger in the second case, because the backing array is twice as big (32 entries). You can only put 12 records into an array of size 16, since the default load factor is 0.75: 16 * 0.75 = 12.

Why 144? The math here is simple: an array is an object, so: 8 + 4 bytes of headers, plus 32 * 4 bytes for references = 140 bytes. Due to the 8-byte memory alignment, 4 bytes of padding are added, resulting in 144 bytes.
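The same arithmetic as a sketch, with the 8-byte alignment made explicit (`alignTo8` is my own helper):

```java
public class ArrayFootprintMath {
    // Pad a size up to the JVM's default 8-byte object alignment
    static long alignTo8(long size) {
        return (size + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        long raw = 8 + 4 + 32 * 4;         // headers + 32 references = 140 bytes
        System.out.println(alignTo8(raw)); // 144, as jol reports
    }
}
```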

4) Entries are stored inside the map as either Node or TreeNode instances (a Node is 32 bytes, a TreeNode 56 bytes). Since you use ONLY Integers, you will have only Nodes, because there are no hash collisions. Even if there were collisions, that alone would not convert a bucket to a TreeNode; there is a threshold for that. We can easily prove that there will be only Nodes:

    public static void main(String[] args) {
        Map<Integer, List<Integer>> map = IntStream.range(0, 15_000_000)
                .boxed()
                .collect(Collectors.groupingBy(WillThereBeTreeNodes::hash)); // WillThereBeTreeNodes = current class name
        System.out.println(map.size());
    }

    private static int hash(Integer key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

The result is 15_000_000: no keys were grouped together, hence no hash collisions.

5) When you create Integer objects, there is a pool for them, covering -128 to 127 (the upper bound is configurable, but let's keep it simple).

6) An Integer is an object, so it has a 12-byte header plus 4 bytes for the actual int value.


With this in mind, let's look at the output for 15_000_000 records (since you use a load factor of one, there is no need to request an initial capacity of 16_000_000). It will take a long time, so be patient. I also gave it:

-Xmx12G -Xms12G

    HashMap<Integer, Integer> map = new HashMap<>(15_000_000, 1);
    for (int i = 0; i < 15_000_000; ++i) {
        map.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map).toFootprint());

Here is what jol said:

    java.util.HashMap@9629756d footprint:
         COUNT       AVG        SUM   DESCRIPTION
             1  67108880   67108880   [Ljava.util.HashMap$Node;
      29999872        16  479997952   java.lang.Integer
             1        48         48   java.util.HashMap
      15000000        32  480000000   java.util.HashMap$Node
      44999874            1027106880  (total)

Start from the bottom.

The total size of the hashmap's object graph is 1,027,106,880 bytes, roughly 1,027 MB.

Node is the wrapper class each entry resides in. It is 32 bytes; there are 15 million entries, hence the line:

      15000000        32  480000000   java.util.HashMap$Node

Why 32 bytes? A Node stores the hash code (4 bytes), a key reference (4 bytes), a value reference (4 bytes), and a reference to the next Node (4 bytes), plus a 12-byte object header and 4 bytes of padding, resulting in 32 bytes.
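The same field-by-field sum, sketched (`alignTo8` is my helper for the default 8-byte alignment):

```java
public class NodeSizeMath {
    // Pad a size up to the JVM's default 8-byte object alignment
    static long alignTo8(long size) {
        return (size + 7) / 8 * 8;
    }

    // header(12) + hash(4) + key ref(4) + value ref(4) + next ref(4) = 28, padded to 32
    static long nodeSize() {
        return alignTo8(12 + 4 + 4 + 4 + 4);
    }

    public static void main(String[] args) {
        System.out.println(nodeSize()); // 32
    }
}
```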

             1        48         48   java.util.HashMap

The single HashMap instance itself takes 48 bytes for its internal fields.

If you really want to know why 48 bytes:

    System.out.println(ClassLayout.parseClass(HashMap.class).toPrintable());

    java.util.HashMap object internals:
     OFFSET  SIZE        TYPE DESCRIPTION                              VALUE
          0    12             (object header)                          N/A
         12     4         Set AbstractMap.keySet                       N/A
         16     4  Collection AbstractMap.values                       N/A
         20     4         int HashMap.size                             N/A
         24     4         int HashMap.modCount                         N/A
         28     4         int HashMap.threshold                        N/A
         32     4       float HashMap.loadFactor                       N/A
         36     4      Node[] HashMap.table                            N/A
         40     4         Set HashMap.entrySet                         N/A
         44     4             (loss due to the next object alignment)
    Instance size: 48 bytes
    Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Next, the Integer instances:

      29999872        16  479997952   java.lang.Integer

30 million Integer objects (minus the 128 that come from the cache).

             1  67108880   67108880   [Ljava.util.HashMap$Node;

We have 15_000_000 entries, but the internal HashMap array has a power-of-two capacity: 16,777,216 references of 4 bytes each.

    16_777_216 * 4 = 67_108_864 bytes + 12-byte header + 4 bytes padding = 67_108_880
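Putting the four jol lines back together reproduces the reported total exactly; a sketch of the arithmetic above, nothing measured:

```java
public class TotalFootprint {
    static long totalBytes() {
        long nodeArray = 16_777_216L * 4 + 12 + 4; // references + header + padding = 67,108,880
        long integers  = 29_999_872L * 16;         // 30M boxed values minus the 128 cached
        long mapItself = 48;                       // the single HashMap instance
        long nodes     = 15_000_000L * 32;         // one 32-byte Node per entry
        return nodeArray + integers + mapItself + nodes;
    }

    public static void main(String[] args) {
        System.out.println(totalBytes()); // 1027106880, matching jol's (total) line
    }
}
```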

Source: https://habr.com/ru/post/1270042/
