I had the same requirement as you, so I decided to write up my thoughts here.
1) There is a great tool for this: jol (Java Object Layout).
2) Arrays are objects too, and every object in Java has two header words: mark and klass, usually 8 and 4 bytes (this changes with the compressed pointers settings, but I won't go into details). An array header additionally stores the array length in 4 bytes.
3) It is important to note the load factor of the map here (since it affects the resizing of the internal array). Here is an example:
    // load factor 1: the table is resized only once more than 16 entries are stored
    HashMap<Integer, Integer> map = new HashMap<>(16, 1);
    for (int i = 0; i < 13; ++i) {
        map.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map).toFootprint());

    // default load factor 0.75: the 13th entry triggers a resize
    HashMap<Integer, Integer> map2 = new HashMap<>(16);
    for (int i = 0; i < 13; ++i) {
        map2.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map2).toFootprint());
The two outputs differ (only the relevant lines are shown):
    1    80    80   [Ljava.util.HashMap$Node;   // first case
    1   144   144   [Ljava.util.HashMap$Node;   // second case
See how the size is larger in the second case, because the backing array is twice as large (32 slots). With a capacity of 16 you can only put 12 entries into the map before it resizes, since the default load factor is 0.75: 16 * 0.75 = 12.
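Why 144? The math here is simple: an array is an object, so it has 8 + 4 bytes for the mark and klass headers, plus 4 bytes for the array length, which gives a 16-byte header. Add 32 * 4 = 128 bytes for the references and you get 144 bytes. That is already a multiple of 8, so no alignment padding is needed. The first case works the same way: 16 + 16 * 4 = 80 bytes.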
4) Entries are stored inside either a Node or a TreeNode within the map (a Node is 32 bytes, a TreeNode is 56 bytes). Since you are using ONLY Integers, you will only have Nodes, because there should be no hash collisions. Even if there were collisions, that would not mean a specific bucket gets converted to TreeNodes; there is a threshold for that. We can easily prove that there will be only Nodes:
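    // WillThereBeTreeNodes::hash is a static helper in the enclosing class (not shown here),
    // assumed to spread the hash code the way HashMap does: h ^ (h >>> 16)
    public static void main(String[] args) {
        Map<Integer, List<Integer>> map = IntStream.range(0, 15_000_000).boxed()
                .collect(Collectors.groupingBy(WillThereBeTreeNodes::hash));
        System.out.println(map.size());
    }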
The result of this will be 15_000_000: no two keys were merged into the same group, so there are no hash collisions.
5) When Integer objects are created by boxing, there is a cache for them (from -128 to 127; the upper bound can be configured, but I'll ignore that for simplicity).
6) An Integer is an object, so it has a 12-byte header plus 4 bytes for the actual int value: 16 bytes in total.
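The cache from point 5 can be observed with a reference comparison. This is only a small sketch; the class and method names are made up for illustration. `Integer.valueOf` is documented to always cache -128 to 127, so those two lines print true. The result for 128 depends on how the cache is configured, so no output is claimed for it.

```java
class IntegerCacheDemo {
    // True when two independent boxings of the same value yield the very same object.
    static boolean sharesInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        System.out.println(sharesInstance(127));  // true: inside the guaranteed cache range
        System.out.println(sharesInstance(-128)); // true: inside the guaranteed cache range
        System.out.println(sharesInstance(128));  // outside the guaranteed range; depends on JVM settings
    }
}
```

This is why a map with identical small keys and values ends up with fewer Integer objects than two per entry.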
With this in mind, let's look at the output for 15_000_000 entries (since you use a load factor of one, there is no need to request an internal capacity of 16_000_000). It will take a long time, so be patient. I also ran it with
-Xmx12G and -Xms12G
    HashMap<Integer, Integer> map = new HashMap<>(15_000_000, 1);
    for (int i = 0; i < 15_000_000; ++i) {
        map.put(i, i);
    }
    System.out.println(GraphLayout.parseInstance(map).toFootprint());
Here is what jol said:
    java.util.HashMap@9629756d footprint:
         COUNT       AVG         SUM   DESCRIPTION
             1  67108880    67108880   [Ljava.util.HashMap$Node;
      29999872        16   479997952   java.lang.Integer
             1        48          48   java.util.HashMap
      15000000        32   480000000   java.util.HashMap$Node
      44999874            1027106880   (total)
Let's start from the bottom.
The total footprint of the HashMap is 1027106880 bytes, or about 1,027 MB.
A Node instance is the wrapper in which each entry resides. It has a size of 32 bytes, and there are 15 million entries, hence the line:
15000000 32 480000000 java.util.HashMap$Node
Why 32 bytes? A Node stores the hash code (4 bytes), the key reference (4 bytes), the value reference (4 bytes) and the reference to the next Node (4 bytes). Add the 12-byte header and you get 28 bytes, and 4 bytes of padding bring that to 32 bytes.
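The same arithmetic can be written down as a tiny helper. This is a sketch under the assumptions used throughout this answer (12-byte object header, 4-byte references, 8-byte alignment). The class and method names are made up, and real field layout can add internal gaps that this ignores.

```java
class ObjectSizeSketch {
    static final int HEADER_BYTES = 12; // assumed: 8-byte mark + 4-byte compressed klass
    static final int ALIGNMENT = 8;     // assumed object alignment

    // Rounds a size up to the next multiple of the alignment.
    static long alignUp(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Header plus the summed field sizes, padded to the alignment.
    static long instanceBytes(int fieldBytes) {
        return alignUp(HEADER_BYTES + fieldBytes);
    }

    public static void main(String[] args) {
        System.out.println(instanceBytes(4 * 4)); // Node: hash, key, value, next -> 32
        System.out.println(instanceBytes(4));     // Integer: one int field -> 16
    }
}
```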
1 48 48 java.util.HashMap
The single HashMap instance itself takes 48 bytes for its own fields.
If you really want to know why 48 bytes:
    System.out.println(ClassLayout.parseClass(HashMap.class).toPrintable());

    java.util.HashMap object internals:
     OFFSET  SIZE         TYPE DESCRIPTION                               VALUE
          0    12              (object header)                           N/A
         12     4          Set AbstractMap.keySet                        N/A
         16     4   Collection AbstractMap.values                        N/A
         20     4          int HashMap.size                              N/A
         24     4          int HashMap.modCount                          N/A
         28     4          int HashMap.threshold                         N/A
         32     4        float HashMap.loadFactor                        N/A
         36     4       Node[] HashMap.table                             N/A
         40     4          Set HashMap.entrySet                          N/A
         44     4              (loss due to the next object alignment)
    Instance size: 48 bytes
    Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Next, the Integer instances:
29999872 16 479997952 java.lang.Integer
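That is 30 million Integer objects (15 million keys plus 15 million values), minus 128: for 0 to 127 the key and the value are the same cached instance from the pool.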
1 67108880 67108880 [Ljava.util.HashMap$Node;
We have 15_000_000 entries, but the internal HashMap array always has a power-of-two capacity, which here is 16_777_216 references of 4 bytes each.
16_777_216 * 4 = 67_108_864, plus the 16-byte array header (12 bytes of mark and klass + 4 bytes for the length) = 67_108_880.
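Putting the pieces together, the numbers jol reported can be reproduced with plain arithmetic. The sketch below is only an estimate under the assumptions of this answer: 4-byte compressed references, a 16-byte array header, 32-byte Nodes, 16-byte Integers, a 48-byte HashMap, and keys equal to values in the range 0..n-1 with n of at least 128, so 128 cached Integers are shared. The class and its methods are made up for illustration and only approximate HashMap's resize rule for moderate sizes.

```java
class HashMapFootprintSketch {
    static final int ARRAY_HEADER_BYTES = 16;      // assumed: 12-byte object header + 4-byte length
    static final int REFERENCE_BYTES = 4;          // assumed: compressed references
    static final int NODE_BYTES = 32;
    static final int INTEGER_BYTES = 16;
    static final int HASHMAP_BYTES = 48;
    static final int SHARED_CACHED_INTEGERS = 128; // 0..127 serve as both key and value

    static long alignUp(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Smallest power of two that is >= requested (HashMap tables are always a power of two).
    static int tableSizeFor(int requested) {
        int n = 1;
        while (n < requested) n <<= 1;
        return n;
    }

    // Approximate table capacity after inserting 'entries' distinct keys:
    // the table doubles whenever the size exceeds capacity * loadFactor.
    static int capacityAfter(int initialCapacity, float loadFactor, int entries) {
        int capacity = tableSizeFor(initialCapacity);
        while (entries > (int) (capacity * loadFactor)) capacity <<= 1;
        return capacity;
    }

    static long arrayBytes(int capacity) {
        return alignUp(ARRAY_HEADER_BYTES + (long) REFERENCE_BYTES * capacity);
    }

    static long estimate(int initialCapacity, float loadFactor, int entries) {
        int capacity = capacityAfter(initialCapacity, loadFactor, entries);
        long integers = 2L * entries - SHARED_CACHED_INTEGERS;
        return arrayBytes(capacity)
                + (long) NODE_BYTES * entries
                + INTEGER_BYTES * integers
                + HASHMAP_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(arrayBytes(capacityAfter(16, 1f, 13)));    // 80
        System.out.println(arrayBytes(capacityAfter(16, 0.75f, 13))); // 144
        System.out.println(estimate(15_000_000, 1f, 15_000_000));     // 1027106880
    }
}
```

The last line matches the total jol printed; for other key or value types the per-object constants would have to be measured again.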