The most efficient way to count occurrences?

I have an array of primitive bytes that can hold arbitrary values. I am trying to count how often each value occurs in the array in the most efficient / fastest way. I am currently using:

    HashMap<Byte, Integer> dataCount = new HashMap<>();
    for (byte b : data) dataCount.put(b, dataCount.getOrDefault(b, 0) + 1);

This one-liner takes ~500 ms to process a byte[] of length 24883200. Using a regular indexed for loop takes at least 600 ms.

I was thinking of creating a set (since it contains only one of each element) and then filling the HashMap using Collections.frequency(), but constructing a set from primitives requires several other calls, so I assume it would not be as fast.

What would be the fastest way to count the occurrences of each element?

I am using Java 8 and I would prefer to avoid using Apache Commons if possible.

2 answers

If it's just bytes, use an array, not a map. You need masking to deal with the fact that byte is signed, but that is a minor detail.

    int[] counts = new int[256];
    for (byte b : data) {
        counts[b & 0xFF]++;
    }

Arrays are so compact and efficient that it is almost impossible to beat them when you can use them.
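
As a minimal, self-contained sketch of the array approach above: the class name ByteHistogram, the method name count, and the random test data are assumptions made for illustration, not part of the original answer. Only the array length is taken from the question.

```java
import java.util.Random;

class ByteHistogram {
    // Counts how often each of the 256 possible byte values occurs in data.
    static int[] count(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            // Masking turns the signed byte into an index in 0..255;
            // negative bytes land in 128..255.
            counts[b & 0xFF]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] data = new byte[24_883_200]; // same length as in the question
        new Random(42).nextBytes(data);     // arbitrary test data (assumption)

        int[] counts = count(data);
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                // Cast the index back to byte to print the original signed value.
                System.out.println((byte) i + " -> " + counts[i]);
            }
        }
    }
}
```

If a Map<Byte, Integer> is still needed afterwards, it can be built from the 256 counters once at the end, which keeps boxing out of the hot loop.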

I would create an array instead of a HashMap , given that you know exactly how many counts you need to track:

    int[] counts = new int[256];
    for (byte b : data) {
        counts[b & 0xff]++;
    }

This way:

  • You never need to box the keys or the values.
  • Nothing needs to compute a hash code, check for equality, etc.
  • It is about as memory-efficient as it gets.

Note that & 0xff is used to get a value in the range [0, 255] instead of [-128, 127], so that it works as an index into the array.
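
To illustrate the masking, here is a small sketch. The class name ByteIndexDemo and the helper toIndex are hypothetical names introduced for this example. Without the mask, a negative byte is widened to a negative int, and using it as an index would throw ArrayIndexOutOfBoundsException.

```java
class ByteIndexDemo {
    // Converts a signed byte (-128..127) to an array index (0..255).
    static int toIndex(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        System.out.println(toIndex((byte) 0));    // prints 0
        System.out.println(toIndex((byte) 127));  // prints 127
        System.out.println(toIndex((byte) -128)); // prints 128
        System.out.println(toIndex((byte) -1));   // prints 255

        // Java 8 also provides Byte.toUnsignedInt, which gives the same result.
        System.out.println(Byte.toUnsignedInt((byte) -1)); // prints 255
    }
}
```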


Source: https://habr.com/ru/post/986684/

