Should I dump java.util.HashSet in favor of CompactHashSet?

I found that there is a Set implementation that uses hashing (with all the useful consequences, such as O(1) contains(), etc.) and that is claimed to be more efficient than java.util.HashSet in every respect:

http://ontopia.wordpress.com/2009/09/23/a-faster-and-more-compact-set/

http://alias-i.com/lingpipe/docs/api/com/aliasi/util/CompactHashSet.html

Would it be a good idea to stop using java.util.HashSet entirely, wherever I need a java.util.Set, in favor of com.aliasi.util.CompactHashSet?
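To make the question concrete: since com.aliasi.util.CompactHashSet implements java.util.Set, the swap itself looks trivial wherever code is written against the interface. A minimal sketch of what I mean (the int-capacity constructor is what the linked Javadoc suggests; class name is mine):

    import com.aliasi.util.CompactHashSet;

    import java.util.HashSet;
    import java.util.Set;

    public class SwapSketch {
        public static void main(String[] args) {
            // Only the construction site changes; all call sites stay the same,
            // since both classes implement java.util.Set.
            Set<String> tags = new HashSet<>();
            // Set<String> tags = new CompactHashSet<String>(16); // candidate replacement
            tags.add("compact");
            System.out.println(tags.contains("compact")); // true with either implementation
        }
    }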

+6

3 answers

Let's play a little benchmark game.

The benchmarks are based on those from the original article, but use modern tools: JMH for the timing and JOL for measuring the memory footprint.

    package tests;

    import com.carrotsearch.hppc.ObjectOpenHashSet;
    import com.carrotsearch.hppc.cursors.ObjectCursor;
    import com.google.common.collect.GuavaCompactHashSet;
    import net.ontopia.utils.CompactHashSet;
    import net.openhft.koloboke.collect.set.hash.HashObjSet;
    import net.openhft.koloboke.collect.set.hash.HashObjSets;
    import org.openjdk.jmh.annotations.*;

    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import java.util.function.Consumer;

    import static java.util.Arrays.stream;
    import static org.openjdk.jol.info.GraphLayout.parseInstance;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @OperationsPerInvocation(TestHashSet.TIMES)
    @Threads(1)
    @Fork(1)
    @State(Scope.Thread)
    public class TestHashSet {
        public static final int TIMES = 1000000;
        private static final int MAX = 5000000;
        private static long ELEMENTS_SIZE;

        static Long[] add = new Long[TIMES], lookup = new Long[TIMES], remove = new Long[TIMES];

        static {
            for (int ix = 0; ix < TIMES; ix++)
                add[ix] = new Long(Math.round(Math.random() * MAX));
            // Size of the distinct Long elements themselves, subtracted later so that
            // only the set's own per-element overhead is reported.
            ELEMENTS_SIZE = stream(add).distinct().count() * parseInstance(add[0]).totalSize();
            for (int ix = 0; ix < TIMES; ix++)
                lookup[ix] = new Long(Math.round(Math.random() * MAX));
            for (int ix = 0; ix < TIMES; ix++)
                remove[ix] = new Long(Math.round(Math.random() * MAX));
        }

        @Benchmark
        public int hashSet() {
            Set<Long> set = new HashSet<Long>();
            for (Long o : add) { set.add(o); }
            int r = 0;
            for (Long o : lookup) { r ^= set.contains(o) ? 1 : 0; }
            for (Long o : set) { r += o.intValue(); }
            for (Long o : remove) { set.remove(o); }
            return r + set.size();
        }

        @Benchmark
        public int compactHashSet() {
            Set<Long> set = new CompactHashSet<Long>();
            for (Long o : add) { set.add(o); }
            int r = 0;
            for (Long o : lookup) { r ^= set.contains(o) ? 1 : 0; }
            for (Long o : set) { r += o.intValue(); }
            for (Long o : remove) { set.remove(o); }
            return r + set.size();
        }

        @Benchmark
        public int hppcSet() {
            ObjectOpenHashSet<Long> set = new ObjectOpenHashSet<Long>();
            for (Long o : add) { set.add(o); }
            int r = 0;
            for (Long o : lookup) { r ^= set.contains(o) ? 1 : 0; }
            for (ObjectCursor<Long> cur : set) { r += cur.value.intValue(); }
            for (Long o : remove) { set.remove(o); }
            return r + set.size();
        }

        @Benchmark
        public int kolobokeSet() {
            Set<Long> set = HashObjSets.newMutableSet();
            for (Long o : add) { set.add(o); }
            int r = 0;
            for (Long o : lookup) { r ^= set.contains(o) ? 1 : 0; }
            for (Long o : set) { r += o.intValue(); }
            for (Long o : remove) { set.remove(o); }
            return r + set.size();
        }

        @Benchmark
        public int guavaCompactHashSet() {
            // fair comparison -- growing table
            Set<Long> set = new GuavaCompactHashSet<>(10);
            for (Long o : add) { set.add(o); }
            int r = 0;
            for (Long o : lookup) { r ^= set.contains(o) ? 1 : 0; }
            for (Long o : set) { r += o.intValue(); }
            for (Long o : remove) { set.remove(o); }
            return r + set.size();
        }

        // Memory-footprint measurement (run as a plain main, not via JMH).
        public static void main(String[] argv) {
            HashSet hashSet = new HashSet();
            test("HashSet", hashSet, hashSet::add);
            CompactHashSet compactHashSet = new CompactHashSet();
            test("CompactHashSet", compactHashSet, compactHashSet::add);
            HashObjSet<Object> kolobokeSet = HashObjSets.newMutableSet();
            test("KolobokeSet", kolobokeSet, kolobokeSet::add);
            ObjectOpenHashSet hppcSet = new ObjectOpenHashSet();
            test("HPPC set", hppcSet, hppcSet::add);
            GuavaCompactHashSet guavaCompactHashSet = new GuavaCompactHashSet(10);
            test("GuavaCompactHashSet", guavaCompactHashSet, guavaCompactHashSet::add);
        }

        public static void test(String name, Object set, Consumer setAdd) {
            for (Long o : add) { setAdd.accept(o); }
            System.out.printf("%s: %.1f bytes per element\n", name,
                    (parseInstance(set).totalSize() - ELEMENTS_SIZE) * 1.0 / TIMES);
        }
    }
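To reproduce the timing numbers, the class above can be launched through JMH's standard Runner API; a minimal harness (a separate class, since TestHashSet's own main is used for the memory measurement):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class RunTestHashSet {
        public static void main(String[] args) throws RunnerException {
            // Picks up every @Benchmark method in TestHashSet; fork and thread
            // counts come from the annotations on that class.
            Options opt = new OptionsBuilder()
                    .include(TestHashSet.class.getSimpleName())
                    .build();
            new Runner(opt).run();
        }
    }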

Results:

    Set implementation     Speed       Memory footprint (bytes/elem)
                           (ns/op)     +UseCompressedOops   -UseCompressedOops
    CompactHashSet         828          8.4                 16.8
    HashSet                676         37.4                 60.3
    HPPC Set               853         10.5                 18.9
    Koloboke Set           587          8.4                 16.8
    GuavaCompactHashSet    874         25.9                 37.4

It appears that CompactHashSet is actually slower than the good old HashSet, despite using much less memory.

+7

It depends.

Are you dealing with very large sets and many insert or lookup operations? The new implementation cut the time in half for a million operations. That is a great improvement, but if you perform only a few thousand operations, or a few dozen, it quickly turns into a micro-optimization.

The tests shown also only insert Long values into the set. Performance, for both runtime and memory, can change if you store a different element type.
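If that matters for your case, the same JMH setup is easy to re-run with your own element type; a hypothetical sketch with String keys (class and field names are illustrative, not from the benchmark above):

    import org.openjdk.jmh.annotations.*;

    import java.util.HashSet;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(1)
    @State(Scope.Thread)
    public class StringSetBench {
        @Param({"1000", "1000000"})   // a small and a large set
        int size;

        String[] elements;

        @Setup
        public void setup() {
            elements = new String[size];
            for (int i = 0; i < size; i++) {
                // String hashCode/equals are costlier than Long's,
                // which can reorder the contenders.
                elements[i] = "key-" + i;
            }
        }

        @Benchmark
        public int addAndLookup() {
            Set<String> set = new HashSet<>();
            for (String s : elements) set.add(s);
            int r = 0;
            for (String s : elements) r ^= set.contains(s) ? 1 : 0;
            return r + set.size();
        }
    }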

If you have a use case where the change would help in a statistically significant way, go for it.

+3

Option 1: Don't care. If you look at the Java HashSet implementation, you will find that it simply uses a HashMap internally:

    public class HashSet<E>
        extends AbstractSet<E>
        implements Set<E>, Cloneable, java.io.Serializable
    {
        static final long serialVersionUID = -5024744406713321676L;

        private transient HashMap<E,Object> map;
        ....
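A few lines further down in the same JDK source, every added element is mapped to a single shared dummy object:

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

Although PRESENT itself is shared, every HashMap entry still reserves a reference field to hold it, and that field is pure overhead for a set.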

This is a fast implementation; however, each set entry holds a reference to a value that is never needed, hence the extra memory consumption. My first option is "don't care", since I hope that someday someone will provide an improved HashSet in the JDK. Software engineers should always have hope and a positive attitude :)

In ordinary business logic I always stick to the established standards and use what is available. That avoids every programmer using his own "chosen set implementation" or, even worse, conducting lengthy research into which HashSet implementation is actually the best one to use ;)

Does Oracle have an open bug for this wasteful HashSet, by the way? I cannot find one ...

Option 2: Care. If your set is not used in business logic but in some technical middleware code, performance can make a difference. Then there are various options. CompactHashMap in Google Guava is one. Another good library is High Performance Primitive Collections (HPPC). In HPPC you will also find sets for every primitive type, as sketched below. I think you will also find other things that fit your specific purpose. Be aware that not every HashMap replacement has exactly the same semantics as the original.
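For example, with long elements like in the benchmark above, a primitive HPPC set sidesteps boxing entirely; a minimal sketch (LongOpenHashSet is the HPPC 0.6.x class name, matching the ObjectOpenHashSet used earlier):

    import com.carrotsearch.hppc.LongOpenHashSet;

    public class PrimitiveSetSketch {
        public static void main(String[] args) {
            // Stores raw long values: no Long boxing, no per-element object header.
            LongOpenHashSet ids = new LongOpenHashSet();
            for (long i = 0; i < 1_000_000; i++) {
                ids.add(i);
            }
            System.out.println(ids.contains(42));  // true
            System.out.println(ids.size());        // 1000000
        }
    }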

So, personally, I would never replace java.util.HashMap just "by default".

+3

Source: https://habr.com/ru/post/976733/

