Using the Java 8 Streams API, can I rely on sorted() when collecting with Collectors.toSet()?

This is the implementation of the toSet() method in the java.util.stream.Collectors class:

    public static <T> Collector<T, ?, Set<T>> toSet() {
        return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
                                   (left, right) -> { left.addAll(right); return left; },
                                   CH_UNORDERED_ID);
    }

As we can see, it uses a HashSet and calls add. From the HashSet documentation: "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."

In the following code, a List of String is streamed, sorted, and collected into a Set:

    public static void main(String[] args) {
        Set<String> strings = Arrays.asList("c", "a", "b")
                .stream()
                .sorted()
                .collect(Collectors.toSet());
        System.out.println(strings.getClass());
        System.out.println(strings);
    }

This prints:

class java.util.HashSet

[a, b, c]

The output is sorted. I think what is happening here is that, although the contract in the HashSet documentation makes no promise about ordering, the implementation happens to add the elements in order. I believe this could change in future versions or vary between JVMs, and that a more sensible approach would be something like Collectors.toCollection(TreeSet::new).

Can sorted() be relied upon when calling Collectors.toSet()?

In addition, what does "does not guarantee that the order will remain constant over time" mean? (I suppose add, remove, or resizing of the underlying array?)

2 answers

The answer is no. Once you have added items to the set, you cannot rely on any order. From the JDK source code (HashSet.java):

    /**
     * Returns an iterator over the elements in this set.  The elements
     * are returned in no particular order.
     *
     * @return an Iterator over the elements in this set
     * @see ConcurrentModificationException
     */
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

Now, in previous versions of the JDK, even though the order was not guaranteed, you would usually get the elements in insertion order (unless the objects' class implements hashCode(), in which case you would get the order dictated by hashCode()). More precisely, it is either the order in which the objects were created or the order in which hashCode() was called on the objects. As @Holgar mentions in the comments below, in HotSpot it is the latter. And you cannot even count on that, because there are exceptions: the sequential number is not the only ingredient of the hashCode generator.

I recently watched a talk by Stuart Marks (the guy responsible for rewriting most of the collections in Java 9), and he said that in Java 9 they added randomization to the iteration order of sets (those created by the new collection factories). If you want to watch the session, the part where he talks about sets starts here. It is a good talk, highly recommended by the way!

So even if you have been counting on the iteration order of sets, you should stop doing so as soon as you upgrade to Java 9.

All that said, if you need an order, you should consider using a SortedSet, LinkedHashSet or TreeSet.
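As a minimal sketch of that advice, the snippet below collects the same three strings into a TreeSet (natural order) and into a LinkedHashSet (encounter order of the stream). The class and method names (OrderedSetCollectors, toSortedSet, toEncounterOrderedSet) are made up for illustration; only Collectors.toCollection, TreeSet and LinkedHashSet come from the JDK.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class OrderedSetCollectors {

    // Hypothetical helper: a TreeSet keeps its elements in natural order,
    // so no sorted() call is needed on the stream.
    public static Set<String> toSortedSet(List<String> input) {
        return input.stream()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Hypothetical helper: a LinkedHashSet iterates in insertion order,
    // so the order produced by sorted(...) survives the collect step.
    public static Set<String> toEncounterOrderedSet(List<String> input,
                                                    Comparator<String> order) {
        return input.stream()
                .sorted(order)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("c", "a", "b");
        System.out.println(toSortedSet(input));  // [a, b, c]
        System.out.println(toEncounterOrderedSet(input, Comparator.reverseOrder()));  // [c, b, a]
    }
}
```

TreeSet fits when the set should always be ordered by a comparator; LinkedHashSet fits when the order is decided once, upstream, and only has to be preserved.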


To answer this question, you need to know a little about how HashSet is implemented. As the name implies, a HashSet is implemented using a hash table. Basically, a hash table is an array that is indexed by the hashes of its elements. A hash function (in Java, an object's hash is computed by object.hashCode()) is basically a function that satisfies several criteria:

  • it is (relatively) quick to compute for a given element
  • two objects that are .equals() to each other have the same hash
  • there is a low probability that different elements have the same hash
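The last two criteria can be checked directly on String, whose hashCode() formula is fixed by its Javadoc. The following is only an illustrative sketch; the class and method names (HashContractDemo, sameHash) are invented here.

```java
class HashContractDemo {

    // Hypothetical helper: true when both strings report the same hash code.
    public static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct String instances that are equals() must share a hash.
        String first = new String("abc");
        String second = new String("abc");
        System.out.println(first.equals(second) + " " + sameHash(first, second));  // true true

        // Collisions are unlikely but possible: "Aa" and "BB" are different
        // strings, yet 65*31 + 97 == 66*31 + 66 == 2112.
        System.out.println("Aa".equals("BB") + " " + sameHash("Aa", "BB"));  // false true
    }
}
```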

So, when you encounter a HashSet that appears "sorted" (understood as "the iterator follows the natural order of the elements"), this is due to a couple of coincidences:

  • the natural order of the elements corresponds to the natural order of their hashCodes
  • the hash table is small enough to not have collisions (two elements with the same hash code)

If you look at the String class hashCode() method, you will see that for single-letter strings the hash code corresponds to the Unicode index (code point) of the character. Therefore, in this particular case, since the hash table is quite small, the elements will come out sorted. However, this is a huge coincidence and

  • will not hold for any other sort order
  • will not hold for classes whose hash codes do not follow their natural order
  • will not hold for hash tables with collisions

Besides, it has nothing to do with sorted() being called on the stream; it is simply due to the implementation of hashCode() and, consequently, the ordering of the hash table. So the simple answer to the question is no.
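To see the coincidence for yourself, a small experiment along these lines may help. It prints the hash codes of the single-letter strings, then collects longer words with sorted() and Collectors.toSet() and checks whether the resulting set happens to iterate in natural order. The class and helper names (HashOrderCoincidence, iteratesInNaturalOrder) are hypothetical, and the HashSet result is deliberately not annotated because it depends on the implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HashOrderCoincidence {

    // Hypothetical helper: true if iterating the set visits the elements
    // in their natural (sorted) order.
    public static boolean iteratesInNaturalOrder(Set<String> set) {
        List<String> seen = new ArrayList<>(set);
        List<String> sorted = new ArrayList<>(seen);
        Collections.sort(sorted);
        return seen.equals(sorted);
    }

    public static void main(String[] args) {
        // For a one-character string, hashCode() is the char value itself.
        System.out.println("a".hashCode());  // 97
        System.out.println("b".hashCode());  // 98
        System.out.println("c".hashCode());  // 99

        // Longer words: the hash codes no longer follow alphabetical order,
        // so whatever order comes out is an implementation detail.
        Set<String> words = Stream.of("cherry", "apple", "banana")
                .sorted()
                .collect(Collectors.toSet());
        System.out.println(words + " in natural order? " + iteratesInNaturalOrder(words));

        // A TreeSet is ordered by contract, not by accident.
        Set<String> tree = new TreeSet<>(words);
        System.out.println(tree + " in natural order? " + iteratesInNaturalOrder(tree));
    }
}
```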


Source: https://habr.com/ru/post/1272788/

