Java streams | grouping identical elements

I have a stream of words, and I would like to group them by occurrences of the same element (= word).

For example: {hello, world, hello}

to

Map<String, List<String>> 

hello, {hello, hello}

world, {world}

What I have so far:

 Map<Object, List<String>> list = streamofWords.collect(Collectors.groupingBy(???)); 

Problem 1: the stream seems to lose the information that it is processing Strings, so the compiler forces me to change the type to Map<Object, List<String>>.

Problem 2: I do not know what to put inside the parentheses in order to group by the same element. I know that I can handle individual elements inside a lambda expression, but I have no idea how to reach "outside" each element to check for equality.

thanks

2 answers

The key tool you are looking for is the identity function:

 Map<String, List<String>> list = streamofWords.collect(Collectors.groupingBy(Function.identity())); 

EDIT added explanation:

  • Function.identity() returns a Function whose single method does nothing more than return the argument it receives.
  • Collectors.groupingBy(Function<S, K> keyExtractor) provides a collector that collects all stream elements into a Map<K, List<S>>. It uses the keyExtractor implementation to inspect stream elements of type S and derive a key of type K from each of them. This key is the map key used to look up (or create) the list in the result map to which the stream element is added.
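As a minimal runnable sketch of the two points above (the class name GroupByIdentityDemo and the helper groupWords are made up for illustration), each word serves as its own key, so equal words land in the same list:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question or answers.
class GroupByIdentityDemo {

    // Groups equal words together: Function.identity() makes each word its own key.
    static Map<String, List<String>> groupWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(Function.identity()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupWords(Stream.of("hello", "world", "hello"));
        System.out.println(groups.get("hello")); // the list holding both "hello" entries
        System.out.println(groups.get("world")); // the list holding the single "world" entry
    }
}
```

Declaring the parameter as Stream<String> rather than a raw Stream is what lets the compiler infer String as the key type.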

To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, i.e. with the function x -> x.

 Map<String, List<String>> occurrences = streamOfWords.collect(groupingBy(str -> str)); 

However, this is of limited use, since the same information is stored twice: each key is repeated in its own list. You should consider a Map<String, Long> instead, where the value is the number of times the string appears in the stream.

 Map<String, Long> occurrences = streamOfWords.collect(groupingBy(str -> str, counting())); 

Basically, instead of letting groupingBy collect the values into a List, you pass the downstream collector counting() to say that you want to count the number of times each value appears.

As for your sorting requirement, it implies that you need a Map<Long, List<String>> (what if different strings appear the same number of times?). Since the default toMap collector returns a HashMap, which has no notion of ordering, you could store the elements in a TreeMap instead.
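One possible way to get from the counts to a map ordered by count is sketched below. It is an illustration rather than the answerer's own code: the class name WordsByCountDemo and the method wordsByCount are hypothetical, and it uses the three-argument groupingBy overload with a TreeMap factory in place of toMap.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original answer.
class WordsByCountDemo {

    static TreeMap<Long, List<String>> wordsByCount(Stream<String> words) {
        // Step 1: count how often each word occurs.
        Map<String, Long> counts =
                words.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Step 2: invert the map. The count becomes the key, and a TreeMap keeps
        // the keys sorted. Words sharing a count end up in the same list.
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> byCount = wordsByCount(Stream.of("hello", "world", "hello"));
        // Iterates in ascending order of count.
        byCount.forEach((count, ws) -> System.out.println(count + " -> " + ws));
    }
}
```

The order of words inside each list is not specified here, because it depends on the iteration order of the intermediate map.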


I tried to summarize what I said in the comments.

It seems you have trouble understanding how str -> str can tell whether "hello" and "world" are different.

First of all, str -> str is a function, that is, for an input x it gives the value f(x). For example, f(x) = x + 2 is a function that returns x + 2 for any value of x.

Here we use the identity function, i.e. f(x) = x. When you collect the elements of the pipeline into a Map, this function is called beforehand on each value to obtain its key. So, in your example, you have 3 elements for which the identity function gives:

 f("hello") = "hello"
 f("world") = "world"

So far so good.

Now, when collect() is called, the function is applied to each value in the stream, and the result is used as the key in the Map. If the key already exists, the value is added to the List already mapped to that key; otherwise a new List is created for it. This is why you get a Map<String, List<String>> at the end.

Take another example. Now the stream contains the values "hello", "world" and "hey", and the function that we want to use to group the elements is str -> str.substring(0, 2), that is, a function that takes the first two characters of the String.
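The process described above can be written out as a plain loop. This is only a conceptual sketch of the idea, not the actual JDK implementation of groupingBy; the class ManualGroupingDemo and the method groupManually are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo class: a hand-written equivalent of the grouping idea.
class ManualGroupingDemo {

    static Map<String, List<String>> groupManually(List<String> words,
                                                   Function<String, String> keyFunction) {
        Map<String, List<String>> result = new HashMap<>();
        for (String word : words) {
            // Apply the function to obtain the key for this value.
            String key = keyFunction.apply(word);
            // Reuse the list already mapped to the key, or create a new one,
            // then add the value the key was computed from.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groupManually(List.of("hello", "world", "hello"), str -> str));
    }
}
```

Equality is never checked "from outside" the elements: two values meet in the same list simply because the function produced equal keys for them.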

Similarly, we have:

 f("hello") = "he"
 f("world") = "wo"
 f("hey") = "he"

Here you see that "hello" and "hey" yield the same key when the function is applied, so they are collected into the same List, and the final result is:

 "he" -> ["hello", "hey"]
 "wo" -> ["world"]
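The same example as a runnable sketch (the class name PrefixGroupingDemo and the helper groupByPrefix are assumptions made for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class for the substring example.
class PrefixGroupingDemo {

    // Groups words by their first two characters.
    // Note: substring(0, 2) throws for words shorter than two characters.
    static Map<String, List<String>> groupByPrefix(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str.substring(0, 2)));
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(groups.get("he")); // "hello" and "hey" share this key
        System.out.println(groups.get("wo")); // only "world" has this key
    }
}
```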

To draw an analogy with math, you could take any non-bijective function, such as f(x) = x^2. For x = -2 and x = 2 we have f(x) = 4. So if we grouped integers by this function, -2 and 2 would end up in the same "bag".

Looking at the source code will not help you understand what is happening at first. It is useful if you want to know how things are implemented under the hood. But first try to think about the concept at a higher level of abstraction, and then perhaps everything will become clearer.
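The math analogy translates directly into code. In this small sketch the class SquareGroupingDemo and the method groupBySquare are hypothetical names chosen for the example:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class for the f(x) = x^2 analogy.
class SquareGroupingDemo {

    // Groups integers by their square, so x and -x share a key.
    static Map<Integer, List<Integer>> groupBySquare(Stream<Integer> numbers) {
        return numbers.collect(Collectors.groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> bags = groupBySquare(Stream.of(-2, 2, 3));
        System.out.println(bags.get(4)); // the "bag" containing -2 and 2
        System.out.println(bags.get(9)); // the "bag" containing only 3
    }
}
```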

Hope this helps! :)


Source: https://habr.com/ru/post/986233/

