To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, that is, with the function x -> x.
Map<String, List<String>> occurrences = streamOfWords.collect(groupingBy(str -> str));
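As a minimal runnable sketch of the line above (the class name, the method name and the sample words are assumptions made for illustration, not part of the question):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupByIdentityDemo {
    // Groups every word under itself as the key (identity classifier).
    static Map<String, List<String>> groupByIdentity(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str));
    }

    public static void main(String[] args) {
        // Sample input is an assumption, chosen only for illustration.
        Map<String, List<String>> occurrences =
                groupByIdentity(Stream.of("hello", "world", "hello"));
        System.out.println(occurrences.get("hello")); // [hello, hello]
        System.out.println(occurrences.get("world")); // [world]
    }
}
```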
However, this is not very useful, since each entry holds the same information twice (the key and every element of its list are the same string). You should look at a Map<String, Long> instead, where the value is the number of times the string appears in the stream.
Map<String, Long> occurrences = streamOfWords.collect(groupingBy(str -> str, counting()));
Basically, instead of having groupingBy collect the values into a List, you pass the downstream collector counting() to say that you want to count the number of times each value appears.
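A small self-contained sketch of the counting variant, again with an assumed class name, method name and sample input:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordCountDemo {
    // Groups words by identity and counts the elements in each group
    // instead of collecting them into a List.
    static Map<String, Long> countWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample input is an assumption, chosen only for illustration.
        Map<String, Long> occurrences = countWords(Stream.of("hello", "world", "hello"));
        System.out.println(occurrences.get("hello")); // 2
        System.out.println(occurrences.get("world")); // 1
    }
}
```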
Your sorting requirement implies that you should end up with a Map<Long, List<String>> (what if different strings appear the same number of times?). The default toMap collector returns a HashMap, which has no notion of ordering, but you can store the elements in a TreeMap instead.
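One possible way to get there, sketched under assumptions: the class and method names are made up, and regrouping the counted entries with groupingBy and a TreeMap factory is just one option, not necessarily the approach intended above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordsByCountDemo {
    static TreeMap<Long, List<String>> wordsByCount(Stream<String> words) {
        // Step 1: count how many times each word appears.
        Map<String, Long> counts =
                words.collect(Collectors.groupingBy(str -> str, Collectors.counting()));

        // Step 2: regroup the entries by their count; the TreeMap keeps the counts sorted.
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry::getValue,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Sample input is an assumption, chosen only for illustration.
        TreeMap<Long, List<String>> byCount =
                wordsByCount(Stream.of("hello", "world", "hello", "hey"));
        System.out.println(byCount.firstKey()); // 1
        System.out.println(byCount.get(2L));    // [hello]
    }
}
```

The order of the strings inside each list is not guaranteed here, because the intermediate map is unordered.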
I tried to summarize what I said in the comments.
It seems you are having trouble seeing how str -> str can tell whether "hello" and "world" are different.
First of all, str -> str is a function, that is, for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that returns x + 2 for any value of x.
Here we use the identity function, i.e. f(x) = x. When you collect the elements of the pipeline into a Map, this function is called first to obtain the key for each value. So, in your example, you have 3 elements for which the identity function gives:
f("hello") = "hello"
f("world") = "world"
So far so good.
Now, when collect() is called, the function is applied to each value in the stream and the result is evaluated (it will be the key in the Map). If the key is not present yet, a new List holding that value is created. If the key already exists, we take the currently mapped value, which is a List, and add to it the value the function was just applied to. This is why you get a Map<String, List<String>> at the end.
Take another example. Now the stream contains the values "hello", "world" and "hey", and the function we want to group the elements by is str -> str.substring(0, 2), that is, a function that takes the first two characters of the String.
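To make that mechanism concrete, here is a hand-written loop that mimics the behaviour described. It is only an illustrative sketch, not the actual JDK implementation, and the names ManualGroupingDemo and groupManually are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ManualGroupingDemo {
    // Hypothetical helper: groups values by the key the classifier returns.
    static <T, K> Map<K, List<T>> groupManually(List<T> values, Function<T, K> classifier) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            // Apply the function to obtain the key for this value.
            K key = classifier.apply(value);
            // Reuse the list already mapped to the key, or create one, then add the value.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input is an assumption, chosen only for illustration.
        Map<String, List<String>> grouped =
                groupManually(List.of("hello", "world", "hello"), str -> str);
        System.out.println(grouped.get("hello")); // [hello, hello]
    }
}
```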
Similarly, we have:
f("hello") = "he"
f("world") = "wo"
f("hey") = "he"
Here you see that "hello" and "hey" yield the same key when the function is applied, and therefore they end up in the same List when collected, so the final result is:
"he" -> ["hello", "hey"]
"wo" -> ["world"]
To draw an analogy with math, you could take any non-bijective function (more precisely, one that is not injective), such as x². For x = -2 and x = 2 we have f(x) = 4. Therefore, if we grouped integers by this function, -2 and 2 would end up in the same "bag".
Looking at the source code will not help you understand what is happening at first. It is useful if you want to know how it is implemented under the hood. But first, try to think about the concept at a higher level of abstraction, and then things may become clearer.
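Both the substring example and the squaring analogy can be tried with a short sketch like the one below. The class name, the method names and the extra value 3 are assumptions added for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ClassifierDemo {
    // Groups words by their first two characters.
    // Note: substring(0, 2) throws for strings shorter than two characters.
    static Map<String, List<String>> groupByPrefix(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str.substring(0, 2)));
    }

    // Groups integers by their square, so x and -x share a key.
    static Map<Integer, List<Integer>> groupBySquare(Stream<Integer> numbers) {
        return numbers.collect(Collectors.groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<String, List<String>> byPrefix =
                groupByPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(byPrefix.get("he")); // [hello, hey]
        System.out.println(byPrefix.get("wo")); // [world]

        Map<Integer, List<Integer>> bySquare = groupBySquare(Stream.of(-2, 2, 3));
        System.out.println(bySquare.get(4)); // [-2, 2]
        System.out.println(bySquare.get(9)); // [3]
    }
}
```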
Hope this helps! :)