I am using the Java 8 counting collector to count the number of occurrences of each value.
For example, if I have a stream of streams like
Stream<String> doc1 = Stream.of("a", "b", "c", "b", "c");
Stream<String> doc2 = Stream.of("b", "c", "d");
Stream<Stream<String>> docs = Stream.of(doc1, doc2);
I can count the occurrences of each word in each document with
List<Map<String, Long>> collect = docs
.map(doc -> doc.collect(Collectors.groupingBy(Function.identity(), Collectors.counting())))
.collect(Collectors.toList());
The result is a structure like
[
{a=1, b=2, c=2},
{b=1, c=1, d=1}
]
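For reference, here is a self-contained version of the counting step above with the imports it needs. The class name WordCounts and the method countPerDocument are hypothetical names chosen for illustration; the stream pipeline itself is the one from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCounts {
    // For each document (inner stream), build a map of word -> number of occurrences.
    static List<Map<String, Long>> countPerDocument(Stream<Stream<String>> docs) {
        return docs
                .map(doc -> doc.collect(Collectors.groupingBy(Function.identity(), Collectors.counting())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("a", "b", "c", "b", "c");
        Stream<String> doc2 = Stream.of("b", "c", "d");
        System.out.println(countPerDocument(Stream.of(doc1, doc2)));
    }
}
```

Note that groupingBy returns a HashMap by default, so the printed key order is not guaranteed in general.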
However, I would like each count to be associated with the ID of the document it came from. For example, I would like a structure like
[
{a=(randId1, 1), b=(randId1, 2), c=(randId1, 2)},
{b=(randId2, 1), c=(randId2, 1), d=(randId2, 1)}
]
where randId1 and randId2 can be generated at runtime (I just need a way to trace each count back to a unique source), and () represents a Pair from Apache Commons Lang.
I tried wrapping each doc in a Pair of (docId, doc), but I got stuck on how to change the downstream Collectors.counting() collector:
List<Map<String, Long>> collect = docs.map(doc -> Pair.of(UUID.randomUUID(), doc))
.map(p -> p.getRight().collect(Collectors.groupingBy(Function.identity(), Collectors.counting())))
.collect(Collectors.toList());
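One possible approach, sketched under some assumptions: generate the ID once per document inside the map step, then replace the downstream Collectors.counting() with Collectors.collectingAndThen(Collectors.counting(), ...), whose finisher wraps each count together with that ID. To keep the sketch runnable with only the JDK, it uses java.util.AbstractMap.SimpleImmutableEntry as a stand-in for Apache's Pair; with Apache Commons Lang on the classpath, the finisher would be count -> Pair.of(docId, count) instead. The names WordCountsWithId and countWithDocId are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountsWithId {
    // SimpleImmutableEntry<UUID, Long> stands in for Apache Commons Lang's Pair<UUID, Long> (assumption).
    static List<Map<String, SimpleImmutableEntry<UUID, Long>>> countWithDocId(Stream<Stream<String>> docs) {
        return docs
                .map(doc -> {
                    // One ID per document, shared by every word counted in it.
                    UUID docId = UUID.randomUUID();
                    Map<String, SimpleImmutableEntry<UUID, Long>> counts = doc.collect(
                            Collectors.groupingBy(
                                    Function.identity(),
                                    // Downstream: count the occurrences, then pair the count with the document ID.
                                    Collectors.collectingAndThen(
                                            Collectors.counting(),
                                            count -> new SimpleImmutableEntry<>(docId, count))));
                    return counts;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> doc1 = Stream.of("a", "b", "c", "b", "c");
        Stream<String> doc2 = Stream.of("b", "c", "d");
        countWithDocId(Stream.of(doc1, doc2)).forEach(System.out::println);
    }
}
```

The UUID is created in the map lambda rather than in the finisher on purpose: generating it inside the finisher would give every word its own ID instead of one ID per document.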
How can I get the output in the desired format?