You recompile every regular expression for each line and each word. Instead of .flatMap(line -> Arrays.stream(line.split("\\s+"))) write .flatMap(Pattern.compile("\\s+")::splitAsStream). Likewise, replace .filter(word -> word.matches("\\w+")) with .filter(Pattern.compile("^\\w+$").asPredicate()); the anchors are needed because asPredicate() tests with find() rather than matches(). The same goes for the regular expression used in map.
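To illustrate the point about compiling once and about the anchors, here is a minimal sketch. The class name PatternReuseSketch, the field names, and the sample strings are made up for the example and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PatternReuseSketch {
    // Each pattern is compiled once and reused for every line and every word.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // asPredicate() uses find(), so the anchors force a whole-string match.
    static final Predicate<String> IS_WORD = Pattern.compile("^\\w+$").asPredicate();

    // Without anchors the predicate accepts any string containing a word character.
    static final Predicate<String> CONTAINS_WORD_CHAR = Pattern.compile("\\w+").asPredicate();

    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(WHITESPACE::splitAsStream)
                .filter(IS_WORD)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("hello big world", "it's fine")));
        System.out.println(IS_WORD.test("it's"));            // anchored: rejects "it's"
        System.out.println(CONTAINS_WORD_CHAR.test("it's")); // unanchored: accepts "it's"
    }
}
```

On Java 11 and later, asMatchPredicate() gives whole-string matching without the anchors.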
It might be better to swap .map(s -> s.toLowerCase()) and .filter(s -> s.length() >= 2), so that toLowerCase() is not called for single-letter words.
You don't need Collectors.toConcurrentMap(w -> w, w -> 1, Integer::sum). First, your stream is not parallel, so you can simply replace toConcurrentMap with toMap. Second, it would probably be more efficient (although this needs to be measured) to use Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)), since this reduces boxing (but adds a finisher step that boxes all the values at once).
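The two collectors produce the same counts, which a small sketch can confirm. The class and method names here are hypothetical, and the comments about boxing describe how the OpenJDK collectors are commonly implemented, not a measured result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CountingCollectorsSketch {
    // toMap: each merge goes through Integer::sum on boxed Integer values.
    static Map<String, Integer> countWithToMap(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    // groupingBy + summingInt: the downstream collector sums ints per key
    // and produces the boxed Integer in its finisher.
    static Map<String, Integer> countWithGroupingBy(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("war", "and", "peace", "and", "war", "and");
        System.out.println(countWithToMap(words));
        System.out.println(countWithGroupingBy(words));
    }
}
```

Whether the second variant is actually faster on a given input is something only a benchmark can tell.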
Instead of (e1, e2) -> Integer.compare(e2.getValue(), e1.getValue()) you can use a ready-made comparator: Map.Entry.comparingByValue(), passing it Comparator.reverseOrder() to keep the descending order (although this is probably a matter of taste).
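A short sketch of the ready-made comparator in use. The helper topKeys and its class are invented for the example. Note that plain comparingByValue() sorts ascending, and chaining .reversed() onto it typically fails type inference unless explicit type arguments are supplied, so passing Comparator.reverseOrder() is the simpler route.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntryOrderingSketch {
    // Returns the keys of the n entries with the largest values, largest first.
    static List<String> topKeys(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 3);
        counts.put("b", 1);
        counts.put("c", 2);
        System.out.println(topKeys(counts, 2));
    }
}
```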
Summarizing:
    Map<String, Integer> wc = Files.lines(Paths.get("/tmp", "/war-and-peace.txt"))
            .map(Pattern.compile("\\p{Punct}")::matcher)
            .map(matcher -> matcher.replaceAll(""))
            .flatMap(Pattern.compile("\\s+")::splitAsStream)
            .filter(Pattern.compile("^\\w+$").asPredicate())
            .filter(s -> s.length() >= 2)
            .map(s -> s.toLowerCase())
            .collect(Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)));

    wc.entrySet()
            .stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .limit(5)
            .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
If you don't like method references (some people don't), you can store precompiled regular expressions in variables instead.
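A sketch of that variant, with the patterns held in local variables and used from ordinary lambdas. The class name, the method countWords, and the in-memory sample lines are assumptions made for the example; the pipeline steps themselves follow the ones discussed in this answer.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PrecompiledVariablesSketch {
    static Map<String, Integer> countWords(Stream<String> lines) {
        // Compiled once, before the stream starts processing lines.
        Pattern punct = Pattern.compile("\\p{Punct}");
        Pattern whitespace = Pattern.compile("\\s+");
        Pattern word = Pattern.compile("^\\w+$");

        return lines
                .map(line -> punct.matcher(line).replaceAll(""))
                .flatMap(line -> whitespace.splitAsStream(line))
                .filter(s -> word.matcher(s).find())
                .filter(s -> s.length() >= 2)
                .map(s -> s.toLowerCase())
                .collect(Collectors.groupingBy(w -> w, Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of the file.
        Map<String, Integer> wc = countWords(Stream.of("War, and peace.", "A war and a WAR!"));
        System.out.println(wc);
    }
}
```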