Word frequency in a list of strings

I have a list of lines:

List<String> terms = ["Coding is great", "Search Engines are great", "Google is a nice search engine"] 

How can I get the frequency of each word in the list? Example: {Coding:1, Search:2, Engines:1, engine:1, ...}

Here is my code:

    Map<String, Integer> wordFreqMap = new HashMap<>();
    for (String contextTerm : term.getContexTerms()) {
        String[] wordsArr = contextTerm.split(" ");
        for (String word : wordsArr) {
            Integer freq = wordFreqMap.get(word); // this line is getting reset every time I go to a new contextTerm
            freq = (freq == null) ? 1 : ++freq;
            wordFreqMap.put(word, freq);
        }
    }
3 answers

Idiomatic solution with Java 8 streams:

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class SplitWordCount {
        public static void main(String[] args) {
            List<String> terms = Arrays.asList(
                    "Coding is great",
                    "Search Engines are great",
                    "Google is a nice search engine");

            Map<String, Integer> result = terms.parallelStream()
                    .flatMap(s -> Arrays.asList(s.split(" ")).stream())
                    .collect(Collectors.toConcurrentMap(
                            w -> w.toLowerCase(), w -> 1, Integer::sum));

            System.out.println(result);
        }
    }

Note that you may need to decide whether upper and lower case should be treated as different. This version converts each word to lower case and uses it as the key of the resulting map. Result:

 {coding=1, a=1, search=2, are=1, engine=1, engines=1, is=2, google=1, great=2, nice=1} 
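If parallelism is not needed, the same count can be sketched with a sequential stream and `Collectors.groupingBy`. This is only an illustration under a few assumptions: the class name `SequentialWordCount` and the helper `countWords` are made up here, words are split on any run of whitespace, and a `TreeMap` is used purely so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SequentialWordCount {

    // Hypothetical helper: counts words case-insensitively.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Assumption: words are separated by one or more whitespace characters.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // A blank line would otherwise contribute one empty "word".
                .filter(word -> !word.isEmpty())
                .map(word -> word.toLowerCase(Locale.ROOT))
                // TreeMap keeps the keys sorted; counting() yields Long values.
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "Coding is great",
                "Search Engines are great",
                "Google is a nice search engine");
        System.out.println(countWords(terms));
    }
}
```

The counts match the result above; only the value type (`Long` instead of `Integer`) and the key order differ. To keep "Search" and "search" as separate keys, drop the `toLowerCase` step.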

    // Needs: import java.util.*;
    public static void main(String[] args) {
        String msg = "Coding is great search Engines are great Google is a nice search engine";
        // Split once and reuse the words both as the lookup list and as the keys.
        List<String> words = Arrays.asList(msg.split(" "));
        Map<String, Integer> map = new HashMap<>();
        for (String word : words) {
            // Collections.frequency rescans the whole list for every word (case-sensitive).
            map.put(word, Collections.frequency(words, word));
        }
        System.out.println("values are " + map);
    }

The Java 8 answer is good, but it does not show how to parallelize this in Java 7 (which is roughly what the stream does for you by default), so here is an example:

    // Needs: import java.util.*; import java.util.concurrent.*;
    //        import java.util.concurrent.atomic.AtomicInteger;
    public static void main(final String[] args) throws InterruptedException {
        final ExecutorService service = Executors.newFixedThreadPool(10);
        final List<String> terms = Arrays.asList(
                "Coding is great",
                "Search Engines are great",
                "Google is a nice search engine");

        // Step 1: one task per string, each splitting it into words.
        final List<Callable<String[]>> callables = new ArrayList<>(terms.size());
        for (final String term : terms) {
            callables.add(new Callable<String[]>() {
                @Override
                public String[] call() throws Exception {
                    System.out.println("splitting word: " + term);
                    return term.split(" ");
                }
            });
        }

        // Step 2: one task per split result, each counting its words.
        final ConcurrentMap<String, AtomicInteger> counter = new ConcurrentHashMap<>();
        final List<Callable<Void>> callables2 = new ArrayList<>(terms.size());
        for (final Future<String[]> future : service.invokeAll(callables)) {
            callables2.add(new Callable<Void>() {
                @Override
                public Void call() throws Exception {
                    System.out.println("counting word");
                    // invokeAll returns only after every task has completed,
                    // so get() does not block here.
                    for (String word : future.get()) {
                        String lc = word.toLowerCase();
                        // Here it gets tricky: two threads might add the same word.
                        AtomicInteger actual = counter.get(lc);
                        if (null == actual) {
                            final AtomicInteger nv = new AtomicInteger();
                            actual = counter.putIfAbsent(lc, nv);
                            if (null == actual) {
                                actual = nv; // nv got added.
                            }
                        }
                        actual.incrementAndGet();
                    }
                    return null;
                }
            });
        }
        service.invokeAll(callables2);
        service.shutdown();
        System.out.println(counter);
    }

Yes, Java 8 makes it easy!

I have tested it, but I don't know whether it is better than simple loops, or whether it is perfectly thread safe.
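One way to gain some confidence in the `putIfAbsent` plus `AtomicInteger` counting step is to hammer it from several threads and compare the totals with the expected values. The sketch below is such a check, not a proof of thread safety. The class name `CounterStressCheck`, the helpers `increment` and `run`, and the thread and round counts are arbitrary choices made for this illustration. It is written in Java 7 style to match the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class CounterStressCheck {

    // Same idiom as in the answer: get, then putIfAbsent, then increment.
    static void increment(ConcurrentMap<String, AtomicInteger> counter, String key) {
        AtomicInteger actual = counter.get(key);
        if (actual == null) {
            AtomicInteger fresh = new AtomicInteger();
            actual = counter.putIfAbsent(key, fresh);
            if (actual == null) {
                actual = fresh; // our instance was the one stored
            }
        }
        actual.incrementAndGet();
    }

    // Runs `threads` tasks; each task counts every word `rounds` times.
    static ConcurrentMap<String, AtomicInteger> run(
            final String[] words, int threads, final int rounds) throws Exception {
        final ConcurrentMap<String, AtomicInteger> counter = new ConcurrentHashMap<>();
        ExecutorService service = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(new Callable<Void>() {
                    @Override
                    public Void call() {
                        for (int i = 0; i < rounds; i++) {
                            for (String word : words) {
                                increment(counter, word);
                            }
                        }
                        return null;
                    }
                });
            }
            for (Future<Void> f : service.invokeAll(tasks)) {
                f.get(); // propagate any failure from a task
            }
        } finally {
            service.shutdown();
        }
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String[] words = {"coding", "is", "great", "search", "great"};
        // Expect 8 * 10000 for each distinct word, and twice that for "great".
        System.out.println(run(words, 8, 10_000));
    }
}
```

If an increment were ever lost, at least one total would come out lower than expected. A run with correct totals does not rule out a race, but the idiom relies only on the atomicity of `putIfAbsent` and `incrementAndGet`, both of which are documented as atomic.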

(And seeing how you define your list, aren't you coding in Groovy? Groovy has support for parallelism.)


Source: https://habr.com/ru/post/1200927/

