Multithreading - counting the total number of words from multiple files

I wrote a program that counts the words in several files, one thread per file. How can I change it so that it reports the total number of words across all files, as a single value?

My code is as follows:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    public class WordCount implements Runnable {
        public WordCount(String filename) {
            this.filename = filename;
        }

        public void run() {
            int count = 0;
            try {
                Scanner in = new Scanner(new File(filename));
                // Count whitespace-separated tokens.
                while (in.hasNext()) {
                    in.next();
                    count++;
                }
                System.out.println(filename + ": " + count);
            } catch (FileNotFoundException e) {
                System.out.println(filename + " was not found.");
            }
        }

        private String filename;
    }

With main class:

    public class Main {
        public static void main(String args[]) {
            // One thread per file name given on the command line.
            for (String filename : args) {
                Runnable tester = new WordCount(filename);
                Thread t = new Thread(tester);
                t.start();
            }
        }
    }

And how can I avoid race conditions? Thank you for your help.

+4
7 answers

A worker thread:

    class WordCount extends Thread {
        int count;

        @Override
        public void run() {
            count = 0;
            /* Count the words... */
            ...
            ++count;
            ...
        }
    }

And a class to use them:

    class Main {
        public static void main(String args[]) throws InterruptedException {
            WordCount[] counters = new WordCount[args.length];
            for (int idx = 0; idx < args.length; ++idx) {
                counters[idx] = new WordCount(args[idx]);
                counters[idx].start();
            }
            int total = 0;
            for (WordCount counter : counters) {
                // join() guarantees the worker has finished before its count is read.
                counter.join();
                total += counter.count;
            }
            System.out.println("Total: " + total);
        }
    }

Note that many hard drives do not handle reading several files at the same time very well. Locality of reference has a big impact on performance.

+3

You can use a Future to get each file's count and add all the counts up at the end, or use a static variable and increment it under synchronization, i.e. either with explicit synchronized or with an atomic increment.
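A minimal sketch of the Future variant, assuming an ExecutorService with an arbitrarily chosen pool size of 4. The class name FutureWordCount and the helper methods countWords and totalWords are made up for illustration. Each task returns its own count, so no variable is shared between threads, and a missing file surfaces as an ExecutionException from get().

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class name; a sketch of the Future-based approach.
class FutureWordCount {

    // Counts whitespace-separated tokens in one file, like the loop in the question.
    static int countWords(String filename) throws FileNotFoundException {
        int count = 0;
        try (Scanner in = new Scanner(new File(filename))) {
            while (in.hasNext()) {
                in.next();
                count++;
            }
        }
        return count;
    }

    // Submits one task per file and sums the results on the calling thread.
    static int totalWords(List<String> filenames)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String filename : filenames) {
                Callable<Integer> task = () -> countWords(filename);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that file has been counted
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Total: " + totalWords(Arrays.asList(args)));
    }
}
```

Because only the calling thread touches total, there is nothing to synchronize here.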

+1

What if your Runnable took two arguments:

  • a BlockingQueue<String> or BlockingQueue<File> of input files
  • an AtomicLong

In the loop, you take the next String / File from the queue, count its words and increase the AtomicLong by that amount. Whether the loop is while(!queue.isEmpty()) or while(!done) depends on how you feed the files into the queue: if you know all the files from the very beginning, you can use the isEmpty version, but if you stream them in from somewhere, you want the !done version (and have done be a volatile boolean or an AtomicBoolean for memory visibility).

Then you pass these Runnables to an executor, and you should be good to go.
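A rough sketch of this idea for the case where all files are known up front. The class name QueueWordCount and the helper totalWords are invented for illustration. One deliberate deviation: the loop uses poll(), which returns null when the queue is empty, instead of checking isEmpty() and then taking an element, because another worker could grab the last file between those two calls.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class name; each worker drains the shared queue of file names.
class QueueWordCount implements Runnable {
    private final BlockingQueue<String> files;
    private final AtomicLong total;

    QueueWordCount(BlockingQueue<String> files, AtomicLong total) {
        this.files = files;
        this.total = total;
    }

    @Override
    public void run() {
        String filename;
        // poll() checks and removes in one step; null means no files are left.
        while ((filename = files.poll()) != null) {
            long count = 0;
            try (Scanner in = new Scanner(new File(filename))) {
                while (in.hasNext()) {
                    in.next();
                    count++;
                }
            } catch (FileNotFoundException e) {
                System.out.println(filename + " was not found.");
            }
            total.addAndGet(count); // one atomic update per file
        }
    }

    // Starts the given number of workers and waits for them to finish.
    static long totalWords(List<String> filenames, int workers)
            throws InterruptedException {
        BlockingQueue<String> files = new LinkedBlockingQueue<>(filenames);
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(new QueueWordCount(files, total));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // timeout is an arbitrary choice
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Total: " + totalWords(Arrays.asList(args), 4));
    }
}
```

Adding the per-file count once, rather than incrementing the AtomicLong for every word, keeps contention on the shared counter low.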

+1

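You can make count static so that all threads can increment it. A plain volatile int is not enough, because count++ is a read-modify-write and not atomic, so the code below uses an AtomicInteger.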

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;
    import java.util.concurrent.atomic.AtomicInteger;

    public class WordCount implements Runnable {
        // <-- now all threads increment the same count
        private static AtomicInteger count = new AtomicInteger(0);
        private String filename;

        public WordCount(String filename) {
            this.filename = filename;
        }

        public static int getCount() {
            return count.get();
        }

        public void run() {
            try {
                Scanner in = new Scanner(new File(filename));
                while (in.hasNext()) {
                    in.next();
                    count.incrementAndGet();
                }
                // Note: this prints the shared running total, not this file's own count.
                System.out.println(filename + ": " + count);
            } catch (FileNotFoundException e) {
                System.out.println(filename + " was not found.");
            }
        }
    }

Update: I haven't done Java for a while, but the point about making it a private static field still stands... just make it an AtomicInteger.

+1

You can create some kind of listener to get feedback from the thread.

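    public interface ResultListener {
        // "synchronized" is not allowed on an interface method;
        // the implementation must make result() thread-safe.
        void result(int words);
    }

    public class WordCount implements Runnable {
        private String filename;
        private ResultListener listener;

        public WordCount(String filename, ResultListener listener) {
            this.filename = filename;
            this.listener = listener;
        }

        public void run() {
            int count = 0;
            try {
                Scanner in = new Scanner(new File(filename));
                while (in.hasNext()) {
                    in.next();
                    count++;
                }
                // Report this file's count once, when the file is done.
                listener.result(count);
            } catch (FileNotFoundException e) {
                System.out.println(filename + " was not found.");
            }
        }
    }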

You can add a constructor parameter for the listener as well as for your file name.

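    public class Main {
        private static int totalCount = 0;

        private static ResultListener listener = new ResultListener() {
            public synchronized void result(int words) {
                totalCount += words;
            }
        };

        public static void main(String args[]) throws InterruptedException {
            Thread[] threads = new Thread[args.length];
            for (int i = 0; i < args.length; i++) {
                threads[i] = new Thread(new WordCount(args[i], listener));
                threads[i].start();
            }
            // Wait for every worker before reading the total.
            for (Thread t : threads) {
                t.join();
            }
            System.out.println("Total: " + totalCount);
        }
    }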
+1

You can create a thread pool with a synchronized task queue in which all the files for which you want to count words will be stored.

When your thread pool workers come online, they can ask the task queue for a file to count. After a worker finishes its file, it can notify the main thread of its final count.

The main thread will have a synchronized notification method that adds up the results of all the worker threads.
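One possible reading of this, sketched with a fixed thread pool whose internal task queue plays the role of the synchronized queue. The names PoolWordCount, report, getTotal and countAll are hypothetical, and the pool size of 4 is an assumption.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical collector class; workers report their results through report().
class PoolWordCount {
    private int total = 0;

    // Synchronized so that two workers cannot interleave their updates.
    synchronized void report(int words) {
        total += words;
    }

    synchronized int getTotal() {
        return total;
    }

    // Counts whitespace-separated tokens; a missing file counts as 0 words.
    static int countWords(String filename) {
        int count = 0;
        try (Scanner in = new Scanner(new File(filename))) {
            while (in.hasNext()) {
                in.next();
                count++;
            }
        } catch (FileNotFoundException e) {
            System.out.println(filename + " was not found.");
        }
        return count;
    }

    // Queues one task per file, waits for the pool to drain, returns the sum.
    int countAll(List<String> filenames) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // size is an assumption
        for (String filename : filenames) {
            pool.execute(() -> report(countWords(filename)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // timeout is an arbitrary choice
        return getTotal();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Total: " + new PoolWordCount().countAll(Arrays.asList(args)));
    }
}
```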

Hope this helps.

0

Or you can have all threads update a single word count variable. count++ is atomic if count is word-sized (an int should be sufficient).

EDIT: It turns out the Java specs are stupid enough that count++ is not atomic. I have no idea why. Anyway, take a look at AtomicInteger and its incrementAndGet method. Hopefully that is atomic (I don't know what to expect now...), and then you don't need any other synchronization mechanisms: just keep your count in an AtomicInteger.
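For what it's worth, count++ on an int is a read, an add and a write, so two threads can read the same old value and one update gets lost. A small demo sketch (the class AtomicIncrementDemo and its run method are made up for illustration) increments a plain int and an AtomicInteger side by side. The plain counter may or may not come up short on any given run; the AtomicInteger always ends at threads × perThread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class comparing a plain int with an AtomicInteger.
class AtomicIncrementDemo {
    static int unsafeCount = 0;                               // ++ is read, add, write
    static final AtomicInteger safeCount = new AtomicInteger(); // atomic increments

    // Returns { plain int result, AtomicInteger result }.
    static int[] run(int threads, int perThread) throws InterruptedException {
        unsafeCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    unsafeCount++;               // updates may be lost under contention
                    safeCount.incrementAndGet(); // never loses an update
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // after join(), the worker's writes are visible here
        }
        return new int[] { unsafeCount, safeCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = run(8, 100_000);
        System.out.println("plain int:     " + result[0]);
        System.out.println("AtomicInteger: " + result[1]);
    }
}
```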

0

Source: https://habr.com/ru/post/1385433/

