Using ExecutorService for parallel job processing

I am writing a Java program that needs to handle many URLs.
The following tasks have to be performed for each URL, in this order: download, analyze, compress.

Instead of having one thread perform all the tasks for a given URL, I want each task to have its own fixed number of threads, so that every task has threads running at any given time.

For example, the download job will have several threads fetching URLs; as soon as one URL is downloaded, it is handed to a thread in the analysis task, and as soon as analysis finishes, the result goes to a thread in the compression task, and so on.

I am thinking about using CompletionService in Java, since it hands back each result as soon as it completes, but I'm not sure how it fits together. My code currently looks like this:

    ExecutorService executor = Executors.newFixedThreadPool(3);
    CompletionService<DownloadedItem> completionService =
            new ExecutorCompletionService<DownloadedItem>(executor);

    // while the list has URLs
    do {
        executor.submit(new DownloadJob(list.getNextURL())); // submit to the queue for download
    } // while

    // while there are URLs left
    do {
        Future<DownloadedItem> downloadedItem = executor.take(); // take the result as soon as it finishes
        // what to do here??
    } // while

My question is: how do I hand a downloaded item over to the analysis task and start working on it there, without waiting for all the download tasks to complete? I am thinking of creating a CompletionService for every task type; is that a viable approach? If not, is there a better way to solve this? Examples would be appreciated.

+4
3 answers

Since you state the tasks must run IN ORDER, any attempt to use separate threads for those ordered tasks will only complicate the design of your system.

In my opinion, your best bet is to have separate threads process separate URLs at the same time. For the three steps you can introduce another abstraction (for example, three Callables), but you still want to execute them sequentially on the same thread. There is no need for a CompletionService then.
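To illustrate, a minimal sketch of that design; the class name UrlJob, the pool size, and the three step methods are assumptions, not something from the question. One task handles one URL and runs the three steps in order, while the pool processes many URLs in parallel.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical task: all three steps for one URL, executed in order on the same thread.
    class UrlJob implements Runnable {
        private final String url;

        UrlJob(String url) {
            this.url = url;
        }

        public void run() {
            byte[] content = download(url);   // step 1
            byte[] report = analyze(content); // step 2
            compress(report);                 // step 3
        }

        // Placeholders for the real work.
        private byte[] download(String url) { return new byte[0]; }
        private byte[] analyze(byte[] content) { return content; }
        private void compress(byte[] report) { }
    }

    // Usage: one fixed pool, one job per URL; URLs run in parallel, steps stay ordered.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (String url : urls) {
        pool.submit(new UrlJob(url));
    }
    pool.shutdown();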

+3

You are pretty close. First, submit your tasks to the CompletionService:

 completionService.submit(new DownloadJob(list.getNextURL()));

Now take a Future and wait for its result:

 DownloadedItem downloadedItem = completionService.take().get();

The get() call may block. Repeat the line above as many times as you submitted tasks.
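Putting both halves together, a minimal sketch of the whole submit-then-drain loop. DownloadJob and DownloadedItem are the asker's types; UrlList, hasNextURL(), and the throws clause are assumptions, and DownloadJob is assumed to implement Callable<DownloadedItem>.

    // Imports from java.util.concurrent are assumed.
    void downloadAll(UrlList list) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        CompletionService<DownloadedItem> completionService =
                new ExecutorCompletionService<DownloadedItem>(executor);

        int submitted = 0;
        while (list.hasNextURL()) {
            completionService.submit(new DownloadJob(list.getNextURL()));
            submitted++;
        }

        for (int i = 0; i < submitted; i++) {
            // take() blocks until the next *finished* download is available
            DownloadedItem item = completionService.take().get();
            // hand "item" to the next stage here
        }
        executor.shutdown();
    }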


If you need a lot more throughput (in your setup no more than three URLs will be downloading at a time), consider async-http-client, which will let you download literally thousands of URLs at the same time. It uses NIO and is event driven, so it does not tie up a thread per connection.
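For reference, a download loop with async-http-client might look roughly like this. This sketch assumes the 2.x org.asynchttpclient API (Dsl.asyncHttpClient, prepareGet, toCompletableFuture); analyze(...) is a placeholder, and imports and exception handling are omitted.

    AsyncHttpClient client = Dsl.asyncHttpClient();
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    for (String url : urls) {
        pending.add(client.prepareGet(url)
                .execute()
                .toCompletableFuture()
                .thenAccept(response -> analyze(response.getResponseBodyAsBytes()))); // runs when the download completes
    }
    // wait for all requests to finish before shutting the client down
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    client.close();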

+1

What you are describing is called a pipeline. Basically, the output of the download task is the input of the analysis task, and the output of the analysis is the input of the compression task. There are two ways to build this:

1) Let the download task know about the next stage's executor so that it can submit its result there itself.

    class DownloadTask implements Runnable {
        private final String url;
        private final ExecutorService analyzePipeline;

        DownloadTask(String url, ExecutorService analyzePipeline) {
            this.url = url;
            this.analyzePipeline = analyzePipeline;
        }

        public void run() {
            // do the download work for "url" here
            analyzePipeline.submit(new AnalyzeTask(downloadedContent)); // downloadedContent is whatever the download produced
        }
    }
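A rough sketch of how this could be wired up; the pool sizes and the hasNextURL() call are assumptions, and AnalyzeTask would hold a reference to the compression stage's executor in the same way:

    ExecutorService downloadPool = Executors.newFixedThreadPool(3);
    ExecutorService analyzePool = Executors.newFixedThreadPool(3);

    while (list.hasNextURL()) {
        downloadPool.submit(new DownloadTask(list.getNextURL(), analyzePool));
    }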

2) Have a separate thread transfer the results of the download tasks to the analysis task's executor.

    ExecutorService executor = Executors.newFixedThreadPool(3);
    ExecutorService analyzeExecutor = Executors.newFixedThreadPool(3);
    CompletionService<DownloadedItem> completionService =
            new ExecutorCompletionService<DownloadedItem>(executor);

    // while the list has URLs
    do {
        completionService.submit(new DownloadJob(list.getNextURL())); // submit to the download queue
    } // while

    new Thread() {
        public void run() {
            // while there are downloads outstanding
            do {
                // take() returns the next finished download; handle InterruptedException/ExecutionException
                Future<DownloadedItem> downloadedItem = completionService.take();
                analyzeExecutor.submit(new AnalyzeJob(downloadedItem.get()));
            } // while
        }
    }.start();
    // ...and so on for the compression stage
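Not part of the original answer, but the same pipeline idea can also be sketched with CompletableFuture (Java 8+), chaining each stage onto its own pool; download, analyze, and compress are placeholder methods here:

    ExecutorService downloadPool = Executors.newFixedThreadPool(3);
    ExecutorService analyzePool = Executors.newFixedThreadPool(3);
    ExecutorService compressPool = Executors.newFixedThreadPool(3);

    for (String url : urls) {
        CompletableFuture
            .supplyAsync(() -> download(url), downloadPool)              // stage 1
            .thenApplyAsync(content -> analyze(content), analyzePool)    // stage 2 starts as soon as stage 1 finishes
            .thenAcceptAsync(report -> compress(report), compressPool);  // stage 3
    }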
+1
