Java Concurrency Processing

Question

I have the following code:

    import java.io.*;
    import java.util.concurrent.*;

    public class Example {
        public static void main(String args[]) {
            try {
                FileOutputStream fos = new FileOutputStream("1.dat");
                DataOutputStream dos = new DataOutputStream(fos);
                for (int i = 0; i < 200000; i++) {
                    dos.writeInt(i);
                }
                dos.close();

                // Two sample files are created
                FileOutputStream fos1 = new FileOutputStream("2.dat");
                DataOutputStream dos1 = new DataOutputStream(fos1);
                for (int i = 200000; i < 400000; i++) {
                    dos1.writeInt(i);
                }
                dos1.close();

                Exampless.createArray(200000); // Create a shared array
                Exampless ex1 = new Exampless("1.dat");
                Exampless ex2 = new Exampless("2.dat");
                ExecutorService executor = Executors.newFixedThreadPool(2);

                // Executed in parallel to count the number of matches in the two files
                long startTime = System.nanoTime();
                long endTime;
                Future<Integer> future1 = executor.submit(ex1);
                Future<Integer> future2 = executor.submit(ex2);
                int count1 = future1.get();
                int count2 = future2.get();
                endTime = System.nanoTime();
                long duration = endTime - startTime;
                System.out.println("duration with threads:" + duration);
                executor.shutdown();
                System.out.println("Matches: " + (count1 + count2));

                // Same work again, run sequentially on the main thread
                startTime = System.nanoTime();
                ex1.call();
                ex2.call();
                endTime = System.nanoTime();
                duration = endTime - startTime;
                System.out.println("duration without threads:" + duration);
            } catch (Exception e) {
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

    class Exampless implements Callable<Integer> {
        public static int[] arr = new int[20000];
        public String _name;

        public Exampless(String name) {
            this._name = name;
        }

        static void createArray(int z) {
            for (int i = z; i < z + 20000; i++) { // shared array
                arr[i - z] = i;
            }
        }

        public Integer call() {
            try {
                int cnt = 0;
                FileInputStream fin = new FileInputStream(_name);
                DataInputStream din = new DataInputStream(fin);
                // read file and calculate number of matches
                for (int i = 0; i < 20000; i++) {
                    int c = din.readInt();
                    if (c == arr[i]) {
                        cnt++;
                    }
                }
                return cnt;
            } catch (Exception e) {
                System.err.println("Error: " + e.getMessage());
            }
            return -1;
        }
    }

Here I am trying to count the number of matches between an array and two files. Although I run it on two threads, the code does not perform well, because:

(single-threaded run: file 1 read time + file 2 read time) < (multithreaded run: file 1 || file 2 read time).

Can someone help me solve this problem? (I have a 2-core processor and the file size is about 1.5 GB.)

+6

java multithreading file-handling

Arpssss Jul 31 '12 at 16:24

2 answers

You will not get any benefit from multithreading the reads from disk, as Tomasz pointed out. You can get some speed improvement if you multithread the checking instead, i.e. you load the data from the files into arrays sequentially and then let the threads perform the comparison in parallel. But given the small size of your files (~80 KB) and the fact that you are simply comparing ints, I doubt that the performance improvement will be worth the effort.
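The suggestion above (load sequentially, then check in parallel) might look roughly like the following. This is only a sketch: the class name `ParallelCheck`, the method `countMatchesParallel`, and the way the index range is split into chunks are assumptions made for illustration, and the arrays are assumed to have already been read from disk by a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the data is already in memory; only the comparison runs in parallel.
class ParallelCheck {

    // Splits the index range into up to `threads` chunks and counts matches in each chunk concurrently.
    static int countMatchesParallel(int[] data, int[] reference, int threads)
            throws InterruptedException, ExecutionException {
        int n = Math.min(data.length, reference.length);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                Callable<Integer> task = () -> {
                    int matches = 0;
                    for (int i = from; i < to; i++) {
                        if (data[i] == reference[i]) matches++;
                    }
                    return matches;
                };
                parts.add(pool.submit(task));
            }
            // Sum the partial counts; get() blocks until each chunk is done.
            int total = 0;
            for (Future<Integer> part : parts) total += part.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] reference = new int[20000];
        int[] data = new int[20000];
        for (int i = 0; i < reference.length; i++) {
            reference[i] = i;
            data[i] = (i % 2 == 0) ? i : -1; // made-up data: every second value matches
        }
        System.out.println("Matches: " + countMatchesParallel(data, reference, 2));
    }
}
```

For 20,000 int comparisons the cost of creating the pool and submitting tasks may well outweigh the comparison itself, which is the point made above about the effort not being worth it.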

Something that will definitely improve your execution speed is to stop calling readInt() for every value. Since you know that you are comparing 20,000 ints, you should read all 20,000 ints into an array at once for each file (or at least in blocks), instead of calling readInt() 20,000 times.
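As a minimal sketch of the bulk-read idea (not the asker's code; the class name `BulkIntReader` and its method names are made up for illustration), the snippet below pulls the first `count` ints of a file into memory with a single `readFully` call and decodes them through a `ByteBuffer`. `ByteBuffer` is big-endian by default, which is the same byte order `DataOutputStream.writeInt` writes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original question's code.
class BulkIntReader {

    // Decodes big-endian 4-byte ints, the format written by DataOutputStream.writeInt.
    static int[] decodeInts(byte[] raw) {
        int[] values = new int[raw.length / 4];
        ByteBuffer.wrap(raw).asIntBuffer().get(values);
        return values;
    }

    // Reads the first `count` ints with one bulk read instead of `count` readInt() calls.
    static int[] readInts(String fileName, int count) throws IOException {
        byte[] raw = new byte[count * 4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            in.readFully(raw); // throws EOFException if the file is too short
        }
        return decodeInts(raw);
    }

    // Counts positions where the values read from the file equal the reference array.
    static int countMatches(int[] fromFile, int[] reference) {
        int n = Math.min(fromFile.length, reference.length);
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (fromFile[i] == reference[i]) matches++;
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary sample file so the example is self-contained.
        File tmp = File.createTempFile("ints", ".dat");
        tmp.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(tmp))) {
            for (int i = 0; i < 1000; i++) out.writeInt(i);
        }
        int[] reference = new int[1000];
        for (int i = 0; i < reference.length; i++) reference[i] = i;

        int[] values = readInts(tmp.getPath(), 1000);
        System.out.println("Matches: " + countMatches(values, reference));
    }
}
```

A smaller change with a similar effect would be to wrap the `FileInputStream` in a `BufferedInputStream` and keep calling `readInt()`, so that most calls are served from an in-memory buffer.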

+1

onit Jul 31 '12 at 16:54

Tomasz Nurkiewicz · Accepted Answer · 2012-07-31T16:32:59+0000

In the first case, you read one file sequentially, byte by byte, block by block. This is as fast as disk I/O can be, provided the file is not very fragmented. When you are done with the first file, the disk/OS finds the beginning of the second file and continues with a very efficient, linear read of the disk.

In the second case, you constantly switch between the first and second files, forcing the disk to seek from one place to another. This extra seek time (approximately 10 ms) is the root of your confusion.

Oh, and you do know that disk access is single-threaded and that your task is I/O-bound, so splitting it across multiple threads cannot help as long as you are reading from the same physical disk? Your approach could only be justified if:

each thread, in addition to reading from a file, also performed some CPU-intensive or blocking operations that are an order of magnitude slower than the I/O
the files are located on different physical disks (a different partition is not enough) or on some RAID configurations
you are using an SSD
