Using multithreading in Java to read data

I am trying to think about how to use threads in my program. Right now I have a single streaming program that reads one huge file. It is a very simple program: it just reads the file line by line and collects some statistics about the words. Now I would like to use several threads to make it faster, but I am not sure how to approach this.

One solution is to divide the data into X parts in advance and run X threads, each working on one part, with a single synchronized method for writing the statistics to memory. Is there a better approach? In particular, I would like to avoid splitting the data in advance.
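For comparison, the pre-split approach described in the question might look roughly like the sketch below. It assumes the whole file fits in memory as a `List<String>` and that a "word" is any whitespace-separated token; the class and method names are hypothetical. Each task counts into its own local map and calls the single synchronized `merge` method once, so threads only contend for the lock at the very end.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical class: splits the lines into `parts` chunks up front, one task per chunk.
public class PreSplitWordCount {
    // Shared statistics, only touched while holding this object's lock.
    private final Map<String, Integer> totals = new HashMap<>();

    // The single synchronized method: each task calls it once with its local counts.
    private synchronized void merge(Map<String, Integer> local) {
        local.forEach((word, n) -> totals.merge(word, n, Integer::sum));
    }

    public static Map<String, Integer> count(List<String> lines, int parts) {
        PreSplitWordCount stats = new PreSplitWordCount();
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        int chunk = Math.max(1, (lines.size() + parts - 1) / parts); // ceiling division
        for (int start = 0; start < lines.size(); start += chunk) {
            List<String> part = lines.subList(start, Math.min(start + chunk, lines.size()));
            pool.submit(() -> {
                // Count words of this chunk without any locking.
                Map<String, Integer> local = new HashMap<>();
                for (String line : part) {
                    for (String word : line.split("\\s+")) {
                        if (!word.isEmpty()) local.merge(word, 1, Integer::sum);
                    }
                }
                stats.merge(local); // one synchronized call per chunk
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        }
        synchronized (stats) {
            return new HashMap<>(stats.totals);
        }
    }

    public static void main(String[] args) throws Exception {
        // Reads the whole file into memory first: this is the up-front split.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(count(lines, Runtime.getRuntime().availableProcessors()));
    }
}
```

Loading all lines first (here with `Files.readAllLines`) is exactly the up-front splitting the question wants to avoid, and it needs the whole file in memory.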

Thanks!

4 answers

First of all, do some profiling to make sure that your process is actually CPU-bound and not I/O-bound, that is, that collecting the statistics is slower than reading the file. Otherwise, multithreading will likely make your program slower rather than faster, especially if you are running on a single-core processor (or an old JVM).
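A proper profiler is the better tool, but a crude first check is to time a pass that only reads the file against a pass that reads it and counts words. The sketch below assumes a UTF-8 file and whitespace-separated words; `BoundCheck` and its methods are hypothetical names. A warm-up pass is done first so both timed passes see the same file-cache state; if the file is larger than the OS file cache, the timed passes will still go to disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Rough timing harness, not a real profiler: compares "just read" against "read and count".
public class BoundCheck {
    // Reads every line and does nothing with it (I/O and decoding cost only).
    public static long readOnly(BufferedReader in) {
        return in.lines().count();
    }

    // Reads every line and collects word counts (I/O plus the CPU work we might parallelize).
    public static Map<String, Integer> readAndCount(BufferedReader in) {
        Map<String, Integer> counts = new HashMap<>();
        in.lines().forEach(line -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        // Warm-up pass so both timed passes see the same (cached) file state.
        try (BufferedReader in = Files.newBufferedReader(file)) { readOnly(in); }

        long t0 = System.nanoTime();
        try (BufferedReader in = Files.newBufferedReader(file)) { readOnly(in); }
        long t1 = System.nanoTime();
        try (BufferedReader in = Files.newBufferedReader(file)) { readAndCount(in); }
        long t2 = System.nanoTime();

        System.out.printf("read only: %d ms, read + count: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

If "read + count" takes only a little longer than "read only", the job is I/O-bound and more threads are unlikely to help; if it takes several times longer, there is CPU work worth spreading across threads.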

Also think about how you plan to read the file if it sits on a hard drive. Otherwise you risk adding disk seek delays: all the threads that have finished their work stall while one thread asks the hard drive to seek to position 0x03457000...


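Consider the producer-consumer pattern: one thread reads the file, and several worker threads take the data it produces and compute the statistics.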

See also Java's IO.
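A minimal sketch of the producer-consumer idea, assuming Java 8+ and whitespace-separated words (the class name is hypothetical): the calling thread is the only reader, so the file is read sequentially, and it hands lines to worker threads through a bounded `BlockingQueue`. Workers update a shared `ConcurrentHashMap` with `merge`, which is atomic per key, so no extra locking is needed. A sentinel string tells each worker to stop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class ProducerConsumerWordCount {
    // "Poison pill": a distinct String object, compared by identity, so no real line can match it.
    private static final String EOF = new String("<eof>");

    public static Map<String, Integer> count(BufferedReader in, int workers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000); // bounded: reader can't run far ahead
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        // Consumers: take lines until the poison pill arrives.
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (String line = queue.take(); line != EOF; line = queue.take()) {
                        for (String w : line.split("\\s+")) {
                            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum); // atomic per key
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }

        // Producer: the calling thread reads the file sequentially.
        try {
            try {
                String line;
                while ((line = in.readLine()) != null) queue.put(line);
            } finally {
                for (int i = 0; i < workers; i++) queue.put(EOF); // one pill per worker
            }
            for (Thread t : pool) t.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            Map<String, Integer> counts = count(in, Runtime.getRuntime().availableProcessors());
            System.out.println(counts.size() + " distinct words");
        }
    }
}
```

Handing over one line at a time keeps the sketch short; with cheap per-line work it is usually worth queueing batches of lines (for example a `List<String>` of a few thousand) so queue overhead does not dominate.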



But wait, are you reading the file one line at a time? That does not seem optimal. It is better to read it as a stream of characters (using FileReader).

See this tutorial on the Sun site.
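A sketch of reading the file as a plain character stream with `FileReader`, assuming a word is a run of non-whitespace characters (the class name is hypothetical). It reads into a `char[]` buffer and scans it, keeping a partial word in a `StringBuilder` so that words straddling two reads are not split.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class CharStreamWordCount {
    // Counts whitespace-separated words by scanning characters, without building lines.
    public static Map<String, Integer> count(Reader in) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder word = new StringBuilder(); // current word; may span two buffer reads
        char[] buf = new char[64 * 1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    char c = buf[i];
                    if (Character.isWhitespace(c)) {
                        flush(word, counts); // a word ends at whitespace
                    } else {
                        word.append(c);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flush(word, counts); // last word if the input does not end with whitespace
        return counts;
    }

    // Records the pending word (if any) and resets the builder.
    private static void flush(StringBuilder word, Map<String, Integer> counts) {
        if (word.length() > 0) {
            counts.merge(word.toString(), 1, Integer::sum);
            word.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // FileReader decodes using the default charset.
        try (Reader in = new FileReader(args[0])) {
            System.out.println(count(in));
        }
    }
}
```

If the file's encoding may differ from the default charset, `Files.newBufferedReader(path, charset)` lets you name it explicitly and can be passed to the same `count` method.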


If your problem is I/O-bound, perhaps you could consider splitting your data into several files, putting them on a distributed file system such as the Hadoop Distributed File System (HDFS), and then running a MapReduce job over them?
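The map and reduce steps themselves can be tried inside a single JVM before reaching for Hadoop. The sketch below is not Hadoop and does not add disk bandwidth (which is what spreading the data over HDFS is for); it only shows the same map/reduce shape with Java 8 parallel streams, assuming whitespace-separated words. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalMapReduce {
    public static Map<String, Long> count(Stream<String> lines) {
        return lines.parallel()
                // "map": each line becomes a stream of words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                // "reduce": group identical words and count them into a concurrent map
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println(count(lines).size() + " distinct words");
        }
    }
}
```

The parallel stream runs on the common fork-join pool, so the work is spread across cores without any explicit thread management; whether that helps still depends on the CPU-bound versus I/O-bound question raised in the first answer.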


Source: https://habr.com/ru/post/1735011/
