I understand that reading a file with multiple threads is inefficient on a normal spinning-disk system.
My case is different: I have a high-performance file system available that provides read speeds of up to 3 GB/s, along with 196 CPU cores and 2 TB of RAM.
A single-threaded Java program reads a file at 85-100 MB/s at most, so there is a lot of room for improvement over a single thread. I have to read files up to 1 TB in size, and I have enough memory to load them.
I am currently using the following or something similar, but I need to write something multi-threaded to get the best throughput:
Java 7 Files: 50 MB/s
List<String> lines = Files.readAllLines(Paths.get(path), encoding);
Apache Commons IO: 48 MB/s
List<String> lines = FileUtils.readLines(new File("/path/to/file.txt"), "utf-8");
The same with Guava: 45 MB/s
List<String> lines = Files.readLines(new File("/path/to/file.txt"), Charset.forName("utf-8"));
Java Scanner class: very slow
Scanner s = new Scanner(new File("filepath"));
ArrayList<String> list = new ArrayList<String>();
// Read line by line; next() would return whitespace-separated tokens instead of lines.
while (s.hasNextLine()) {
    list.add(s.nextLine());
}
s.close();
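For reference, throughput figures like the ones above could be obtained with a small timing harness along these lines. This is only a sketch, not necessarily how the numbers above were measured: the class name `ReadThroughput` and the method `measureMegabytesPerSecond` are made up for illustration, the file is assumed to be UTF-8, and 1 MB is taken as 10^6 bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadThroughput {

    /**
     * Reads the whole file once with Files.readAllLines and returns the
     * observed throughput in MB/s (1 MB = 10^6 bytes).
     */
    public static double measureMegabytesPerSecond(Path path) {
        try {
            long bytes = Files.size(path);
            long startNanos = System.nanoTime();
            Files.readAllLines(path, StandardCharsets.UTF_8);
            // Guard against a zero interval on very small files.
            long elapsedNanos = Math.max(1, System.nanoTime() - startNanos);
            return (bytes / 1e6) / (elapsedNanos / 1e9);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ReadThroughput.measureMegabytesPerSecond(Paths.get("/path/to/file.txt"))` returns one measurement. A second run over the same file may be served from the operating system's page cache and report a much higher figure, so the first (cold) run is the more relevant one here.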
I want to be able to load a file and build the same ArrayList, with the lines in their original file order, as quickly as possible.
There is another question that reads similarly, but it is actually different: that question discusses systems on which multi-threaded file I/O physically cannot be effective. Due to technical advances, we now have systems designed to support high-throughput I/O, so the CPU/software becomes the limiting factor, and that can be overcome with multi-threaded I/O.
The other question also does not answer how to write code for multi-threaded I/O.
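To make the goal concrete, here is a minimal sketch of one possible approach: split the file into byte ranges, let each worker thread read the lines that start inside its range, and concatenate the per-range lists in range order so the result keeps the file order. The class name `ParallelLineReader` and its methods are hypothetical, and the sketch assumes UTF-8 text with `\n` or `\r\n` line endings (in UTF-8 the byte 0x0A never occurs inside a multi-byte character, so splitting on it at the byte level is safe). It has not been tuned or tried on files of the size described here.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLineReader {

    /** Reads all lines of a UTF-8 file with `threads` workers, preserving file order. */
    public static List<String> readLines(Path path, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long size = Files.size(path);
            long chunk = Math.max(1, (size + threads - 1) / threads);
            List<Future<List<String>>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += chunk) {
                final long from = start;
                final long to = Math.min(size, start + chunk);
                parts.add(pool.submit(() -> readChunk(path, from, to)));
            }
            // Futures are collected in submission order, so file order is kept.
            List<String> lines = new ArrayList<>();
            for (Future<List<String>> part : parts) {
                lines.addAll(part.get());
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Returns every line whose first byte lies in [from, to). */
    private static List<String> readChunk(Path path, long from, long to) throws IOException {
        List<String> lines = new ArrayList<>();
        // Start one byte early to see whether `from` is the start of a line.
        long pos = Math.max(0, from - 1);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ch.position(pos);
            InputStream in = new BufferedInputStream(Channels.newInputStream(ch), 1 << 20);
            int b;
            if (from > 0) {
                // Skip the rest of a line that belongs to the previous chunk.
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') break;
                }
            }
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            // A line that starts before `to` is read to its end, even past `to`.
            while (pos < to) {
                line.reset();
                boolean sawByte = false;
                while ((b = in.read()) != -1) {
                    pos++;
                    sawByte = true;
                    if (b == '\n') break;
                    line.write(b);
                }
                if (!sawByte) break; // end of file
                lines.add(decode(line));
            }
        }
        return lines;
    }

    /** Decodes one line as UTF-8 and drops a trailing '\r' from "\r\n" endings. */
    private static String decode(ByteArrayOutputStream buf) {
        String s = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
    }
}
```

A call such as `ParallelLineReader.readLines(Paths.get("/path/to/file.txt"), 64)` should return the same list as `Files.readAllLines` for files that meet the assumptions above. Two known limits of the sketch: every line becomes a separate String, which costs far more memory than the raw bytes, and a single ArrayList cannot hold more than about 2^31 - 1 elements, so a 1 TB file with short lines would need a different container.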