I have N large files (at least 250 MB each) to hash. The files are spread across P physical disks.
I would like to hash them concurrently with at most K active streams, but I can't hash more than M files per physical disk at a time, because that slows down the whole process (in a test with 61 files, hashing with 8 streams was slower than with 1 stream; almost all of the files were on a single disk).
I am wondering what would be the best approach to this:
- I could use Executors.newFixedThreadPool(K)
- then I would submit tasks, using a per-disk counter to decide whether a file may be scheduled yet.
My code is:
int K = 8;
int M = 1;
Queue<Path> queue = null; // get the files to hash

final ExecutorService pool = Executors.newFixedThreadPool(K);
final ConcurrentMap<FileStore, Integer> counter = new ConcurrentHashMap<>();
final ConcurrentMap<FileStore, Integer> maxCounter = new ConcurrentHashMap<>();
for (FileStore store : FileSystems.getDefault().getFileStores()) {
    counter.put(store, 0);
    maxCounter.put(store, M);
}

List<Future<Result>> result = new ArrayList<>();
while (!queue.isEmpty()) {
    final Path current = queue.poll();
    final FileStore store = Files.getFileStore(current);
    if (counter.get(store) < maxCounter.get(store)) {
        result.add(pool.submit(new Callable<Result>() {
            @Override
            public Result call() throws Exception {
                counter.put(store, counter.get(store) + 1); // read-modify-write, not atomic
                String hash = null;
                // Hash the file
                counter.put(store, counter.get(store) - 1); // read-modify-write, not atomic
                return new Result(current, hash);
            }
        }));
    } else {
        queue.offer(current); // disk is busy: put the file back and try again
    }
}
Setting aside the obviously unsafe part (the non-atomic counter updates, for example), is there a better way to achieve my goal?
I also suspect the submit loop is wasteful: when a disk is at its limit, the file is simply re-queued and polled again right away, so the loop can spin at full speed (almost like an infinite loop).
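For concreteness, here is a rough, untested sketch of the kind of alternative I have in mind: one java.util.concurrent.Semaphore with M permits per FileStore, acquired inside the task itself, so the submit loop never re-queues anything and the hand-rolled counter disappears (HashScheduler is just a hypothetical wrapper class I made up for this sketch; Result is the same class as in my code above):

import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class HashScheduler { // hypothetical wrapper class, for illustration only

    static final int K = 8; // max worker threads overall
    static final int M = 1; // max concurrent hashes per physical disk

    public static List<Future<Result>> submitAll(Queue<Path> queue) throws IOException {
        final ExecutorService pool = Executors.newFixedThreadPool(K);

        // One semaphore with M permits per file store replaces the counter map.
        final ConcurrentMap<FileStore, Semaphore> perDisk = new ConcurrentHashMap<>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            perDisk.put(store, new Semaphore(M));
        }

        final List<Future<Result>> results = new ArrayList<>();
        while (!queue.isEmpty()) {
            final Path current = queue.poll();
            final Semaphore gate = perDisk.get(Files.getFileStore(current));
            results.add(pool.submit(new Callable<Result>() {
                @Override
                public Result call() throws Exception {
                    gate.acquire(); // blocks until this disk has a free slot
                    try {
                        String hash = null;
                        // Hash the file
                        return new Result(current, hash);
                    } finally {
                        gate.release(); // always free the disk slot
                    }
                }
            }));
        }
        pool.shutdown();
        return results;
    }
}

The trade-off I see is that a blocked task still occupies one of the K pool threads, so if most files sit on one disk, many threads will just wait on that disk's semaphore. That seems acceptable to me, but maybe there is a cleaner design, e.g. a queue per disk?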