I have a large list (up to 500,000 entries) of functions. For each function I need to generate a graph (each function can be processed independently of the others) and write the result to a file (the output can be split across several files). Generating the graphs can take a lot of time.
I also have a server with 40 physical cores and 128 GB of RAM.
I first tried to implement parallel processing using Java threads / an ExecutorService, but it does not seem to use all of the available processor resources. On some inputs the program takes up to 25 hours, and only 10-15 cores are busy according to htop.
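This is roughly what the thread-pool version looked like (a simplified sketch; the actual graph generation is replaced by a placeholder method):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadPoolVersion {
    public static void main(String[] args) throws InterruptedException {
        int nFunctions = 500_000;                   // one task per function
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        for (int i = 0; i < nFunctions; i++) {
            final int functionId = i;
            pool.submit(() -> {
                // build the graph for this function and write it to its output file
                generateGraphAndWrite(functionId);
            });
        }

        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.DAYS);   // wait for all tasks to finish
    }

    private static void generateGraphAndWrite(int functionId) {
        // stand-in for the real CPU-heavy graph generation
    }
}
```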
So, the second thing I tried was to create 40 separate processes (using Runtime.exec) and split the list among them. This approach uses all processor resources (100% load on all 40 cores) and speeds the previous example up about 5x (it takes only 5 hours for my task).

The problem with this approach is that each Java process starts separately and consumes memory independently of the others; in some scenarios all 128 GB of RAM are used up after 5 minutes of parallel operation. The workaround I'm using now is to call System.gc() in each process whenever Runtime.totalMemory() > 2 GB. This slows overall performance a bit (8 hours on the previous input) but keeps memory usage within reasonable limits. However, this configuration only works for my server: if you run it on a server with 40 cores and 64 GB of RAM, the Runtime.totalMemory() > 2 GB threshold has to be re-tuned.
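The workaround, simplified, is essentially this check between tasks inside each worker process (the 2 GB constant is exactly the value that only fits this particular 40-core / 128 GB machine):

```java
public class MemoryThrottle {
    // hard-coded threshold: ~40 worker processes * a few GB each must stay under 128 GB
    private static final long MAX_HEAP_BYTES = 2L * 1024 * 1024 * 1024;

    /** Called between graph-generation tasks in each worker process. */
    static void maybeCollect() {
        if (Runtime.getRuntime().totalMemory() > MAX_HEAP_BYTES) {
            System.gc();   // ask the JVM to collect/give memory back; this slows things down
        }
    }
}
```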
So, my questions are:

1. Is there a better way to keep each process's memory consumption under control than calling System.gc() against a hard-coded threshold, so the setup doesn't have to be re-tuned for every server?
2. Is there a way to make a single Java process (maybe using fork/join?) load all 40 cores at 100%?
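For the second question, this is roughly the kind of single-process setup I have in mind: a ForkJoinPool sized to the machine, with the 500,000 functions submitted as independent tasks (the graph-generation call here is again just a stand-in). Whether something like this can actually keep all 40 cores at 100% is exactly what I'm asking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class SingleProcessVersion {
    public static void main(String[] args) {
        int nFunctions = 500_000;
        ForkJoinPool pool = new ForkJoinPool(40);   // one worker per physical core

        List<ForkJoinTask<?>> tasks = new ArrayList<>(nFunctions);
        for (int i = 0; i < nFunctions; i++) {
            final int functionId = i;
            tasks.add(pool.submit(() -> generateGraphAndWrite(functionId)));
        }
        tasks.forEach(ForkJoinTask::join);          // wait for everything to finish
        pool.shutdown();
    }

    private static void generateGraphAndWrite(int functionId) {
        // stand-in for the real CPU-heavy graph generation + file output
    }
}
```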