Process creation slowing down in Java?

I have a single JVM [1] with a big heap (up to 240 GB, though in the 20-40 GB range for most of this stage of execution) running on Linux [2] on a server with 24 cores. We have tens of thousands of objects that need to be processed by an external executable, and the data produced by those executables then has to be loaded back into the JVM. Each executable run produces about half a megabyte of data (on disk), which is of course larger once it is read back in after the process completes.

Our first implementation had each executable process only one object. That meant spawning twice as many processes as we had objects (since we invoked a shell script, which in turn called the executable). CPU utilization would start out high, though not necessarily at 100%, and slowly deteriorate. When we started measuring to see what was happening, we noticed that the process creation time [3] was steadily increasing. It started at sub-second times and eventually grew to take a minute or more, while the actual processing done by the executable usually takes less than 10 seconds.

We then changed the executable to take a list of objects to process, in order to reduce the number of processes created. With batch sizes of a few hundred (~1% of our current sample size), the process creation time starts at about 2 seconds and grows to about 5-6 seconds.

Basically, why does it take so long to create these processes, and why does it keep getting slower?

[1] Oracle JDK 1.6.0_22
[2] Red Hat Enterprise Linux Advanced Platform 5.3, Linux kernel 2.6.18-194.26.1.el5 #1 SMP
[3] Creating a ProcessBuilder object, redirecting the error stream and starting it.
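For reference, the timed step looks roughly like the following sketch (the wrapper script path and argument are placeholders, not our real command):

import java.io.IOException;

public class SpawnTiming {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Illustrative command only; in reality this is our wrapper shell script plus its arguments.
        ProcessBuilder pb = new ProcessBuilder("/path/to/wrapper.sh", "objectId");
        pb.redirectErrorStream(true);          // merge stderr into stdout

        long t0 = System.nanoTime();
        Process p = pb.start();                // the call whose latency keeps growing
        long spawnMillis = (System.nanoTime() - t0) / 1000000L;
        System.out.println("spawn took " + spawnMillis + " ms");

        p.waitFor();                           // the external processing itself (~10 s)
    }
}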

+4
4 answers

I suspect you are running into problems with fork/exec, if Java is using the fork/exec system calls to spawn subprocesses.

Normally fork/exec is quite efficient, because fork() does very little: all pages are copy-on-write. This ceases to be true with very large processes (i.e. ones with gigabytes of pages mapped), because the page tables themselves take a relatively long time to create - and, of course, to destroy, since you immediately call exec.

As you are using a huge amount of heap, this may well be affecting you. The more pages you have mapped in, the worse it may get, which could explain the progressive slowdown.

Consider either:

  • Using posix_spawn, if that is NOT implemented by fork/exec in libc
  • Using a single subprocess that is responsible for creating/reaping the others; spawn it once and use some IPC (pipes etc.) to tell it what to do (see the sketch below)
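A minimal sketch of the second option, assuming the helper is itself a small-heap Java program started once and driven over its stdin (all names here are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class Spawner {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {             // one command line per task
            ProcessBuilder pb = new ProcessBuilder(line.split("\\s+"));
            pb.redirectErrorStream(true);
            Process p = pb.start();                          // cheap here: this helper's image is tiny
            BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
            while (out.readLine() != null) {
                // drain the child's output so it cannot block on a full pipe
            }
            System.out.println("done " + p.waitFor());       // report completion back to the big JVM
        }
    }
}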

NB: This is all speculation; you should probably do some experiments to confirm whether it is the case.

+3

Most likely you are running out of some resource. Are your disks getting busier as you create these processes? Do you make sure you have fewer running processes than you have cores (to minimize context switches)? Is your load average below 24?

If your CPU consumption is dropping, you are likely hitting IO (disk/network) contention, i.e. the processes cannot get or write data fast enough to keep themselves busy. If you have 24 cores, how many disks do you have?

I would suggest having one JVM process per CPU (in your case, I imagine 4), and giving each JVM six tasks so that it uses all of its cores without overloading the system.
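One way to arrange that inside each worker JVM is a fixed-size thread pool; a rough sketch, where the task body and argument handling are placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerJvm {
    public static void main(String[] args) {
        // Six concurrent tasks per JVM, matching the suggestion above.
        ExecutorService pool = Executors.newFixedThreadPool(6);
        for (final String objectId : args) {                 // object ids passed in; illustrative only
            pool.execute(new Runnable() {
                public void run() {
                    // Launch and drive the external executable for objectId here.
                }
            });
        }
        pool.shutdown();                                     // finish the queued tasks, then exit
    }
}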

+1

You would be much better off using a set of long-lived processes that pull your data off queues and send it back, rather than constantly spawning a new process for each event, especially when spawning from a host JVM with that huge heap.

Forking a 240 GB image is not free; it consumes a large amount of virtual resources, even if only for a second. The OS does not know how long the new process will live, so it has to prepare as if the whole thing will be long-lived, and thus it sets up a virtual clone of all 240 GB before destroying it again with the exec call.

If instead you had a long-lived process that you could feed objects to through some kind of queuing mechanism (and there are many, for Java, C, and the rest), that would relieve you of some of the pressure of the forking process.

I do not know how you are passing data from the JVM to the external program, but if your external program can work with stdin/stdout, then (assuming you are on Unix) you could leverage inetd. You make a simple entry in the inetd configuration file for your process and assign it a port. Then you open a socket, pour the data down it, and read the result back from the same socket. Inetd handles the networking details for you, and your program works as simply as if it were reading from stdin and writing to stdout. Keep in mind that you will have an open socket on the network, which may or may not be acceptable in your deployment. But it is pretty trivial to set up even legacy code to be run via a network service this way.

You could use a simple wrapper shell script, for example:

#!/bin/sh
infile=/tmp/$$.in
outfile=/tmp/$$.out
cat > $infile
/usr/local/bin/process -input $infile -output $outfile
cat $outfile
rm $infile $outfile

This is not the highest-performing server on the planet, designed for billions of transactions, but it is a lot quicker than forking 240 GB over and over again.
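On the JVM side, talking to such an inetd service is plain socket IO; a minimal sketch, assuming a hypothetical host and port taken from the inetd configuration:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

public class InetdClient {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket("processing-host", 9999);   // host/port are placeholders
        try {
            OutputStream out = s.getOutputStream();
            out.write("object data to process\n".getBytes());
            s.shutdownOutput();                           // send EOF so `cat > $infile` completes
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                // consume the ~0.5 MB of result data written back by the external program
            }
        } finally {
            s.close();
        }
    }
}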

+1

I mostly agree with Peter. You are most likely suffering from IO bottlenecks. Once you have lots of processes, the OS has to work harder and harder even for trivial tasks, hence the exponential performance decay.

So the "solution" could be to create "consumer" processes: initialize only a few of them (as Peter suggested, one per CPU or so), then use some form of IPC to "transfer" the objects to those consumer processes.

Your "consumer" processes should manage the creation of the subprocesses that run the processing executable (which, I believe, you do not control); that way you do not clutter the OS with too many executables at once, and the "task" will "eventually" be completed.
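A rough sketch of how the big-heap JVM could feed such consumers over pipes, assuming each consumer is a small, long-lived program that reads one object id per line on stdin (the consumer path and all names are hypothetical):

import java.io.OutputStreamWriter;
import java.io.Writer;

public class ConsumerFeeder {
    public static void main(String[] args) throws Exception {
        int consumers = 4;                                   // e.g. one per CPU, as suggested above
        Writer[] feeds = new Writer[consumers];
        for (int i = 0; i < consumers; i++) {
            // Each consumer is forked once, early on, before the heap grows large.
            ProcessBuilder pb = new ProcessBuilder("/usr/local/bin/consumer");
            pb.redirectErrorStream(true);                    // assumes consumers write little to stdout
            feeds[i] = new OutputStreamWriter(pb.start().getOutputStream());
        }
        // Hand out object ids round-robin; the consumers do the real work.
        for (int i = 0; i < args.length; i++) {
            Writer feed = feeds[i % consumers];
            feed.write(args[i] + "\n");
            feed.flush();
        }
        for (int i = 0; i < consumers; i++) {
            feeds[i].close();                                // EOF tells each consumer to finish up
        }
    }
}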

0

Source: https://habr.com/ru/post/1333909/
