Recursively kill an R process and its children in Linux

I am looking for a general method to start, and later kill, an R process, including all forks and any other processes it spawned.

For example, a user runs a script as follows:

 library(multicore)
 for(i in 1:3) parallel(foo <- "bar")
 for(i in 1:3) system("sleep 300", wait=FALSE)
 for(i in 1:3) system("sleep 300&")
 q("no")

After the user ends the R session, the child processes are still running:

 jeroen@jeroen-ubuntu:~$ ps -ef | grep R
 jeroen    4469     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
 jeroen    4470     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
 jeroen    4471     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
 jeroen    4502  4195  0 16:39 pts/1    00:00:00 grep --color=auto R
 jeroen@jeroen-ubuntu:~$ ps -ef | grep "sleep"
 jeroen    4473     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4475     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4477     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4479     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4481     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4483     1  0 16:38 pts/1    00:00:00 sleep 300
 jeroen    4504  4195  0 16:39 pts/1    00:00:00 grep --color=auto sleep

To make matters worse, their parent process ID of 1 makes them hard to identify. Is there a way to run the R script so that I can recursively kill the process and all of its children at any time?

Edit: I don't want to have to search for and kill processes manually. Also, I do not want to kill all R processes, as there may be other ones that are fine. I need a method to kill one specific process and all of its children.

+6
3 answers

This is really about the multicore part. The children are waiting for you to collect their results; see ?collect. In general, you should never use parallel without provisions for cleanup, usually in on.exit. multicore cleans up in high-level functions such as mclapply, but if you use the low-level functions you are responsible for the cleanup yourself (since multicore cannot know whether the children were spawned intentionally).
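
For low-level use, a minimal sketch of that on.exit() pattern could look like the following (assuming multicore is attached; the function name run_with_cleanup and the toy jobs are only illustrative):

 run_with_cleanup <- function() {
   on.exit({                      # cleanup runs even if an error interrupts us
     collect(wait = FALSE)
     kids <- children()
     if (length(kids)) { kill(kids, SIGTERM); collect(kids) }
   })
   jobs <- lapply(1:3, function(i) parallel({Sys.sleep(i); i}))
   collect(jobs)                  # normal path: gather the results
 }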

Your example is really artificial, since you do not even consider collecting the results. But in any case, if this is really what you want, you will have to do the cleanup at some point. For example, if you want to terminate all children on exit, you can define .Last along these lines:

 .Last <- function(...) {
   collect(wait=FALSE)
   all <- children()
   if (length(all)) {
     kill(all, SIGTERM)
     collect(all)
   }
 }

Again, the above is not the recommended way to handle this; it is more of a last resort. You should really keep track of the jobs you spawn and collect their results, for example:

 jobs <- lapply(1:3, function(i) parallel({Sys.sleep(i); i}))
 collect(jobs)

As for the general question about child processes: init inherits the children once R has finished, but in .Last you can still find their PIDs, because the parent process still exists at that point, so you can perform the same kind of cleanup as in the multicore case.

+8

Before the user ends the R session, the processes you want to kill will have a parent process ID equal to the PID of the R session that started them. You could use .Last or .Last.sys (see help(q)) to kill all processes with the matching PPID at that point. These hooks can be suppressed with q(runLast=FALSE), so it is not perfect, but I think it is the best option you have.
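
A minimal sketch of that approach, assuming a Linux system with procps ps (options and output format may differ elsewhere):

 .Last <- function() {
   me <- Sys.getpid()
   # list the PIDs of all processes whose parent is this R session
   kids <- system(sprintf("ps -o pid= --ppid %d", me), intern = TRUE)
   kids <- kids[nzchar(kids)]
   if (length(kids))
     system(paste("kill", paste(kids, collapse = " "), "2>/dev/null"))
 }

The ps call itself briefly appears as a child here, but trying to kill a PID that has already exited is harmless (its error output is discarded).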

After the user ends the R session, there is no reliable way to do what you want: the only record of parentage the kernel keeps is the PPID you see in ps -ef, and when the parent process terminates, that information is destroyed, as you have discovered.

Note that if one of the child processes forks, the grandchild will have a PPID equal to the child's PID, and that PPID is reset to 1 when the child exits, which can happen before the grandparent exits. So there is no reliable way to catch all descendants of a process in general, even if you do it before the process exits. (One hears that "cgroups" provide a way, but I am not familiar with the details; in any case, it is an optional feature that only some builds and configurations of the Linux kernel provide, and it is not available at all elsewhere.)

+4

I believe the last part of the question is more about the shell than about the kernel. (Simon Urbanek has answered the multicore part better than anyone else could, since he is its author. :))

If you use bash, you can find the PID of the most recently launched background process in $!. You can collect those PIDs and then make sure to kill them when you close R.

If you want to be truly gonzo, you can save the parent PID (i.e. the output of Sys.getpid()) and the child PIDs in a file and have a cleanup daemon check whether the parent PID still exists; if not, it kills the orphans. I don't think you'll have an easy time getting a package called oRphanKilleR onto CRAN, though.

Here is an example of appending a child's PID to a file:

 system('(sleep 20) & echo $! >> ~/childPIDs.txt', wait = FALSE) 

You can modify this to build your own shell command and use R's tempfile() function to create a temporary file (although that file will disappear when the R instance terminates, unless you make a special effort to keep it, e.g. via its permissions).
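
A hypothetical variant of the snippet above along those lines, recording the PIDs in a session-specific file created with tempfile() (path and timing are illustrative only):

 pidfile <- tempfile("childPIDs", fileext = ".txt")
 system(sprintf('(sleep 20) & echo $! >> %s', shQuote(pidfile)), wait = FALSE)
 # later in the session, once the shell has written the file:
 pids <- as.integer(readLines(pidfile))

Keep in mind that tempfile() points into R's per-session temporary directory, which is deleted when R exits, so copy the file elsewhere if the PIDs need to outlive the session.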

For some other clever ideas, see this other post on SO.

You could also write a do/while loop in the shell that checks whether a particular PID still exists, sleeping as long as it does. When the loop terminates (because that PID is no longer in use), the script then kills another PID.
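
A rough sketch of such a watchdog, launched from R as a background shell loop (the child PID is a placeholder; assumes a POSIX shell where kill -0 tests whether a PID is alive):

 parent <- Sys.getpid()
 child  <- 4473L   # placeholder: a PID recorded earlier, e.g. via $!
 watchdog <- sprintf(
   "while kill -0 %d 2>/dev/null; do sleep 5; done; kill %d 2>/dev/null",
   parent, child)
 system(watchdog, wait = FALSE)   # keeps running after the R session ends

Because the watchdog is a plain shell process, it survives the R session that launched it and exits on its own once the PID it was told to kill is gone.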

Basically, I think your solution will be in shell scripts, not R.

+1

Source: https://habr.com/ru/post/907536/

