I have been working on a hobby project (written in C) for some time, and it is still far from complete. It is important that it runs fast, so I recently did some comparative benchmarking to make sure my approach to the problem would not turn out to be inefficient.
$ time ./old
real    1m55.92
user    0m54.29
sys     0m33.24
I reworked parts of the program to cut out unnecessary operations and to reduce cache misses and branch mispredictions. The wonderful Callgrind tool was showing me more and more impressive numbers. Most of this benchmarking was done with the external processes disabled (a dry run).
$ time ./old --dry-run
real    0m00.75
user    0m00.28
sys     0m00.24

$ time ./new --dry-run
real    0m00.15
user    0m00.12
sys     0m00.02
Clearly I was doing something right, at least in that mode. Running the program for real, however, told a different story.
$ time ./new
real    2m00.29
user    0m53.74
sys     0m36.22
As you may have noticed, most of the time is spent in the external processes, which left me with no idea what could have caused the regression. There is nothing fancy about that part of the code: just the traditional vfork / execve / waitpid done by a single thread, running the same external programs in the same order.
Something had to be skewing the results and slowing things down, so I wrote a little test (similar to the one below) that does nothing but spawn processes, with none of the overhead of my own program. Surely this would be the fastest case possible.
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, const char **argv)
{
    static const char *const _argv[] = {"/usr/bin/md5sum", "test.c", 0};

    /* Discard the children's output. */
    int fd = open("/dev/null", O_WRONLY);
    dup2(fd, STDOUT_FILENO);
    close(fd);

    for (int i = 0; i < 100000; i++) {
        int pid = vfork();
        int status;
        if (!pid) {
            execve("/usr/bin/md5sum", (char *const *)_argv, environ);
            _exit(1);
        }
        waitpid(pid, &status, 0);
    }
    return 0;
}

$ time ./test
real    1m58.63
user    0m68.05
sys     0m30.96
Well, apparently not.
At this point I decided to fiddle with the CPU frequency governor, and the times got better:
$ for i in 0 1 2 3 4 5 6 7; do sudo sh -c "echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor"; done

$ time ./test
real    1m03.44
user    0m29.30
sys     0m10.66
It looks as if each new process gets scheduled onto a different core, and each core takes a while to switch to a higher frequency. I can't say why the old version was faster. Maybe it was just lucky. Perhaps it was inefficient enough to keep a core busy, so the governor chose a higher frequency earlier.
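One way to check this kind of hypothesis is to watch the current frequency of each core while the test runs. The following is only a rough sketch of mine: it assumes a Linux system with the cpufreq sysfs interface, and the scaling_cur_freq file may not exist on every driver.

/* Rough sketch: print the current frequency of each CPU.
 * Assumes Linux with the cpufreq sysfs interface; CPUs without
 * a readable scaling_cur_freq are simply skipped. */
#include <stdio.h>

int main(void)
{
    for (int cpu = 0; cpu < 8; cpu++) {   /* 8 CPUs, as in the governor loop above */
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                     /* no cpufreq info for this CPU */
        long khz;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("cpu%d: %ld MHz\n", cpu, khz / 1000);
        fclose(f);
    }
    return 0;
}

Running something like this in a loop from another terminal while ./test is going would show whether the cores ever reach their top frequency, or whether the work keeps hopping to cold cores.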
A nice side effect of changing the governor was that compilation times improved as well; compiling apparently also involves spawning lots of short-lived processes. However, this is not a workable solution, since the program will have to run on other people's desktops (and laptops).
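Since changing the governor on users' machines is off the table, about the most a program can do without root is detect the situation and warn. A hedged sketch, assuming the same Linux sysfs layout as above (the scaling_governor file is world-readable, so no privileges are needed); the helper name governor_is is mine, just for illustration:

/* Sketch: read cpu0's scaling governor so the program can warn the user
 * when an "ondemand"-style governor is likely to slow down process spawning.
 * Assumes the Linux cpufreq sysfs interface; returns 0 if unknown. */
#include <stdio.h>
#include <string.h>

static int governor_is(const char *name)
{
    char buf[64] = "";
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
    if (!f)
        return 0;                         /* no cpufreq: nothing to report */
    if (fgets(buf, sizeof(buf), f))
        buf[strcspn(buf, "\n")] = '\0';   /* strip trailing newline */
    fclose(f);
    return strcmp(buf, name) == 0;
}

Calling governor_is("ondemand") at startup would at least let the program print a hint like the shell one-liner above, instead of silently running twice as slow.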
The only way I found to improve the original timings without root was to restrict the program (and its child processes) to a single CPU by adding this at the start:
/* requires _GNU_SOURCE and <sched.h> */
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(0, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
This actually turned out to be the fastest run, even with the default "ondemand" governor:
$ time ./test
real    0m59.74
user    0m29.02
sys     0m10.67
Not only is this a hack, it also falls apart if the launched program uses multiple threads. And my program has no way of knowing that in advance.
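One possible compromise, sketched here under the assumption that switching from vfork to plain fork is acceptable: keep the parent pinned to a single CPU so it stays on a warmed-up core, but have each child reset its affinity to all CPUs before execve, so multi-threaded children are not constrained. (A vfork child should not do this, since it is only supposed to call exec or _exit.) The helper name spawn is mine, just for illustration.

/* Sketch: parent stays pinned (as above); each child un-pins itself
 * before exec so multi-threaded programs can still use every core.
 * Uses fork() instead of vfork(), which costs a little extra. */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t spawn(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        cpu_set_t all;
        CPU_ZERO(&all);
        for (int i = 0; i < CPU_SETSIZE; i++)
            CPU_SET(i, &all);             /* allow every CPU again */
        sched_setaffinity(0, sizeof(all), &all);
        execve(path, argv, environ);
        _exit(1);
    }
    return pid;                           /* caller waitpid()s as before */
}

Whether the extra cost of fork over vfork eats up the gains is something I would still have to measure.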
Does anyone have ideas on how to make the spawned processes run at full CPU speed? It has to be automatic and must not require superuser privileges. Although I have only tested this on Linux so far, I intend to port the program to more or less all popular and unpopular desktop OSes (and it should also run on servers). Ideas for any platform are welcome.