Reboot a CPU that stops responding

I am working on a set of kernel changes that lets me undervolt my processor at runtime. One consequence of extreme undervolting that I often run into is that the processor becomes completely unresponsive.

I tried using the cpu_up and cpu_down functions in the hope of getting the kernel to restore the CPU, but to no avail.

Is there a way to restore the CPU from this state? Are there any routines in the kernel that can return the CPU from this non-responsive state?

2 answers

-, , , , (, 5-10 ). ( , ). , , . - , ​​ ECC ( - ). . Linux ( , ). , , , . . . - (, - , , - , . ).

. , x86 (MCA). , Linux, , , ( , ). .

The default x86 MCE configuration in Linux:

struct mca_config mca_cfg __read_mostly = {
    .bootlog  = -1,
    /*
     * Tolerant levels:
     * 0: always panic on uncorrected errors, log corrected errors
     * 1: panic or SIGBUS on uncorrected errors, log corrected errors
     * 2: SIGBUS or log uncorrected errors (if possible), log corr. errors
     * 3: never panic or SIGBUS, log all errors (for testing only)
     */
    .tolerant = 1,
    .monarch_timeout = -1
};

, tolerant 1. , MCE Linux, tolerant . machine_check_poll do_machine_check..

, , mcelog mcedaemon. MCA 3 3 16 Intel. ARM ECC , .

, - . . , . ( ).

cpu_up cpu_down CPU, .

CPU Hotplug. .
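
For experimenting from user space, the hotplug path behind cpu_up and cpu_down is also reachable through sysfs. The following is a minimal sketch, assuming a kernel built with CONFIG_HOTPLUG_CPU and the usual /sys/devices/system/cpu/cpuN/online control files. The helper names cpu_online_path and set_cpu_online are hypothetical, made up for this illustration, and are not kernel APIs. This only asks the kernel for an orderly offline/online transition, which needs the target CPU to cooperate, so it is unlikely to revive a core that has truly hung.

```c
#include <errno.h>
#include <stdio.h>

/* Build the sysfs path of the "online" control file for a CPU.
 * Returns 0 on success, -1 if the CPU number is negative or the
 * buffer is too small. (Hypothetical helper, not a kernel API.) */
int cpu_online_path(int cpu, char *buf, size_t len)
{
    if (cpu < 0)
        return -1;
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Ask the kernel to bring a CPU online (1) or take it offline (0).
 * Needs root. Returns 0 on success, -1 on failure with errno set.
 * (Hypothetical helper, not a kernel API.) */
int set_cpu_online(int cpu, int online)
{
    char path[64];

    if (cpu_online_path(cpu, path, sizeof path) != 0) {
        errno = EINVAL;
        return -1;
    }

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int rc = (fputs(online ? "1" : "0", f) == EOF) ? -1 : 0;

    /* The buffered write reaches the kernel on close, so a rejected
     * request can surface here rather than in fputs. */
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}
```

Calling set_cpu_online(1, 0) followed by set_cpu_online(1, 1) as root would request an offline/online cycle of CPU 1. On many x86 systems cpu0 has no online file, so the boot CPU cannot be cycled this way.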

+1

. x86_64 s390:

- , , , , , CONFIG_HOTPLUG_CPU= y.

In 4.x kernels, see the cpuhp_* functions in <linux/cpuhotplug.h>: cpuhp_setup_state_multi may be the one you can use to set things up ... if in doubt, look at cpuhp_setup_state_nocalls as well as __cpuhp_setup_state ... :-)

0

Source: https://habr.com/ru/post/1695498/

