Can I rule out that SIGBUS was raised by a "minor page fault"? (The kernel log shows no disk errors)

Motivation

I am trying to improve my understanding of a SIGBUS error in Xwayland. It has been seen by some Fedora Linux users since February 20, 2018, with Xwayland 1.19.6-5.fc27.x86_64 and Linux kernel 4.15.3-300.fc27.x86_64.

Unfortunately, I do not have the "segfault" log message (or its equivalent for SIGBUS), because Xwayland has some rather pointless code that catches fatal signals. But I can see the siginfo by debugging the coredump, and that seems almost as good.

Definition

I understand that a "major page fault" occurs when a page of virtual memory is not available in RAM and must be read from disk. For this question, I think I am interested in pages backed by the ext4 filesystem (as opposed to, for example, direct access to block devices).

A "minor page fault", then, is one that can be resolved without accessing the disk. I guess the distinction is fairly clear-cut, since Linux provides separate counters for major and minor page faults.
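The per-process counters mentioned above can be read with getrusage(2), which reports them as ru_minflt and ru_majflt. The following is a minimal sketch, not taken from the original post: the helper name count_faults is hypothetical, and it assumes that touching freshly mapped anonymous memory is a convenient way to provoke faults that need no disk access.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch `pages` freshly mapped anonymous pages and
 * report how much the minor and major fault counters moved.
 * Returns 0 on success, -1 on error.
 */
int count_faults(size_t pages, long *minor, long *major)
{
    struct rusage before, after;
    long page = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)page;
    volatile char *p;
    int ret = -1;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (getrusage(RUSAGE_SELF, &before) == 0) {
        /* First write to each page: no disk is involved for anonymous memory. */
        for (size_t i = 0; i < pages; i++)
            p[i * (size_t)page] = 1;

        if (getrusage(RUSAGE_SELF, &after) == 0) {
            *minor = after.ru_minflt - before.ru_minflt;
            *major = after.ru_majflt - before.ru_majflt;
            ret = 0;
        }
    }

    munmap((void *)p, len);
    return ret;
}
```

Calling count_faults(64, &minor, &major) would normally be expected to show the minor counter rising while the major counter stays flat, although the exact numbers depend on the kernel configuration (for example transparent huge pages).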

My question

If the kernel sends a program SIGBUS, I wonder whether I should expect this to have been a major page fault.
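For reference, one well-known way to receive SIGBUS with no I/O error at all is to touch a file-backed mapping beyond the end of the file. The sketch below is an illustration under stated assumptions, not the Xwayland scenario: the helper name provoke_sigbus and the temporary path under /tmp are hypothetical. It installs an SA_SIGINFO handler so that si_code and si_addr can be inspected in-process, much as they are inspected from a coredump.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;
static volatile sig_atomic_t seen_code;
static void *volatile seen_addr;

/* Record the siginfo fields, then jump out of the faulting access. */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    seen_code = info->si_code;
    seen_addr = info->si_addr;
    siglongjmp(sigbus_jmp, 1);
}

/*
 * Hypothetical helper: map one page of a temporary file, truncate the file
 * to zero length, then read the mapping. Returns 0 if SIGBUS was delivered
 * (filling in the out-parameters), -1 otherwise.
 */
int provoke_sigbus(int *code, void **fault_addr, void **mapped_addr)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* assumed writable location */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    char *p;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED || ftruncate(fd, 0) != 0) {
        if (p != MAP_FAILED)
            munmap(p, (size_t)page);
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        /* The page now lies beyond end-of-file, so this read should fault. */
        volatile char c = *(volatile char *)p;
        (void)c;
    } else {
        *code = seen_code;
        *fault_addr = seen_addr;
        *mapped_addr = p;
        ret = 0;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(p, (size_t)page);
    close(fd);
    return ret;
}
```

On Linux this access is reported with si_code set to BUS_ADRERR and si_addr set to the address that was touched, so BUS_ADRERR by itself does not prove that a disk read failed.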

coredump , SIGBUS, . siginfo->si_addr , , . coredump . , coredump : - (.

" " (BUS_ADRALN), siginfo->si_code 2, BUS_ADRERR, " ". , x86, , - SSE.

, , , , "". , , , SIGBUS. , :

, , , . , , 8 ~ 100 /. , Out Of Memory (OOM) , , .

, SIGBUS? - , ? ?

, , .

. , , SIGBUS . , . -. - . rpm --verify --all SMART . , , , . , , , , ; , . - ; , .

, , copy-on-write MAP_PRIVATE .
/dev/zero MAP_ANONYMOUS, , .
. , . ( , , ).
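MAP_NONBLOCK (since Linux 2.5.46)
This flag is meaningful only in conjunction with MAP_POPULATE. Don't perform read-ahead: create page table entries only for pages that are already present in RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to do nothing. One day, the combination of MAP_POPULATE and MAP_NONBLOCK may be reimplemented.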

: ,

, . https://bugzilla.redhat.com/show_bug.cgi?id=1557682

, . .

$ gdb 2018-03-21.core
...
Core was generated by `/usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd'.
Program terminated with signal SIGBUS, Bus error.
#0  _dl_fixup (l=0x7fc0be2e0130, reloc_arg=203) at ../elf/dl-runtime.c:73
73    const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
[Current thread is 1 (Thread 0x7fc0be29fa80 (LWP 1918))]
(gdb) p $_siginfo.si_signum
$1 = 7
(gdb) p $_siginfo.si_code
$2 = 2
(gdb) p $_siginfo._sifields._sigfault.si_addr
$3 = (void *) 0x41bd80
(gdb) disassemble
Dump of assembler code for function _dl_fixup:
   0x00007fc0be0c8bd0 <+0>: push   %rbx
   0x00007fc0be0c8bd1 <+1>: mov    %rdi,%r10
   0x00007fc0be0c8bd4 <+4>: mov    %esi,%esi
   0x00007fc0be0c8bd6 <+6>: lea    (%rsi,%rsi,2),%rdx
   0x00007fc0be0c8bda <+10>:    sub    $0x10,%rsp
   0x00007fc0be0c8bde <+14>:    mov    0x68(%rdi),%rax
   0x00007fc0be0c8be2 <+18>:    mov    0x8(%rax),%rdi
   0x00007fc0be0c8be6 <+22>:    mov    0xf8(%r10),%rax
   0x00007fc0be0c8bed <+29>:    mov    0x8(%rax),%rax
   0x00007fc0be0c8bf1 <+33>:    lea    (%rax,%rdx,8),%r8
   0x00007fc0be0c8bf5 <+37>:    mov    0x70(%r10),%rax
=> 0x00007fc0be0c8bf9 <+41>:    mov    0x8(%r8),%rcx
(gdb) p/x $r8
$4 = 0x41bd78
(gdb) p/x $r8 + 8
$5 = 0x41bd80

, reloc->r_info .

(gdb) p reloc
$6 = (const Elf64_Rela * const) 0x41bd78
(gdb) p &reloc->r_info
$7 = (Elf64_Xword *) 0x41bd80
(gdb) p *reloc
$8 = {r_offset = 8443504, r_info = 936302870535, r_addend = 0}
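The r_info and r_offset values printed by gdb are easier to read once decoded. This is a minimal sketch using the ELF64_R_SYM and ELF64_R_TYPE macros from <elf.h>. The helper names decode_rela and rela_entry_addr are hypothetical, and the table base 0x41aa70 used in the usage note is not shown in the coredump output; it is only derived by subtracting 203 * 24 from the entry address.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

struct reloc_info {
    uint64_t sym_index;   /* index into the dynamic symbol table */
    uint64_t type;        /* relocation type, e.g. R_X86_64_JUMP_SLOT */
    int is_jump_slot;     /* non-zero for a lazily bound PLT entry */
};

/* Split r_info into its symbol index (high 32 bits) and type (low 32 bits). */
struct reloc_info decode_rela(const Elf64_Rela *r)
{
    struct reloc_info out;

    out.sym_index = ELF64_R_SYM(r->r_info);
    out.type = ELF64_R_TYPE(r->r_info);
    out.is_jump_slot = (out.type == R_X86_64_JUMP_SLOT);
    return out;
}

/*
 * Address of entry `reloc_arg` in a relocation table starting at `jmprel`.
 * Each Elf64_Rela is 24 bytes, which matches the two lea instructions in
 * the disassembly (rsi * 3, then * 8).
 */
uint64_t rela_entry_addr(uint64_t jmprel, uint64_t reloc_arg)
{
    return jmprel + reloc_arg * sizeof(Elf64_Rela);
}
```

With the values above, r_info = 936302870535 is 0xda00000007, i.e. symbol index 218 and type 7 (R_X86_64_JUMP_SLOT), and r_offset = 8443504 is 0x80d670. Entry 203 of a table based at 0x41aa70 lands at 0x41bd78, the value of reloc.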

( maps , abrtd):

00400000-0060b000 r-xp 00000000 fd:00 1708508                            /usr/bin/Xwayland
0080a000-0080d000 r--p 0020a000 fd:00 1708508                            /usr/bin/Xwayland
0080d000-00817000 rw-p 0020d000 fd:00 1708508                            /usr/bin/Xwayland

$ size -x /usr/bin/Xwayland
   text    data     bss     dec     hex filename
0x209ffb     0xbe9d 0x1f3e0 2314872  235278 /usr/bin/Xwayland
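To relate the faulting address to these mappings, it is enough to check which range contains it and add the mapping's file offset. A minimal sketch, assuming the usual /proc/<pid>/maps line layout (start-end, permissions, offset); the helper name maps_line_lookup is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one maps line and test whether `addr` falls
 * inside it. Returns 1 and stores the corresponding file offset if so,
 * 0 if the address is outside the range, -1 if the line cannot be parsed.
 */
int maps_line_lookup(const char *line, unsigned long addr,
                     unsigned long *file_off)
{
    unsigned long start, end, off;
    char perms[5];

    if (sscanf(line, "%lx-%lx %4s %lx", &start, &end, perms, &off) != 4)
        return -1;
    if (addr < start || addr >= end)
        return 0;

    *file_off = off + (addr - start);
    return 1;
}
```

For si_addr = 0x41bd80 this selects the first mapping (00400000-0060b000, r-xp, offset 0), so the fault is on a read-only, file-backed page of /usr/bin/Xwayland at file offset 0x1bd80.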

c linux x86-64 sigbus

sourcejedi, 25 Mar '18, 15:11

, , , selftests .

EDIT: , , SS sestftests, , AMD. , , . https://lkml.org/lkml/2018/1/26/436

, , GS , PTI - .

$ uname -r
4.15.10-300.fc27.x86_64

$ git describe --all
heads/4.15.10
$ cat ./Documentation/x86/pti.txt
...
2. Run several copies of all of the tools/testing/selftests/x86/ tests
   (excluding MPX and protection_keys) in a loop on multiple CPUs for
   several minutes.  These tests frequently uncover corner cases in the
   kernel entry code.  In general, old kernels might cause these tests
   themselves to crash, but they should never crash the kernel.

$ cd tools/testing/selftests/x86
$ make
...

4x 4x :

sh -c ' while true; do for i in *; do if test -x $i; then ./$i || exit; fi ; done; done '

:

[RUN]   ARCH_SET_GS(0x200000000), then schedule to 0x200000000
    Before schedule, set selector to 0x3
    other thread: ARCH_SET_GS(0x200000000) -- sel is 0x0
[FAIL]  GS/BASE changed from 0x3/0x0 to 0x0/0x0

[RUN]   Executing 6-argument 32-bit syscall via VDSO
[WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]  Flags  after=0000000000200682 id 0 00 d i s 0 0 1 
[WARN]  Flags change=0000000000000855 0 00 o z 0 a 0 p 0 c
[OK]    Arguments are preserved across syscall
[NOTE]  R11 has changed:0000000000200682 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
[RUN]   Running tests under ptrace
[RUN]   Executing 6-argument 32-bit syscall via VDSO
[WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]  Flags  after=0000000000200686 id 0 00 d i s 0 0 p 1 
[WARN]  Flags change=0000000000000851 0 00 o z 0 a 0 0 c
[OK]    Arguments are preserved across syscall
[NOTE]  R11 has changed:0000000000200686 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]   Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[OK]    R8..R15 did not leak kernel data
Warning: failed to find getcpu in vDSO
[RUN]   Testing getcpu...
[OK]    CPU 0: syscall: cpu 0, node 0
[OK]    CPU 1: syscall: cpu 1, node 0
[OK]    CPU 2: syscall: cpu 2, node 0
[OK]    CPU 3: syscall: cpu 3, node 0
[RUN]   Testing getcpu...
[OK]    CPU 0: syscall: cpu 0, node 0 vdso: cpu 0, node 0 vsyscall: cpu 0, node 0
[OK]    CPU 1: syscall: cpu 1, node 0 vdso: cpu 1, node 0 vsyscall: cpu 1, node 0
[OK]    CPU 2: syscall: cpu 2, node 0 vdso: cpu 2, node 0 vsyscall: cpu 2, node 0
[OK]    CPU 3: syscall: cpu 3, node 0 vdso: cpu 3, node 0 vsyscall: cpu 3, node 0
[NOTE]  failed to find getcpu in vDSO
[RUN]   test gettimeofday()
    vDSO time offsets: 0.000006 0.000000
[OK]    vDSO gettimeofday() timeval was okay
[RUN]   test time()
[FAIL]  vDSO returned the wrong time (1522063297 1522063296 1522063297)

sourcejedi, 26 Mar '18, 11:32

sourcejedi · Accepted Answer · 2018-03-29T11:04:49+0000

. -. , SIGBUS- - - , , IO.

https://marc.info/?l=linux-ide&m=152232081917215&w=2

v4.15 /
, SATA LPM...
-, . fsck 2 . , . , , :.)
, , , LPM, . /, , ?
. , , SIGBUS, Xwayland (, , Gnome ) . , SIGBUS , . !
, SIGBUS - , Xwayland. - , . , , , , .
Fedora, . , . , Xwayland , SIGBUS , .
24 Fedora v4.15.
Fedora Xwayland SIGBUS: https://bugzilla.redhat.com/show_bug.cgi?id=1553979
. : https://bugzilla.redhat.com/show_bug.cgi?id=1557682
:
[2018-02-17] https://retrace.fedoraproject.org/faf/reports/2049964/
[315 ] https://retrace.fedoraproject.org/faf/reports/2055378/
EXT4:
Mar 27 11:28:30 alan-laptop kernel: PM: suspend exit
...
Mar 27 11:28:30 alan-laptop kernel: EXT4-fs error (device dm-2):  ext4_find_entry:1436: inode #5514052: comm thunderbird: reading directory lblock 0
Mar 27 11:28:30 alan-laptop kernel: Buffer I/O error on dev dm-2, logical block 0, lost sync page write
(this marked the FS as needing fsck on next boot)
:
Mar 02 18:47:03 alan-laptop kernel: Restarting tasks ...
Mar 02 18:47:03 alan-laptop kernel: Read-error on swap-device (253:1:836184)
Mar 02 18:47:06 alan-laptop kernel: Read-error on swap-device (253:1:580280)
LPM, :
$ head /sys/class/scsi_host/host*/link_power_management_policy
==> /sys/class/scsi_host/host0/link_power_management_policy <==
max_performance

==> /sys/class/scsi_host/host1/link_power_management_policy <==
max_performance
Hardware: Dell Latitude E5450. CPU: i5-5300U (Broadwell).
