NULL pointer dereference in rb_erase() when running mkfs.xfs
This document (000019644) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12 SP4 (SLES12 SP4)
Situation
[ 391.842983] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 391.846699] IP: rb_erase+0x285/0x350
[ 391.846859] PGD 0 P4D 0
[ 391.846859] Oops: 0002 [#1] SMP PTI
[ 391.846859] CPU: 126 PID: 0 Comm: swapper/126 Tainted: G 4.12.14-150.47-default #1 SLE15
[ 391.846859] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 03/12/2019
[ 391.846859] task: ffff98beabbe05c0 task.stack: ffffb4a0f1618000
[ 391.846859] RIP: 0010:rb_erase+0x285/0x350
[ 391.846859] RSP: 0018:ffff996d37583d40 EFLAGS: 00010206
[ 391.846859] RAX: fffff0ee8d09e3c0 RBX: ffff996b8d761480 RCX: 0000000000000000
[ 391.846859] RDX: fffff0ee8d09e3c0 RSI: ffff996b7a7b7c90 RDI: ffff996b8d761508
[ 391.846859] RBP: ffff996b8d761508 R08: 0000000000000018 R09: 0000000001ffffff
[ 391.846859] R10: 0000000000000000 R11: ffff996ffffd6000 R12: 0000000000000000
[ 391.846859] R13: ffff980c6ae1bc00 R14: ffff98ba7f7b8140 R15: ffff996b7a7b7c90
[ 391.846859] FS: 0000000000000000(0000) GS:ffff996d37580000(0000) knlGS:0000000000000000
[ 391.846859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 391.846859] CR2: 0000000000000018 CR3: 000002974800a001 CR4: 00000000003606e0
[ 391.846859] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 391.846859] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 391.846859] Call Trace:
[ 391.846859] <IRQ>
[ 391.846859] elv_rb_del+0x25/0x40
[ 391.846859] bfq_remove_request+0x7b/0x280
[ 391.846859] bfq_finish_request+0x50/0x390
[ 391.846859] blk_mq_free_request+0x55/0x160
[ 391.846859] scsi_end_request+0x89/0x210 [scsi_mod]
[ 391.846859] scsi_io_completion+0x213/0x630 [scsi_mod]
[ 391.846859] __blk_mq_complete_request+0xcb/0x140
[ 391.846859] storvsc_on_channel_callback+0x252/0x600 [hv_storvsc]
[ 391.846859] ? enqueue_hrtimer+0x37/0x80
[ 391.846859] vmbus_on_event+0x34/0x100 [hv_vmbus]
[ 391.846859] tasklet_action+0x5f/0x110
[ 391.846859] __do_softirq+0xde/0x2c6
[ 391.846859] irq_exit+0xed/0x100
[ 391.846859] hyperv_vector_handler+0x5b/0x70
[ 391.846859] hyperv_callback_vector+0x8f/0xa0
[ 391.846859] </IRQ>
[ 391.846859] RIP: 0010:native_safe_halt+0xe/0x10
[ 391.846859] RSP: 0018:ffffb4a0f161bed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[ 391.846859] RAX: ffffffffb56e4ec0 RBX: 000000000000007e RCX: 0000000000000000
[ 391.846859] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 391.846859] RBP: 000000000000007e R08: 0000000000000003 R09: 0106c28af1a127bb
[ 391.846859] R10: ffffb4a0f161be08 R11: 00000000003d0900 R12: 0000000000000000
[ 391.846859] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 391.846859] ? __sched_text_end+0x5/0x5
[ 391.846859] default_idle+0x1a/0x100
[ 391.846859] do_idle+0x169/0x1e0
[ 391.846859] cpu_startup_entry+0x5d/0x60
[ 391.846859] start_secondary+0x1b3/0x200
[ 391.846859] secondary_startup_64+0xa5/0xb0
[ 391.846859] Code: 10 0f 84 dd 00 00 00 4c 89 49 08 c3 4c 89 0e 4d 85 d2 0f 84 22 fe ff ff 48 83 c8 01 48 89 0a 49 89 02 c3 4d 85 c0 4c 89 06 74 11 <49> 89 10 c3 48 89 0e c3 4d 89 48 10 eb d6 4c 89 0e f3 c3 48 89
[ 391.846859] Modules linked in: ip6table_filter ip6_tables iptable_filter nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ip_tables x_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs mlx5_ib ib_core nls_iso8859_1 nls_cp437 vfat fat mlx5_core mlxfw nfit devlink libnvdimm crc32_pclmul dm_mod pci_hyperv(X) ghash_clmulni_intel pcbc aesni_intel hv_utils(X) aes_x86_64 crypto_simd glue_helper ptp cryptd pcspkr hv_netvsc(X) hyperv_fb(X) pps_core hv_balloon(X) joydev xfs libcrc32c sd_mod serio_raw hv_storvsc(X) hid_generic scsi_transport_fc hyperv_keyboard(X) hid_hyperv(X) crc32c_intel hv_vmbus(X) sg scsi_mod efivarfs autofs4
[ 391.846859] Supported: Yes, External
[ 391.846859] CR2: 0000000000000018
[ 391.846859] ---[ end trace ccf8bbf09fab667a ]---
[ 392.024102] RIP: 0010:rb_erase+0x285/0x350
[ 392.024102] RSP: 0018:ffff996d37583d40 EFLAGS: 00010206
[ 392.024102] RAX: fffff0ee8d09e3c0 RBX: ffff996b8d761480 RCX: 0000000000000000
[ 392.032436] RDX: fffff0ee8d09e3c0 RSI: ffff996b7a7b7c90 RDI: ffff996b8d761508
[ 392.032436] RBP: ffff996b8d761508 R08: 0000000000000018 R09: 0000000001ffffff
[ 392.032436] R10: 0000000000000000 R11: ffff996ffffd6000 R12: 0000000000000000
[ 392.032436] R13: ffff980c6ae1bc00 R14: ffff98ba7f7b8140 R15: ffff996b7a7b7c90
[ 392.032436] FS: 0000000000000000(0000) GS:ffff996d37580000(0000) knlGS:0000000000000000
[ 392.032436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 392.032436] CR2: 0000000000000018 CR3: 000002974800a001 CR4: 00000000003606e0
[ 392.032436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 392.032436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 392.032436] Kernel panic - not syncing: Fatal exception in interrupt
[ 392.032436] Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 392.032436] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 392.069668] sched: Unexpected reschedule of offline CPU#52!
[ 392.069668] ------------[ cut here ]------------
Resolution
* Thu Nov 08 2018 hare@suse.de
- block, bfq: postpone rq preparation to insert or merge (bsc#1104967 FATE#325924 bsc#1171673). Refresh patches.suse/block-bfq-fix-use-after-free-in-bfq_idle_slice_timer.patch
- commit 34ee076
This commit was already present in the SLES15 SP1 and SLES12 SP5 kernels on their service pack release dates, so neither of these service packs is affected by this bug. This can be verified in the RPM change log.
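The change log check mentioned above could be scripted roughly as follows. This is a minimal sketch, not an official SUSE tool: the helper name changelog_has_bfq_fix is hypothetical, and the usage comment assumes the kernel package is named kernel-default and that the patch subject appears on a single line of the change log.

```shell
# Hypothetical helper: succeeds if the change log text read from stdin
# contains the subject line of the bfq fix quoted in this document.
changelog_has_bfq_fix() {
    grep -qF 'block, bfq: postpone rq preparation to insert or merge'
}

# Usage on a live system (assumes the package name kernel-default):
#   rpm -q --changelog kernel-default | changelog_has_bfq_fix && echo "fix present"
```

If several kernel packages are installed, rpm prints the change log of each, so a match may come from a kernel other than the one currently running.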
Fixed kernel versions
SLES12 SP4: 4.12.14-95.57-default and newer
SLES15: 4.12.14-150.55-default and newer
A valid LTSS entitlement or ESPOS for SLES for SAP Applications is required to access these updates.
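To compare the running kernel against the fixed versions listed above, a version-aware sort can be used. The sketch below is illustrative only: the function name kernel_at_least is hypothetical, it relies on the -V option of GNU sort, and the caller must pick the threshold that matches the installed product, because the SLES12 SP4 and SLES15 kernels share the 4.12.14 base version.

```shell
# Hypothetical helper: succeeds if kernel release $1 is the same as or
# newer than the fixed release $2, using GNU sort's version ordering.
kernel_at_least() {
    local running="$1" fixed="$2"
    # If the fixed release sorts first (or both are equal), running >= fixed.
    [ "$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n1)" = "$fixed" ]
}

# Usage on SLES12 SP4 (threshold taken from the list above):
#   kernel_at_least "$(uname -r)" "4.12.14-95.57-default" && echo "fixed kernel running"
```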
A reboot is required to load the new kernel. If that is not an option, a workaround can be put into place by switching the IO scheduler on the system's block devices from bfq to mq-deadline.
To set the mq-deadline scheduler on a block device, run the following command, replacing sdX with the name of the block device:
echo "mq-deadline" > /sys/block/sdX/queue/scheduler
This should be done on all block devices until the fixed kernel is loaded.
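Applying the workaround to every block device can be done with a small loop. The following is a sketch under stated assumptions rather than a supported script: the function name switch_bfq_to_mq_deadline and its optional directory argument are hypothetical (the argument exists only so the loop can be tried against a scratch directory), and it relies on sysfs marking the active scheduler with square brackets.

```shell
# Hypothetical helper: switch every block device that currently uses bfq
# to mq-deadline. Must be run as root to write to /sys/block.
switch_bfq_to_mq_deadline() {
    local root="${1:-/sys/block}"   # optional override, for dry runs only
    local sched
    for sched in "$root"/*/queue/scheduler; do
        # Skip devices without a writable scheduler file.
        [ -w "$sched" ] || continue
        # The active scheduler is shown in brackets, e.g. "mq-deadline kyber [bfq] none".
        if grep -q '\[bfq\]' "$sched"; then
            echo "mq-deadline" > "$sched"
            echo "switched $sched"
        fi
    done
}

# Usage (as root):
#   switch_bfq_to_mq_deadline
```

Devices that already use a scheduler other than bfq are left untouched. The setting is not persistent, so it has to be reapplied if the system is rebooted into a kernel without the fix.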
Cause
Additional Information
Another kernel Oops, which may be related, produces a trace similar to the following:
[ 32.749952] Oops: 0000 [#1] SMP PTI
[ 32.749952] CPU: 0 PID: 1286 Comm: ( ) Tainted: G 4.12.14-150.41-default #1 SLE15
[ 32.749952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 32.749952] task: ffff9352a100c300 task.stack: ffff9f2bc5d88000
[ 32.749952] RIP: 0010:bfq_rq_pos_tree_lookup.isra.22+0x1d/0x80
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019644
- Creation Date: 09-Jun-2020
- Modified Date: 14-Aug-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com