NULL pointer dereference in rb_erase() when running mkfs.xfs

This document (000019644) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15 (SLES 15)
SUSE Linux Enterprise Server 12 SP4 (SLES12 SP4)

Situation

When running mkfs.xfs on a logical volume, a kernel oops can occur with a trace similar to the following:
[ 391.842983] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 391.846699] IP: rb_erase+0x285/0x350
[ 391.846859] PGD 0 P4D 0
[ 391.846859] Oops: 0002 [#1] SMP PTI
[ 391.846859] CPU: 126 PID: 0 Comm: swapper/126 Tainted: G 4.12.14-150.47-default #1 SLE15
[ 391.846859] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 03/12/2019
[ 391.846859] task: ffff98beabbe05c0 task.stack: ffffb4a0f1618000
[ 391.846859] RIP: 0010:rb_erase+0x285/0x350
[ 391.846859] RSP: 0018:ffff996d37583d40 EFLAGS: 00010206
[ 391.846859] RAX: fffff0ee8d09e3c0 RBX: ffff996b8d761480 RCX: 0000000000000000
[ 391.846859] RDX: fffff0ee8d09e3c0 RSI: ffff996b7a7b7c90 RDI: ffff996b8d761508
[ 391.846859] RBP: ffff996b8d761508 R08: 0000000000000018 R09: 0000000001ffffff
[ 391.846859] R10: 0000000000000000 R11: ffff996ffffd6000 R12: 0000000000000000
[ 391.846859] R13: ffff980c6ae1bc00 R14: ffff98ba7f7b8140 R15: ffff996b7a7b7c90
[ 391.846859] FS: 0000000000000000(0000) GS:ffff996d37580000(0000) knlGS:0000000000000000
[ 391.846859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 391.846859] CR2: 0000000000000018 CR3: 000002974800a001 CR4: 00000000003606e0
[ 391.846859] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 391.846859] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 391.846859] Call Trace:
[ 391.846859] <IRQ>
[ 391.846859] elv_rb_del+0x25/0x40
[ 391.846859] bfq_remove_request+0x7b/0x280
[ 391.846859] bfq_finish_request+0x50/0x390
[ 391.846859] blk_mq_free_request+0x55/0x160
[ 391.846859] scsi_end_request+0x89/0x210 [scsi_mod]
[ 391.846859] scsi_io_completion+0x213/0x630 [scsi_mod]
[ 391.846859] __blk_mq_complete_request+0xcb/0x140
[ 391.846859] storvsc_on_channel_callback+0x252/0x600 [hv_storvsc]
[ 391.846859] ? enqueue_hrtimer+0x37/0x80
[ 391.846859] vmbus_on_event+0x34/0x100 [hv_vmbus]
[ 391.846859] tasklet_action+0x5f/0x110
[ 391.846859] __do_softirq+0xde/0x2c6
[ 391.846859] irq_exit+0xed/0x100
[ 391.846859] hyperv_vector_handler+0x5b/0x70
[ 391.846859] hyperv_callback_vector+0x8f/0xa0
[ 391.846859] </IRQ>
[ 391.846859] RIP: 0010:native_safe_halt+0xe/0x10
[ 391.846859] RSP: 0018:ffffb4a0f161bed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff0c
[ 391.846859] RAX: ffffffffb56e4ec0 RBX: 000000000000007e RCX: 0000000000000000
[ 391.846859] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 391.846859] RBP: 000000000000007e R08: 0000000000000003 R09: 0106c28af1a127bb
[ 391.846859] R10: ffffb4a0f161be08 R11: 00000000003d0900 R12: 0000000000000000
[ 391.846859] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 391.846859] ? __sched_text_end+0x5/0x5
[ 391.846859] default_idle+0x1a/0x100
[ 391.846859] do_idle+0x169/0x1e0
[ 391.846859] cpu_startup_entry+0x5d/0x60
[ 391.846859] start_secondary+0x1b3/0x200
[ 391.846859] secondary_startup_64+0xa5/0xb0
[ 391.846859] Code: 10 0f 84 dd 00 00 00 4c 89 49 08 c3 4c 89 0e 4d 85 d2 0f 84 22 fe ff ff 48 83 c8 01 48 89 0a 49 89 02 c3 4d 85 c0 4c 89 06 74 11 <49> 89 10 c3 48 89 0e c3 4d 89 48 10 eb d6 4c 89 0e f3 c3 48 89
[ 391.846859] Modules linked in: ip6table_filter ip6_tables iptable_filter nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ip_tables x_tables rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs mlx5_ib ib_core nls_iso8859_1 nls_cp437 vfat fat mlx5_core mlxfw nfit devlink libnvdimm crc32_pclmul dm_mod pci_hyperv(X) ghash_clmulni_intel pcbc aesni_intel hv_utils(X) aes_x86_64 crypto_simd glue_helper ptp cryptd pcspkr hv_netvsc(X) hyperv_fb(X) pps_core hv_balloon(X) joydev xfs libcrc32c sd_mod serio_raw hv_storvsc(X) hid_generic scsi_transport_fc hyperv_keyboard(X) hid_hyperv(X) crc32c_intel hv_vmbus(X) sg scsi_mod efivarfs autofs4
[ 391.846859] Supported: Yes, External
[ 391.846859] CR2: 0000000000000018
[ 391.846859] ---[ end trace ccf8bbf09fab667a ]---
[ 392.024102] RIP: 0010:rb_erase+0x285/0x350
[ 392.024102] RSP: 0018:ffff996d37583d40 EFLAGS: 00010206
[ 392.024102] RAX: fffff0ee8d09e3c0 RBX: ffff996b8d761480 RCX: 0000000000000000
[ 392.032436] RDX: fffff0ee8d09e3c0 RSI: ffff996b7a7b7c90 RDI: ffff996b8d761508
[ 392.032436] RBP: ffff996b8d761508 R08: 0000000000000018 R09: 0000000001ffffff
[ 392.032436] R10: 0000000000000000 R11: ffff996ffffd6000 R12: 0000000000000000
[ 392.032436] R13: ffff980c6ae1bc00 R14: ffff98ba7f7b8140 R15: ffff996b7a7b7c90
[ 392.032436] FS: 0000000000000000(0000) GS:ffff996d37580000(0000) knlGS:0000000000000000
[ 392.032436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 392.032436] CR2: 0000000000000018 CR3: 000002974800a001 CR4: 00000000003606e0
[ 392.032436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 392.032436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 392.032436] Kernel panic - not syncing: Fatal exception in interrupt
[ 392.032436] Kernel Offset: 0x34000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 392.032436] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 392.069668] sched: Unexpected reschedule of offline CPU#52!
[ 392.069668] ------------[ cut here ]------------

Resolution

The following patch fixes the issue:
* Thu Nov 08 2018 hare@suse.de
- block, bfq: postpone rq preparation to insert or merge
  (bsc#1104967 FATE#325924 bsc#1171673).
  Refresh patches.suse/block-bfq-fix-use-after-free-in-bfq_idle_slice_timer.patch
- commit 34ee076
This commit was already present in both the SLES15 SP1 and SLES12 SP5 kernels at their service pack release dates, so neither of these service packs is affected by this bug. This can be verified in the RPM change log.
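
One way to check the change log is to search it for the subject line of the fix. The sketch below assumes the installed kernel package is named kernel-default; has_bfq_fix is a hypothetical helper name used for illustration, not a SUSE tool.

```shell
# Hypothetical helper: reads an RPM change log on standard input and
# succeeds if it contains the subject line of the fix quoted above.
has_bfq_fix() {
    grep -q "block, bfq: postpone rq preparation to insert or merge"
}
```

It could be used as follows: rpm -q --changelog kernel-default | has_bfq_fix && echo "fix present". If several kernel-default packages are installed, rpm prints the change logs of all of them, so a match does not by itself show that the running kernel (see uname -r) contains the fix.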

Fixed kernel versions
SLES12 SP4: 4.12.14-95.57-default and newer
SLES15: 4.12.14-150.55-default and newer
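
To compare the running kernel against the fixed versions listed above, a version-aware comparison can be used. This is a minimal sketch that assumes GNU sort with the -V (version sort) option is available; kernel_at_least is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when kernel version $1 sorts at or after
# version $2 under GNU sort's version ordering.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, on SLES15: kernel_at_least "$(uname -r)" "4.12.14-150.55-default" && echo "running kernel contains the fix". On SLES12 SP4, compare against 4.12.14-95.57-default instead; the two version strings belong to different products and should not be compared with each other.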

A valid LTSS entitlement or ESPOS for SLES for SAP Applications is required to access these updates.

A reboot is required to load the new kernel. If that is not an option, a workaround is to switch the I/O scheduler on the system's block devices from bfq to mq-deadline.

To set the mq-deadline scheduler on a block device, run the following command, replacing sdX with the name of the block device:
echo "mq-deadline" > /sys/block/sdX/queue/scheduler

This should be done on all block devices until the fixed kernel is loaded.
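
Applying the workaround to every block device can be scripted. The sketch below relies on the kernel marking the active scheduler with square brackets in the scheduler file (for example "mq-deadline kyber [bfq] none"). The function name and its optional directory argument are hypothetical and exist only to make the example easy to try against a test directory.

```shell
# Hypothetical helper: switch every block device that currently uses bfq
# to mq-deadline. The optional argument overrides the sysfs block directory.
switch_bfq_to_mq_deadline() {
    sysfs_block="${1:-/sys/block}"
    for sched in "$sysfs_block"/*/queue/scheduler; do
        # Skip devices without a writable scheduler file.
        [ -w "$sched" ] || continue
        case "$(cat "$sched")" in
            # The active scheduler is shown in square brackets.
            *"[bfq]"*) echo "mq-deadline" > "$sched" ;;
        esac
    done
}
```

Run switch_bfq_to_mq_deadline as root, without arguments, to act on /sys/block. Values written to sysfs this way do not persist across a reboot, which matches the purpose of a temporary workaround until the fixed kernel is loaded.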

Cause

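This is caused by a bug in the bfq I/O scheduler code. In the trace above, the NULL pointer dereference occurs in rb_erase(), which is reached through bfq_finish_request(), bfq_remove_request() and elv_rb_del() while a request is being completed.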

Additional Information

Another kernel oops, which may be related, produces a trace similar to the following:

[   32.749952] Oops: 0000 [#1] SMP PTI
[   32.749952] CPU: 0 PID: 1286 Comm: ( ) Tainted: G                   4.12.14-150.41-default #1 SLE15
[   32.749952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
[   32.749952] task: ffff9352a100c300 task.stack: ffff9f2bc5d88000
[   32.749952] RIP: 0010:bfq_rq_pos_tree_lookup.isra.22+0x1d/0x80

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000019644
  • Creation Date: 09-Jun-2020
  • Modified Date: 14-Aug-2020
    • SUSE Linux Enterprise Server
    • SUSE Linux Enterprise Server for SAP Applications
