SLES12 SP5 system panic in tcp_retransmit_skb()
This document (000019642) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12 SP5 (SLES12 SP5)
Situation
A SLES12 SP5 system running kernel 4.12.14-122.12-default or older may panic intermittently. The console output or kernel log first shows the following warning:
[172193.273976] WARNING: CPU: 38 PID: 0 at ../net/ipv4/tcp_timer.c:434 tcp_retransmit_timer+0x985/0x9b0
[172193.273977] Modules linked in: loop mmfs26(OEX) mmfslinux(OEX) tracedev(OEX) binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache tcp_diag inet_diag scsi_transport_iscsi bonding iscsi_ibft iscsi_boot_sysfs rdma_ucm(OEX) ib_ucm(OEX) rdma_cm(OEX) iw_cm(OEX) configfs ib_ipoib(OEX) ib_cm(OEX) ib_umad(OEX) mlx5_ib(OEX) mlx5_core(OEX) mlxfw(OEX) mpt3sas raid_class mptctl mptbase msr intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp iTCO_wdt coretemp iTCO_vendor_support kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel pcbc nls_iso8859_1 nls_cp437 vfat fat cdc_ether aesni_intel ipmi_ssif usbnet aes_x86_64 joydev crypto_simd mii ses glue_helper cryptd igb ioatdma enclosure ch lpc_ich pcspkr scsi_transport_sas mfd_core dca wmi ipmi_si ipmi_devintf
[172193.274036] ipmi_msghandler pcc_cpufreq button sunrpc mlx4_en(OEX) mlx4_ib(OEX) ptp pps_core ib_uverbs(OEX) ib_core(OEX) ext4 crc16 jbd2 mbcache sd_mod hid_generic usbhid mgag200 crc32c_intel i2c_algo_bit qla2xxx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm xhci_pci nvme_fc mlx4_core(OEX) xhci_hcd ehci_pci nvme_fabrics devlink ehci_hcd drm nvme_core mlx_compat(OEX) scsi_transport_fc usbcore drm_panel_orientation_quirks megaraid_sas(OEX) sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod efivarfs autofs4
[172193.274075] Supported: Yes, External
[172193.274079] CPU: 38 PID: 0 Comm: swapper/38 Tainted: G OE 4.12.14-122.12-default #1 SLE12-SP5
[172193.274080] Hardware name: LENOVO x3950 X6 -[6241AC4]-/00WA086, BIOS -[A9E152FUS-4.71]- 12/05/2019
[172193.274081] task: ffff8b0a8152c0c0 task.stack: ffffa5dab142c000
[172193.274084] RIP: 0010:tcp_retransmit_timer+0x985/0x9b0
[172193.274085] RSP: 0018:ffff8bc6bf483e48 EFLAGS: 00010246
[172193.274086] RAX: 0000000000000001 RBX: ffff8bc680c08000 RCX: 000000000000001f
[172193.274087] RDX: 000000281784d847 RSI: 0000000000000004 RDI: ffff8bc680c08000
[172193.274088] RBP: ffff8bc680c08158 R08: 0000000000000004 R09: ffffecc2a5232d5f
[172193.274088] R10: 0000000000000004 R11: 0000000000000005 R12: 0000000000000100
[172193.274089] R13: ffffffff831ca400 R14: ffff8bc680c08000 R15: 0000000000000000
[172193.274090] FS: 0000000000000000(0000) GS:ffff8bc6bf480000(0000) knlGS:0000000000000000
[172193.274091] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[172193.274092] CR2: 00007fec7b9606e0 CR3: 000003c4c400a003 CR4: 00000000001606e0
[172193.274093] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[172193.274093] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[172193.274094] Call Trace:
[172193.274099] <IRQ>
[172193.274104] ? tcp_write_timer_handler+0x240/0x240
[172193.274105] tcp_write_timer_handler+0xc6/0x240
[172193.274106] tcp_write_timer+0x69/0x80
[172193.274114] call_timer_fn+0x32/0x140
[172193.274116] run_timer_softirq+0x1d6/0x420
[172193.274127] ? timerqueue_add+0x54/0x80
[172193.274129] ? enqueue_hrtimer+0x38/0x90
[172193.274133] __do_softirq+0xce/0x28b
[172193.274141] irq_exit+0xdb/0xf0
[172193.274147] smp_apic_timer_interrupt+0x3f/0x60
[172193.274150] apic_timer_interrupt+0x8f/0xa0
[172193.274151] </IRQ>
[172193.274154] RIP: 0010:mwait_idle+0x7b/0x1a0
[172193.274154] RSP: 0018:ffffa5dab142fed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[172193.274155] RAX: 0000000000000000 RBX: ffff8b0a8152c0c0 RCX: 0000000000000000
[172193.274156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[172193.274156] RBP: 0000000000000026 R08: 0000000000000004 R09: ffffecc2a5232d5f
[172193.274157] R10: 00000001028fba98 R11: ffff8bc1268b8880 R12: ffff8b0a8152c0c0
[172193.274158] R13: ffff8b0a8152c0c0 R14: 0000000000000000 R15: 0000000000000000
[172193.274166] do_idle+0x160/0x1f0
[172193.274168] cpu_startup_entry+0x5d/0x70
[172193.274174] start_secondary+0x1a5/0x200
[172193.274180] secondary_startup_64+0xa5/0xb0
[172193.274182] Code: ff ff 31 d2 44 89 f6 48 89 df e8 07 f1 ff ff 84 c0 0f 84 ca f9 ff ff 31 f6 e9 c8 f9 ff ff 84 c0 0f 84 ae f9 ff ff e9 b6 f9 ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 c5 f7 ff ff 48 8b 7b 60 48
The warning is then followed by a NULL pointer dereference with the following stack trace:
[172193.274207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[172193.274213] IP: tcp_retransmit_skb+0x5c/0xc0
[172193.274216] PGD 0 P4D 0
[172193.274218] Oops: 0000 [#1] SMP PTI
[172193.274220] CPU: 38 PID: 0 Comm: swapper/38 Tainted: G W OE 4.12.14-122.12-default #1 SLE12-SP5
[172193.274221] Hardware name: LENOVO x3950 X6 -[6241AC4]-/00WA086, BIOS -[A9E152FUS-4.71]- 12/05/2019
[172193.274222] task: ffff8b0a8152c0c0 task.stack: ffffa5dab142c000
[172193.274223] RIP: 0010:tcp_retransmit_skb+0x5c/0xc0
[172193.274224] RSP: 0018:ffff8bc6bf483e28 EFLAGS: 00010246
[172193.274225] RAX: 00000000fffffff5 RBX: ffff8bc680c08000 RCX: 0000000000000001
[172193.274226] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8bc680c08000
[172193.274227] RBP: 0000000000000000 R08: 0000000016c00000 R09: ffffecc2a5232d5f
[172193.274228] R10: 0000000000000004 R11: 0000000000000000 R12: 00000000fffffff5
[172193.274229] R13: ffffffff831ca400 R14: ffff8bc680c08000 R15: 0000000000000000
[172193.274230] FS: 0000000000000000(0000) GS:ffff8bc6bf480000(0000) knlGS:0000000000000000
[172193.274231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[172193.274232] CR2: 0000000000000030 CR3: 000003c4c400a003 CR4: 00000000001606e0
[172193.274233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[172193.274234] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[172193.274235] Call Trace:
[172193.274236] <IRQ>
[172193.274238] tcp_retransmit_timer+0x3ff/0x9b0
[172193.274240] ? tcp_write_timer_handler+0x240/0x240
[172193.274241] tcp_write_timer_handler+0xc6/0x240
[172193.274243] tcp_write_timer+0x69/0x80
[172193.274244] call_timer_fn+0x32/0x140
[172193.274246] run_timer_softirq+0x1d6/0x420
[172193.274248] ? timerqueue_add+0x54/0x80
[172193.274250] ? enqueue_hrtimer+0x38/0x90
[172193.274251] __do_softirq+0xce/0x28b
[172193.274253] irq_exit+0xdb/0xf0
[172193.274255] smp_apic_timer_interrupt+0x3f/0x60
[172193.274257] apic_timer_interrupt+0x8f/0xa0
[172193.274259] </IRQ>
[172193.274260] RIP: 0010:mwait_idle+0x7b/0x1a0
[172193.274261] RSP: 0018:ffffa5dab142fed8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[172193.274262] RAX: 0000000000000000 RBX: ffff8b0a8152c0c0 RCX: 0000000000000000
[172193.274263] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[172193.274264] RBP: 0000000000000026 R08: 0000000000000004 R09: ffffecc2a5232d5f
[172193.274265] R10: 00000001028fba98 R11: ffff8bc1268b8880 R12: ffff8b0a8152c0c0
[172193.274265] R13: ffff8b0a8152c0c0 R14: 0000000000000000 R15: 0000000000000000
The line that identifies this problem in the stack trace is the instruction pointer:
RIP: 0010:tcp_retransmit_skb+0x5c/0xc0
The numbers in square brackets [..] are timestamps (seconds since boot) and will differ from one occurrence to the next.
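Because the timestamps vary, a saved console or kernel log is easier to compare against this signature once the bracketed prefix is stripped. The following is a minimal Python sketch of such a check. The function name `matches_signature` is hypothetical and not part of any SUSE tool, and the check assumes the log uses the same line format as the excerpts above.

```python
import re

# Leading "[seconds.microseconds]" prefix; its value differs on every occurrence.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Matches both "IP: tcp_retransmit_skb+..." and "RIP: 0010:tcp_retransmit_skb+...".
OOPS_IP = re.compile(r"\b(?:IP|RIP): (?:[0-9a-f]{4}:)?tcp_retransmit_skb\+")


def matches_signature(log_text):
    """Return True if a NULL pointer dereference is reported and a later
    line shows the instruction pointer inside tcp_retransmit_skb()."""
    saw_null_deref = False
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if "NULL pointer dereference" in line:
            saw_null_deref = True
        elif saw_null_deref and OOPS_IP.search(line):
            return True
    return False
```

A log that contains only the earlier WARNING from tcp_retransmit_timer() does not match, since the sketch requires the NULL pointer dereference and the tcp_retransmit_skb instruction pointer.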
Resolution
This is a known problem. It is fixed in kernel 4.12.14-122.17.1 and in all later maintenance update kernels. Update the system to one of these kernels.
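To judge whether a given system still runs a kernel older than the fixed one, the release string can be compared numerically. The sketch below is illustrative only. The helper names `release_tuple` and `is_affected` are hypothetical, and it assumes the release string has the form shown in the logs above (for example 4.12.14-122.12-default) and that the fixed package 4.12.14-122.17.1 reports a release beginning with 4.12.14-122.17.

```python
import re

# First fixed kernel according to the Resolution section (4.12.14-122.17.1),
# reduced to the five numeric fields assumed to appear in the release string.
FIXED = (4, 12, 14, 122, 17)


def release_tuple(release):
    """Turn '4.12.14-122.12-default' into (4, 12, 14, 122, 12)."""
    numeric = re.match(r"[\d.\-]+", release).group(0)
    return tuple(int(part) for part in re.split(r"[.\-]", numeric) if part)


def is_affected(release):
    """Return True if the release sorts before the first fixed kernel."""
    return release_tuple(release)[:5] < FIXED
```

On a running system the release string could be obtained with `platform.release()` from the Python standard library and passed to `is_affected()`. The result is only meaningful on SLES12 SP5, since other products use different kernel version series.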
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019642
- Creation Date: 05-Jun-2020
- Modified Date: 05-Jun-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com