Azure virtual machine hangs after patching to kernel 4.4.120-94.17.1
This document (7022818) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for Azure
Situation
[ 36.220002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1127]
[ 36.224048] Modules linked in: mlx4_core(+) pci_hyperv(X) sb_edac edac_core crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper hv_utils(X) hv_balloon(X) ablk_helper fjes hyperv_fb(X) hv_netvsc(X) cryptd ptp pcspkr pps_core i2c_piix4 processor button joydev ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sd_mod hid_generic hyperv_keyboard(X) hv_storvsc(X) hid_hyperv(X) scsi_transport_fc ata_piix ahci libahci hv_vmbus(X) floppy libata serio_raw sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[ 36.280042] Supported: Yes, External
[ 36.284035] CPU: 0 PID: 1127 Comm: modprobe Tainted: G X 4.4.120-94.17-default #1
[ 36.292036] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 36.300036] task: ffff8807c0ac89c0 ti: ffff8807b14ec000 task.ti: ffff8807b14ec000
[ 36.308044] RIP: 0010:[<ffffffff813346e6>] [<ffffffff813346e6>] delay_tsc+0x26/0x50
[ 36.312049] RSP: 0018:ffff8807b14ef888 EFLAGS: 00000293
[ 36.316056] RAX: 0000000000000000 RBX: ffff8807b1b55640 RCX: 0000001a29d12bb3
[ 36.324043] RDX: 0000001a29d40657 RSI: 0000000000000000 RDI: 000000000003a0ee
[ 36.328044] RBP: ffff8807b14ef958 R08: 000000000000000c R09: 0000000000003000
[ 36.336048] R10: 0000000000000002 R11: 00000000ffffffa2 R12: ffff8807bf024220
[ 36.340036] R13: ffff8807b14ef974 R14: ffff8807b1824380 R15: ffff8807ac874000
[ 36.348042] FS: 00007f9c96d01700(0000) GS:ffff8807c1600000(0000) knlGS:0000000000000000
[ 36.356283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.360045] CR2: 00000000010e9748 CR3: 00000007c09dc000 CR4: 0000000000140670
[ 36.364036] Stack:
[ 36.368036] ffffffffa037e241 ffff8807b1824660 0000000000000000 ffffffff00000000
[ 36.376036] ffff8807b14ef8a8 ffff8807b14ef8a8 ffffffff810ddd31 0000000000000246
[ 36.380035] ffff8807ac8740e8 ffffffffa037d120 ffff8807b14ef898 0000000242490017
[ 36.388035] Call Trace:
[ 36.388035] [<ffffffffa037e241>] hv_compose_msi_msg+0x1c1/0x300 [pci_hyperv]
[ 36.396041] [<ffffffff810de077>] irq_chip_compose_msi_msg+0x47/0x60
[ 36.400042] [<ffffffff810e234a>] msi_domain_activate+0x1a/0x40
[ 36.408044] [<ffffffff810e27e2>] msi_domain_alloc_irqs+0x122/0x1d0
[ 36.412043] [<ffffffff8138b942>] __pci_enable_msix+0x422/0x4b0
[ 36.416043] [<ffffffff8138ba13>] pci_enable_msix_range+0x33/0x60
[ 36.424047] [<ffffffffa042c3c0>] mlx4_enable_msi_x+0x160/0x3d0 [mlx4_core]
[ 36.428037] [<ffffffffa042e4d8>] mlx4_load_one+0x938/0x11f0 [mlx4_core]
[ 36.436051] [<ffffffffa042f3a5>] mlx4_init_one+0x4f5/0x6b0 [mlx4_core]
[ 36.440036] [<ffffffff81372614>] local_pci_probe+0x44/0xa0
[ 36.444047] [<ffffffff81373aa4>] pci_device_probe+0xd4/0x120
[ 36.448045] [<ffffffff81474650>] driver_probe_device+0x200/0x420
[ 36.452045] [<ffffffff814748ee>] __driver_attach+0x7e/0x80
[ 36.456263] [<ffffffff8147254a>] bus_for_each_dev+0x5a/0x90
[ 36.464046] [<ffffffff81473aa0>] bus_add_driver+0x1c0/0x280
[ 36.468045] [<ffffffff8147527b>] driver_register+0x5b/0xd0
[ 36.472377] [<ffffffffa030911a>] mlx4_init+0x11a/0x1000 [mlx4_core]
[ 36.476044] [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
[ 36.484044] [<ffffffff81191896>] do_init_module+0x5a/0x1d7
[ 36.488044] [<ffffffff81110a92>] load_module+0x1382/0x1c70
[ 36.492037] [<ffffffff81111530>] SYSC_finit_module+0x70/0xa0
[ 36.496042] [<ffffffff81615f05>] entry_SYSCALL_64_fastpath+0x1e/0xb6
[ 36.504041] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb6
[ 36.508153]
[ 36.512037] Leftover inexact backtrace:
[ 36.512037]
[ 36.516042] Code: 00 00 00 00 00 0f 1f 44 00 00 65 8b 35 44 8a cd 7e 0f ae e8 0f 31 48 89 d1 48 c1 e1 20 48 09 c1 eb 0d f3 90 65 8b 05 2a 8a cd 7e <39> c6 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
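To check whether a captured boot log matches the trace shown above, a short script can look for the soft lockup message together with the hv_compose_msi_msg frame and frames from mlx4_core. The following Python is a minimal sketch only: the function name matches_signature and the matching rules are illustrative assumptions, not part of any SUSE tool.

```python
import re

# Watchdog line, e.g. "BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1127]"
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)
# Call-trace frame, e.g.
# "[<ffffffffa037e241>] hv_compose_msi_msg+0x1c1/0x300 [pci_hyperv]"
# The trailing "[module]" is optional because built-in functions have none.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def matches_signature(log_text):
    """Return True if the log contains a soft lockup report whose frames
    include hv_compose_msi_msg and at least one frame from mlx4_core.

    Hypothetical helper: it only checks for the presence of these
    strings and does not verify that they belong to the same trace.
    """
    if not SOFT_LOCKUP.search(log_text):
        return False
    frames = FRAME.findall(log_text)
    functions = {name for name, _module in frames}
    modules = {module for _name, module in frames if module}
    return "hv_compose_msi_msg" in functions and "mlx4_core" in modules
```

The function takes the log as a single string, so it works whether the log has one entry per line or has been flattened onto one line.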
Resolution
- Power off the VM
- Add and attach a new NIC to the Azure VM with accelerated networking disabled
- Detach the old NIC, which has accelerated networking enabled, from the Azure VM
- Boot the VM
- Upgrade to kernel version 4.4.126-94.22.1 or later:
  - zypper upgrade kernel-default-4.4.126-94.22.1
- Halt the VM
- Detach the temporary NIC that was added with accelerated networking disabled
- Reattach the original NIC, which has accelerated networking enabled
- Boot the VM
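Before the original NIC is reattached, it can help to confirm that the VM is actually running a kernel at or above the fixed version. The following Python is a minimal sketch under stated assumptions: the names FIXED, parse_release and has_fix are hypothetical, and the running kernel is assumed to report its release in the form seen in the log in this document (4.4.120-94.17-default for package 4.4.120-94.17.1, that is, without the last version component).

```python
import re

# First fixed release according to the Resolution steps (package
# 4.4.126-94.22.1). Assumption: the running kernel reports it without
# the final component, as the log in this document does for 4.4.120-94.17.
FIXED = (4, 4, 126, 94, 22)


def parse_release(release):
    """Parse a release string such as '4.4.120-94.17-default' into a
    tuple of integers; the trailing flavor name is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return tuple(int(part) for part in match.groups())


def has_fix(release):
    """Return True if the release is at or above the fixed version."""
    return parse_release(release) >= FIXED
```

On the VM, the release string could be taken from `uname -r` or from Python's platform.release() and passed to has_fix. The comparison is only meaningful for kernels of the same series as the one named in this document.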
Cause
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022818
- Creation Date: 05-Apr-2018
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com