Crashkernel=512M@128M Set on the Dom0 Causes Xen HVM Guest to Crash on Startup
This document (7017624) is provided subject to the disclaimer at the end of this document.
Environment
Xen 4.4.3
Situation
A fully patched SLES 11 SP4 host is running Xen with a SLES 11 SP4 HVM guest. When this HVM guest is started, it crashes during boot with this message:
[ 3.328784] xen_mem: Initialising balloon driver.
[ 3.350533] Initialising virtual ethernet driver.
[ 7.225896] emc: device handler registered
[ 8.173649] ------------[ cut here ]------------
[ 8.173649] kernel BUG at /usr/src/packages/BUILD/xen-4.4.2-testing/obj/default/balloon/balloon.c:407!
[ 8.173649] invalid opcode: 0000 [#1] SMP
[ 8.173649] CPU 0
[ 8.173649] Modules linked in: scsi_dh_emc scsi_dh xen_vnif xen_balloon ata_generic ata_piix libata scsi_mod xen_vbd xen_platform_pci
[ 8.173649] Supported: Yes
[ 8.173649]
[ 8.173649] Pid: 11, comm: kworker/0:1 Not tainted 3.0.101-63-default #1 Xen HVM domU
[ 8.173649] RIP: 0010:[<ffffffffa0076604>] [<ffffffffa0076604>] decrease_reservation+0x194/0x1a0 [xen_balloon]
[ 8.173649] RSP: 0018:ffff880108bc5dc0 EFLAGS: 00010083
[ 8.173649] RAX: 00000000000001b5 RBX: 0000000000000200 RCX: 0000000000000000
[ 8.173649] RDX: 0000000000000100 RSI: ffff880108bc5dc0 RDI: 0000000000002801
[ 8.173649] RBP: 0000000000000200 R08: 00000000000139d0 R09: 00000000000139d0
[ 8.173649] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0000000000
[ 8.173649] R13: 0000000000000000 R14: 0000000000000246 R15: ffffffffa0076610
[ 8.173649] FS: 0000000000000000(0000) GS:ffff88010fc00000(0000) knlGS:0000000000000000
[ 8.173649] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8.173649] CR2: 00007f1e4a9eaae0 CR3: 0000000001a09000 CR4: 00000000000006f0
[ 8.173649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.173649] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8.173649] Process kworker/0:1 (pid: 11, threadinfo ffff880108bc4000, task ffff880108bc22c0)
[ 8.173649] Stack:
[ 8.173649] ffffffffa0078580 0000000000000200 0000000000000000 0000000000007ff0
[ 8.173649] fffffffffffc2000 0000000000000000 ffff880108bc4010 ffff88010fc0c700
[ 8.173649] ffff88010fc13c05 ffffffffa007670d 0000000000000000 ffffffffa0078000
[ 8.173649] Call Trace:
[ 8.173649] [<ffffffffa007670d>] balloon_process+0xfd/0x110 [xen_balloon]
[ 8.173649] [<ffffffff8107d39c>] process_one_work+0x16c/0x350
[ 8.173649] [<ffffffff810800ca>] worker_thread+0x17a/0x410
[ 8.173649] [<ffffffff81084496>] kthread+0x96/0xa0
[ 8.173649] [<ffffffff81470564>] kernel_thread_helper+0x4/0x10
However, this problem only happens if:
maxmem is higher than memory in the DomU configuration:
# xm list -l sles11 |grep mem
(maxmem 4096)
(memory 2048)
AND
the crashkernel parameter is set to crashkernel=512M@128M for Dom0.
If I change one of those 2 conditions above, the HVM guest will load without any problems. For example, making both memory and maxmem the same amount OR changing the crash parameter to crashkernel=256M@16M.
My server has 12 GB of memory and the dom0_mem=2048M parameter is set. I disabled ballooning (enable-dom0-ballooning no). The customer is having the same issue on a server with 128 GB of memory. I also noticed that if the offset is removed from the crashkernel parameter, the HVM guest will load too. However, the kdump process then won't load. It looks like the offset is required for the Xen kernel.
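The two conditions above can be checked with a small script. The following is only a sketch: the function name check_crash_conditions is hypothetical, it expects the output of "xm list -l <guest>" on standard input, and it takes the Dom0 boot command line as a plain string argument, because how that string is obtained depends on the system.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Xen tool.
# stdin: output of "xm list -l <guest>"
# $1:    Dom0 boot command line as a string (assumption: supplied by the caller)
# Prints "affected" when maxmem > memory AND crashkernel=512M@128M is set,
# otherwise "not affected".
check_crash_conditions() {
    cmdline=$1
    input=$(cat)

    # Pull the first "(maxmem N)" and "(memory N)" values out of the SXP output.
    maxmem=$(printf '%s\n' "$input" | sed -n 's/.*(maxmem \([0-9][0-9]*\)).*/\1/p' | head -n 1)
    memory=$(printf '%s\n' "$input" | sed -n 's/.*(memory \([0-9][0-9]*\)).*/\1/p' | head -n 1)

    # Look for the exact crashkernel setting described in this document.
    case " $cmdline " in
        *" crashkernel=512M@128M "*) crash_set=yes ;;
        *) crash_set=no ;;
    esac

    if [ -n "$maxmem" ] && [ -n "$memory" ] \
        && [ "$maxmem" -gt "$memory" ] && [ "$crash_set" = yes ]; then
        echo "affected"
    else
        echo "not affected"
    fi
}
```

A possible invocation would be: xm list -l sles11 | check_crash_conditions "dom0_mem=2048M crashkernel=512M@128M". The script only matches the exact crashkernel value reported here; whether other size and offset combinations trigger the crash is not covered by this document.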
Resolution
A PTF is available at:
https://ptf.suse.com/a36c11ebc5300def75dd81c34eed2245/sles11-sp4/10857/x86_64/20160517
Alternatively, HAP can be enabled. This also fixes the problem. The CPU hardware, however, must support HAP. Usually HAP is enabled by default on newer versions of Xen. HAP can be disabled or enabled by specifying hap=0 or hap=1 in the /etc/xen/vm/<guest> config file.
Cause
Additional Information
HAP stands for hardware assisted paging and requires a CPU feature
called EPT by Intel and RVI by AMD. It is used to manage the guest's
MMU. The alternative is shadow paging, completely managed in software by
Xen.
With HAP, TLB misses are expensive, so workloads with highly random memory
access can be slow under HAP. With shadow paging, page table updates are expensive.
HAP is enabled by default (and it is the recommended setting) but can be
disabled or enabled by passing hap=0 or hap=1 in the guest VM config file. Usually this file is in /etc/xen/vm/<guest>, but it can be in a different location depending on how the guest was installed. This setting applies to HVM (fully virtualized) guests.
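To see which HAP value a guest config file currently sets, something like the following could be used. This is a minimal sketch: the function name hap_setting is hypothetical, and it assumes the option is written unquoted on its own line as hap=0 or hap=1 (optionally with spaces around the equals sign). Quoted values and other config formats are not handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Xen tool.
# $1: path to a guest config file, for example /etc/xen/vm/<guest>
# Prints "enabled", "disabled", or "not set" (no hap= line found, so the
# default applies).
hap_setting() {
    # Take the last uncommented "hap=<0|1>" assignment in the file.
    value=$(sed -n 's/^[[:space:]]*hap[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1" | tail -n 1)
    case "$value" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "not set" ;;
    esac
}
```

If the result is "disabled", changing the line to hap=1 and restarting the guest would be the change described above, provided the CPU supports HAP.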
HAP (Hardware Assisted Paging) can be optionally used to boost the
performance of Xen memory management for HVM VMs. HAP is an additional
feature of the CPU, and it's not present on older CPUs. Intel HAP is
called Intel EPT (Extended Page Tables) and AMD HAP is called AMD NPT
(Nested Page Tables). AMD NPT is sometimes also referred to as AMD RVI
(Rapid Virtualization Indexing).
How to check if your CPU supports HAP:
Run "xl dmesg" to verify if HAP is supported on your CPU. Look for:
(XEN) HVM: Hardware Assisted Paging detected and enabled
or a similar message such as:
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
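The check above can be scripted by searching the hypervisor log for either form of the message. This is a sketch only: the function name hap_detected is hypothetical, and the pattern covers just the two message variants quoted in this document. Other Xen versions may word the message differently, and early boot messages may no longer be present in the log buffer.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Xen tool.
# stdin: output of "xl dmesg"
# Prints "HAP detected" if one of the messages quoted in this document is
# found, otherwise "HAP not reported".
hap_detected() {
    if grep -q 'Hardware Assisted Paging.*detected'; then
        echo "HAP detected"
    else
        echo "HAP not reported"
    fi
}
```

A possible invocation would be: xl dmesg | hap_detected. A result of "HAP not reported" does not prove that the CPU lacks HAP; it only means the message was not found in the log that was read.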
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7017624
- Creation Date: 20-May-2016
- Modified Date: 28-Sep-2022
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com