Slow boot initialization on machines with Intel Optane DC Memory causing auto-mount to fail

This document (7023909) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12

Situation

SUSE Linux Enterprise Server 12 SP4 and SUSE Linux Enterprise Server for SAP Applications 12 SP4 add support for Intel Optane DC Persistent Memory Modules (DCPMMs). Please see the SLES 12 SP4 Release Notes.
 
SUSE Linux Enterprise Server 15 and SUSE Linux Enterprise Server for SAP Applications 15 add support for Intel Optane DC Persistent Memory Modules (DCPMMs). Please see the SLES 15 Release Notes.
 
This enables SAP workloads, such as SAP HANA, to benefit from persistent memory in the future, shortening system start times and providing better overall system stability. Currently, configurations with up to 12 TB of NVDIMMs plus 3 TB of regular DIMMs of supported memory, on machines with 4 sockets, have been tested. Additional configurations will be tested over time.
 
From a file system perspective, the XFS file system is supported for the NVDIMMs, with SAP HANA running in DAX mode. SUSE intends to keep the leading position as technology provider, working closely with SAP on future developments.
 
Any existing pmem namespaces need to be destroyed before the installation. To mount persistent memory directly on boot, we recommend adding the nofail mount option in /etc/fstab, because it can take a long time for the /dev/pmem devices to become usable. With nofail, a device that is not yet available does not cause the boot to fail.
 
For example:
 
/dev/pmem0    /mnt/pmem0    xfs    dax,nofail    0  0
/dev/pmem1    /mnt/pmem1    xfs    dax,nofail    0  0
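
As a quick sanity check on entries like the ones above, the following sketch lists any /dev/pmem* entry in an fstab-style file whose options lack nofail. The function name pmem_missing_nofail is hypothetical, and the file path is passed as an argument so that the check can be tried on a copy first.

```shell
#!/bin/sh
# Print every /dev/pmem* entry in an fstab-style file whose option
# field (4th column) does not contain "nofail".
# pmem_missing_nofail is a hypothetical helper name.
pmem_missing_nofail() {
    awk '
        /^[[:space:]]*#/ { next }        # skip comment lines
        $1 ~ /^\/dev\/pmem/ {
            n = split($4, opts, ",")     # options are comma separated
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "nofail") found = 1
            if (!found) print $1
        }
    ' "$1"
}
```

Running pmem_missing_nofail /etc/fstab prints nothing when every pmem entry already carries nofail.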
 
Namespaces need to be created individually. That is, run the following command once for each namespace you want to create:
 
ndctl create-namespace --mode=fsdax --map=dev
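
Repeating the command for several namespaces can be scripted. The sketch below is only an illustration: it assumes one namespace per region, uses the --region option of ndctl create-namespace to select the region, and takes region names such as region0 and region1 as placeholders that must be replaced with the names reported on the actual system. By default it only prints the commands. The helper names and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Print a command when DRY_RUN is 1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Issue one create-namespace call per region name given as argument.
create_fsdax_namespaces() {
    for region in "$@"; do
        run ndctl create-namespace --mode=fsdax --map=dev --region="$region"
    done
}
```

For example, create_fsdax_namespaces region0 region1 prints the two commands, and DRY_RUN=0 create_fsdax_namespaces region0 region1 executes them.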
 
On new-generation hardware, such as a system with 12x 512 GB DCPMMs (Intel Optane DC Persistent Memory) whose namespaces are set to '--mode=fsdax --map=dev', initializing the struct page entries allocated for the DCPMMs takes a long time and may result in various symptoms.
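
To get a feel for the amount of metadata involved, the size of the struct page array can be estimated. The sketch below rests on assumptions that are not stated in this document: 64 bytes of struct page per 4096-byte page (typical for x86_64), and 512 GB modules treated as 512 GiB for simplicity. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate the struct page array size, in GiB, for a pmem capacity in GiB.
# Assumption: 64 bytes of struct page per 4096-byte page.
struct_page_gib() {
    capacity_gib=$1
    echo $(( capacity_gib * 64 / 4096 ))
}

struct_page_gib $(( 12 * 512 ))   # prints 96
```

Under these assumptions, roughly 96 GiB of page metadata has to be initialized for the 12-module configuration.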
 
For example:
 
  • Both SLES12 SP4 and SLES15 emit a WARNING with stack trace, which is quite visible in dmesg.  
--- cut here ---
[  577.289407] Device driver nd_pmem initialization / probe took 542228 ms to complete, report this
[  577.289418] ------------[ cut here ]------------
[  577.289428] WARNING: CPU: 4 PID: 1950 at ../kernel/module.c:3879 load_module+0x1a19/0x2190
--- cut here ---
 
This tells us that the module took 542 s to initialize, which is considered long enough to trigger a generic warning in the module loader. This condition is not fatal in itself, but the userspace component that triggered the module load might have its own thresholds and failure modes.
 
  • Both SLES12 SP4 and SLES15 emit an NMI watchdog warning message when changing or destroying a large "fsdax" or "devdax" namespace.
--- cut here ---
NMI watchdog: BUG: soft lockup - CPU#<CPU-ID> stuck for 23s! [ndctl:<PID>]
--- cut here ---
 
  • SLES12 SP4: the boot procedure stops at the emergency shell, even without any change to /etc/fstab, when the default systemd timeout value of 90 sec is in effect.
  • SLES15: the boot procedure stops at the emergency shell if the user modifies /etc/fstab to add entries that mount pmem devices (without nofail) while the default systemd timeout value remains 90 sec.
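
The probe time reported in the first symptom above can be extracted from the kernel log and compared with the 90 sec timeout. The following is a minimal sketch that matches the message format shown in the dmesg excerpt. The function name pmem_probe_seconds is hypothetical.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the nd_pmem probe time in
# whole seconds for each "initialization / probe took N ms" message.
pmem_probe_seconds() {
    sed -n 's/.*Device driver nd_pmem initialization \/ probe took \([0-9][0-9]*\) ms.*/\1/p' |
    while read -r ms; do
        echo $(( ms / 1000 ))
    done
}
```

Typical usage would be dmesg | pmem_probe_seconds. For the log line quoted above it prints 542, far beyond the 90 sec default.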
 

Resolution

It is not safe to modify the systemd service files directly, since these may be overwritten following vendor updates. 
 
There is a procedure available to override them, according to the "man systemd.unit" page. The 'Examples' section describes two methods of overriding vendor settings, of which we prefer the following:
 
Please create a file named 00-override.conf under /etc/systemd/system/systemd-udev-settle.service.d/ with the following contents:
 
--- cut here ---
[Service]
TimeoutSec=1200s
# An empty ExecStart= clears the vendor command so that only the line below runs
ExecStart=
ExecStart=/usr/bin/udevadm settle -t 300
--- cut here ---
 
This overrides _only_ the specific settings that need to change for the system to start up properly.
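
Creating the drop-in can also be scripted. The sketch below writes the file under an optional root prefix, a hypothetical convenience for trying it in a scratch directory before touching the live system. It writes an empty ExecStart= line before the replacement command, which is how a systemd drop-in clears a command list inherited from the vendor unit. The function name is hypothetical.

```shell
#!/bin/sh
# Write the udev-settle drop-in below an optional root prefix.
# An empty prefix targets the live system (requires root privileges).
write_udev_settle_override() {
    root=${1:-}
    dir="$root/etc/systemd/system/systemd-udev-settle.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/00-override.conf" <<'EOF'
[Service]
TimeoutSec=1200s
ExecStart=
ExecStart=/usr/bin/udevadm settle -t 300
EOF
}
```

After writing the file on the live system, systemctl daemon-reload makes systemd pick it up, and systemctl cat systemd-udev-settle.service shows the vendor unit together with the drop-in.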

Cause

The default systemd and udev timeouts are too short for large persistent memory setups.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7023909
  • Creation Date: 03-Jun-2019
  • Modified Date: 21-Apr-2021
    • SUSE Linux Enterprise Server

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
