Mount and sbd failures leading to fencing failure (aio-max-nr too low)
This document (7022255) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12 Service Pack 2
SUSE Linux Enterprise High Availability Extension 12 Service Pack 3
SUSE Linux Enterprise High Availability Extension 11 Service Pack 4
Oracle High Availability agent
Oracle Database
Situation
Services failed to start on node 2 because their filesystems didn’t mount.
This occurred because their underlying LVM volumes weren’t present.
The logical volumes were not present because clvmd was stalled waiting for DLM, which was blocked waiting for the fence of node 1 to complete and the cluster to regain quorum.
The fence had failed because the sbd disk was not accessible at the time of failure.
Resolution
Cause
Additional Information
While the sbd disk was inaccessible to the sbd command, other tools such as fdisk and smartctl could still access it.
The sbd command uses asynchronous I/O (aio) to read each of the possible slots on the sbd disk.
When the failures occurred, the io_setup system call returned an EAGAIN error.
EAGAIN description: The specified nr_events exceeds the user’s limit of available events.
Information on sysctl parameters - aio-nr & aio-max-nr:
aio-nr is the running total of the number of events specified on the io_setup system call for all currently active aio contexts.
If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.
Note that raising aio-max-nr does not result in the pre-allocation or re-sizing of any kernel data structures.
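As a minimal sketch of how the two counters described above can be inspected, the following Python reads them from /proc/sys/fs, where Linux exposes them. The function names and the `base` parameter are illustrative assumptions, not part of sbd or of any SUSE tool.

```python
from pathlib import Path


def read_aio_counters(base="/proc/sys/fs"):
    """Return the current aio-nr and aio-max-nr values as integers.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory; on a live Linux system the default applies.
    """
    root = Path(base)
    counters = {}
    for name in ("aio-nr", "aio-max-nr"):
        # Each file holds a single decimal number followed by a newline.
        counters[name] = int((root / name).read_text().strip())
    return counters


def remaining_events(counters):
    """Events left before aio-nr reaches aio-max-nr (never negative)."""
    return max(counters["aio-max-nr"] - counters["aio-nr"], 0)
```

Calling `remaining_events(read_aio_counters())` on a cluster node gives a rough idea of how much room is left. It is a simple difference of the two sysctl values and does not account for the per-request rounding the kernel applies.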
The check for exceeding the limits is in kernel routine ioctx_alloc() (in fs/aio.c). The actual check is:
if (!nr_events || (unsigned long)nr_events > (aio_max_nr * 2UL))
Before that check, however, nr_events (the number of events being requested) is raised to a floor of 4 * num_possible_cpus() and then doubled:
nr_events = max(nr_events, num_possible_cpus() * 4);
nr_events *= 2;
On a server with 288 CPUs, requesting 1 event will therefore actually allocate 288*4*2 = 2304 events.
So if aio_nr (the running total of events across all active aio contexts) is within that amount of 2*aio_max_nr, the check in ioctx_alloc fails and io_setup returns EAGAIN, causing sbd to report the disk as unreadable even though the disk itself is accessible.
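The arithmetic above can be reproduced with a small Python model. This is only a sketch that mirrors the two quoted kernel excerpts and the condition as this article words it. It is not the full ioctx_alloc() routine, the function names are hypothetical, and the exact accounting may differ between kernel versions.

```python
def effective_nr_events(requested, possible_cpus):
    """Apply the floor of 4 * CPUs, then double, as in the quoted excerpt."""
    return max(requested, possible_cpus * 4) * 2


def size_check_rejects(nr_events, aio_max_nr):
    """Mirror the quoted check: zero events or more than 2 * aio_max_nr."""
    return nr_events == 0 or nr_events > aio_max_nr * 2


def within_margin_of_limit(aio_nr, aio_max_nr, requested, possible_cpus):
    """Assumption: models the condition as worded in this article, i.e.
    the running total is within the rounded-up request of 2 * aio_max_nr."""
    needed = effective_nr_events(requested, possible_cpus)
    return aio_nr + needed > 2 * aio_max_nr
```

For the server in this case, `effective_nr_events(1, 288)` evaluates to 2304, matching the figure above, which shows how a request for a single event can run into the limit on a machine with many CPUs.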
The default value of aio_max_nr is 65536.
The standard Oracle recommendation is 1048576.
On a server with 288 CPUs, the running total in aio-nr can be close to 2*1048576.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022255
- Creation Date: 31-Oct-2017
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com