Mount and sbd failures leading to fencing failure (aio-max-nr too low)

This document (7022255) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 12 Service Pack 1
SUSE Linux Enterprise High Availability Extension 12 Service Pack 2
SUSE Linux Enterprise High Availability Extension 12 Service Pack 3
SUSE Linux Enterprise High Availability Extension 11 Service Pack 4
Oracle High Availability agent
Oracle Database

Situation

One node rebooted unexpectedly, after which services failed to start on the second node. Once the first node has finished rebooting, the services can be started manually without issue.

Services failed to start on node 2 because their filesystems didn’t mount.
This occurred because their underlying LVM volumes weren’t present.
The logical volumes were not present because clvmd was stalled waiting for DLM, which was blocked waiting for the fence of node 1 to complete and the cluster to regain quorum.
The fence had failed because the sbd disk was not accessible at the time of failure.

Resolution

Increasing the value of aio-max-nr (the fs.aio-max-nr sysctl) fourfold made the sbd disk consistently readable by the sbd command.

Cause

The aio-max-nr value was set too low.

Additional Information

This issue was reported to us by our partner HPE, who conducted all of the troubleshooting and identified the resolution.

While the sbd disk was inaccessible to the sbd command, other tools such as fdisk and smartctl could still access it.
The sbd command uses asynchronous I/O (aio) to read each of the possible slots on the sbd disk.
When sbd failed, its io_setup call had returned an EAGAIN error.
The io_setup(2) man page describes EAGAIN as: The specified nr_events exceeds the user’s limit of available events.

Information on the sysctl parameters aio-nr and aio-max-nr (/proc/sys/fs/aio-nr and /proc/sys/fs/aio-max-nr):

aio-nr is the running total of the number of events specified on the io_setup system call for all currently active aio contexts.
If aio-nr reaches aio-max-nr then io_setup will fail with EAGAIN.
Note that raising aio-max-nr does not result in the pre-allocation or re-sizing of any kernel data structures.

The checks for exceeding the limits are in the kernel routine ioctx_alloc() (in fs/aio.c). The first check, which rejects a single request that is too large on its own, is:

        if (!nr_events || (unsigned long)nr_events > (aio_max_nr * 2UL))

But before that, nr_events (the number of events being requested) is given a floor of 4*num_possible_cpus() and then doubled:

        nr_events = max(nr_events, num_possible_cpus() * 4);
        nr_events *= 2;

On a server with 288 CPUs, requesting 1 event actually allocates 288*4*2 = 2304 events.
So if aio_nr (the current system-wide running total) is within that amount of 2*aio_max_nr, the system-wide limit check later in ioctx_alloc(), which tests whether aio_nr + nr_events exceeds aio_max_nr * 2, fails and returns EAGAIN to io_setup. sbd then reports the disk as unreadable, even though it really isn’t.

The default value of aio-max-nr is 65536.
The standard Oracle recommendation is 1048576.
On a server with 288 CPUs, the running total in aio-nr can come close to 2*1048576, so a new io_setup call, which needs 2304 events of headroom, can fail.
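A worked check of the threshold, combining the numbers above and assuming the doubled accounting quoted from ioctx_alloc() and a request of at most 1152 events (any request up to 4 * 288 events is charged the same 2304). The last line assumes, purely for illustration, that 1048576 was the value that got quadrupled; the document does not state the original value.

```latex
\begin{aligned}
\text{charged} &= \max(1,\ 4 \cdot 288) \cdot 2 = 2304 \\
\text{EAGAIN} &\iff \text{aio\_nr} + 2304 > 2 \cdot 1048576 = 2097152
               \iff \text{aio\_nr} > 2094848 \\
\text{aio-max-nr} = 4 \cdot 1048576 = 4194304:\quad
\text{EAGAIN} &\iff \text{aio\_nr} > 2 \cdot 4194304 - 2304 = 8386304
\end{aligned}
```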

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

Document ID: 7022255
Creation Date: 31-Oct-2017
Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com