SUSE Support

Cluster settings that affect the number of actions or jobs that can be executed in parallel

This document (7024060) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise High Availability Extension 11 SP4

Situation

Which cluster settings affect the number of actions or jobs that can be executed in parallel in a cluster?
Jobs or actions include "start", "stop", and "monitor" operations for each resource in the cluster.

Resolution

1. LRMD_MAX_CHILDREN=
Set in /etc/sysconfig/pacemaker, for example:
  LRMD_MAX_CHILDREN=4
Details:  This option has been deprecated in favor of node-action-limit. If it is set, it still affects the number of in-flight actions that run on a cluster node; this behavior is kept for backward compatibility.
Action:  You can comment this out and use "node-action-limit" instead.
Note:  The code for this option was dropped in SLE15SP0.

2. node-action-limit=   -->  Cluster property -->  cib-bootstrap-options -->  node-action-limit=
Details: This is a per-node limit: the maximum number of in-flight actions that can run on the local cluster node.
** It defaults to 2x the number of CPU cores.

3. batch-limit=     -->  Cluster property -->  cib-bootstrap-options -->  batch-limit=
Details: This is a cluster-wide limit on the number of actions.
** It is the number of jobs that the Transition Engine (TE) is allowed to execute in parallel. The TE is the logic in Pacemaker's crmd that executes the actions determined by the Policy Engine (PE). The "correct" value depends on the speed and load of your network and cluster nodes.

Note:  These limits are loaded into memory at startup and enforced by the DC node. To load new values, the pacemaker stack can be restarted on each node, one node at a time.

Cluster Resource Manager (CRM) logic:
1) Check whether the number of in-flight actions has reached the cluster-wide limit (batch-limit).
   * If so, hold the action.
   * If not, go to step 2).
2) Check whether the number of in-flight actions on that node has reached the per-node limit (node-action-limit).
   * If so, hold the action.
   * If not, issue it.
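
The two checks above can be modelled in a few lines of Python. This is only an illustrative sketch of the decision order described in this document, not Pacemaker's actual code; the class name ThrottleModel, its methods, and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThrottleModel:
    """Toy model of the batch-limit / node-action-limit checks (hypothetical, not Pacemaker code)."""
    batch_limit: int          # cluster-wide limit on in-flight actions
    node_action_limit: int    # per-node limit on in-flight actions
    in_flight: dict = field(default_factory=dict)  # node name -> in-flight count

    def try_issue(self, node: str) -> bool:
        """Return True if an action for `node` is issued, False if it is held."""
        # Step 1: has the cluster-wide limit (batch-limit) been reached?
        if sum(self.in_flight.values()) >= self.batch_limit:
            return False
        # Step 2: has the per-node limit (node-action-limit) been reached?
        if self.in_flight.get(node, 0) >= self.node_action_limit:
            return False
        self.in_flight[node] = self.in_flight.get(node, 0) + 1
        return True

    def complete(self, node: str) -> None:
        """Mark one in-flight action on `node` as finished."""
        if self.in_flight.get(node, 0) > 0:
            self.in_flight[node] -= 1


# Example: cluster-wide limit of 3, per-node limit of 2.
model = ThrottleModel(batch_limit=3, node_action_limit=2)
results = [model.try_issue(n) for n in ("node1", "node1", "node1", "node2", "node2")]
print(results)  # [True, True, False, True, False]
```

In this example the third action is held by the per-node limit on node1, and the fifth is held by the cluster-wide limit, even though node2 is still below its own limit.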
 
CRM also takes "CPU Load" into consideration when scheduling actions.
If crmd detects a high load (by default, 80% of 2 x the number of CPUs), it logs a message similar to this:
crmd[2034]:   notice: High CPU load detected: 18.410000
and delays scheduling actions even if batch-limit and node-action-limit have not been reached.
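
As a worked example of the threshold arithmetic stated above, the following Python sketch computes 80% of 2 x the number of CPUs and compares it with the load value from the sample log line. The function and parameter names are hypothetical, and the 8-CPU node is an assumption chosen for illustration; this is not how crmd itself is implemented.

```python
def high_load_threshold(cpus: int, load_fraction: float = 0.80, factor: int = 2) -> float:
    """Threshold as described in this document: load_fraction of (factor x number of CPUs)."""
    return load_fraction * factor * cpus


def is_high_load(load: float, cpus: int) -> bool:
    """True if the reported load exceeds the threshold for a node with `cpus` CPUs."""
    return load > high_load_threshold(cpus)


# Assumed 8-CPU node: threshold is 0.80 * 2 * 8 = 12.8
threshold = high_load_threshold(8)
print(round(threshold, 2))      # 12.8
print(is_high_load(18.41, 8))   # True: 18.41 exceeds 12.8, as in the sample log line
print(is_high_load(18.41, 16))  # False: the threshold for 16 CPUs is 25.6
```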
 

Cause

The purpose of throttling or limiting the number of parallel actions is to keep Pacemaker from overloading the nodes to the point where actions start timing out, causing unnecessary failures and the need for recovery operations.

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7024060
  • Creation Date: 13-Aug-2019
  • Modified Date: 27-Apr-2021
    • SUSE Linux Enterprise High Availability Extension

For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
