SUSE Support

How to prevent CMCI storm subsided kernel messages

This document (000021436) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 15
SUSE Linux Enterprise Server 12

Situation

Constant logging of CMCI storm messages in dmesg:
 
[Tue Mar 12 22:38:50 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 22:43:51 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 22:48:24 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 22:53:24 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 22:59:28 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 23:04:29 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 23:04:59 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 23:10:00 2024] CMCI storm subsided: switching to interrupt mode

 

Resolution

To prevent these messages from being logged, there are two options: 

  • Disable CMCI by adding mce=no_cmci to the kernel command line.
  • Alternatively, disable Intel's EDAC driver. This is likely needed in cases where the platform's firmware handles all hardware errors. Consult your hardware vendor for more information. The driver can be disabled by blacklisting the sb_edac module and rebuilding the initrd:

    echo "blacklist sb_edac" >> /etc/modprobe.d/50-blacklist.conf && mkinitrd

Note: a system reboot is needed. 
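
As a minimal sketch of the first option, the kernel parameter could be appended to the GRUB defaults file with a small helper. It uses mce=no_cmci, the parameter the kernel documents for disabling CMCI. The function name add_kernel_param is hypothetical, and the sketch assumes the file contains a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and that the parameter has no characters special to sed. Review the result before regenerating the boot configuration.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM
# Hypothetical helper: appends PARAM to the double-quoted
# GRUB_CMDLINE_LINUX_DEFAULT line in FILE, unless it is already present.
add_kernel_param() {
    akp_file=$1
    akp_param=$2

    # Skip the edit if the parameter is already on the line, so that
    # running the helper twice does not duplicate it.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${akp_param}[\" ]" "$akp_file"; then
        return 0
    fi

    # Rewrite through a temporary file rather than relying on "sed -i".
    sed "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${akp_param}\"|" \
        "$akp_file" > "${akp_file}.tmp" && mv "${akp_file}.tmp" "$akp_file"
}

# Example usage as root (paths assumed to be the usual SLES locations):
#   add_kernel_param /etc/default/grub mce=no_cmci
#   grub2-mkconfig -o /boot/grub2/grub.cfg
# After the reboot, the active command line can be checked with:
#   grep -o 'mce=[^ ]*' /proc/cmdline
```

For the second option, whether the module is still loaded after the reboot can be checked with lsmod | grep sb_edac, which should print nothing once the blacklist is in effect.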

Cause

Intel CPUs on which CPUID reports DisplayFamily_DisplayModel as 06H_1AH can report corrected machine-check errors by delivering a corrected machine-check interrupt (CMCI). If the error is persistent, the CPU can generate many CMCIs in quick succession, which results in a CMCI storm. Errors usually persist because of a failing RAM DIMM. To prevent system overload, the kernel disables the CMCI handler for a period of time and polls for errors instead, expecting the storm to subside in the meantime. Afterwards it re-enables the CMCI mechanism. Each disable/enable cycle is characterised by the following messages in the system log:
 
[Tue Mar 12 22:38:50 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 22:43:51 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 22:48:24 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 22:53:24 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 22:59:28 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 23:04:29 2024] CMCI storm subsided: switching to interrupt mode
[Tue Mar 12 23:04:59 2024] CMCI storm detected: switching to poll mode
[Tue Mar 12 23:10:00 2024] CMCI storm subsided: switching to interrupt mode

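By default, once a CMCI storm is detected, the kernel disables CMCI handling for 5 minutes. This matches the roughly five-minute gap between each "storm detected" message and the following "storm subsided" message above.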
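
To gauge how often the cycle repeats before deciding whether to suppress the messages or investigate the hardware, the storm events in the kernel log can be counted. The following is a minimal sketch. The function name count_cmci_storms is hypothetical, and it matches only the message text shown in this document.

```shell
#!/bin/sh
# count_cmci_storms
# Hypothetical helper: reads kernel log text on stdin and prints the
# number of "CMCI storm detected" lines, one per disable/enable cycle.
count_cmci_storms() {
    # grep -c prints the count of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'CMCI storm detected' || true
}

# Example usage on a live system (reading the kernel log may need root):
#   dmesg -T | count_cmci_storms
```

A count that keeps growing over hours or days suggests a persistent corrected error, which is worth raising with the hardware vendor even if the messages are suppressed.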

Additional Information

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021436
  • Creation Date: 19-Apr-2024
  • Modified Date: 23-Apr-2024
    • SUSE Linux Enterprise Server
