Basic troubleshooting steps for OOM (Out Of Memory) kills and high memory consumption
This document (000021755) is provided subject to the disclaimer at the end of this document.
Environment
Rancher 2.x
Situation
Memory consumption on the nodes is too high, or OOM kills are happening frequently.
At the Kubernetes level
Start with kubectl top, which shows what is consuming memory at that point in time:
# check which pods are consuming the most memory (all namespaces, sorted by memory)
kubectl top pods --all-namespaces --sort-by=memory
# check which nodes are affected
kubectl top nodes
A few questions that can help are:
- Which pods are consuming the most resources?
- Is it on a specific node, or across all nodes?
- When you describe the node (kubectl describe node), are its allocated resources overcommitted?
The answers may reveal opportunities for better capacity planning for your applications.
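To help answer the question of whether the problem is on a specific node or across all nodes, the following is a minimal sketch that filters the output of kubectl top nodes by memory percentage. The function name nodes_over_memory and the default threshold of 80 are illustrative assumptions, not part of Rancher or kubectl. The sketch also assumes that the fifth column of the output is MEMORY%.

```shell
# Print nodes whose memory usage is at or above a threshold (in percent).
# Reads `kubectl top nodes --no-headers` output on stdin.
# Assumption: column 5 is MEMORY%, for example "85%".
nodes_over_memory() {
  threshold="${1:-80}"   # hypothetical default, adjust to your environment
  awk -v t="$threshold" '
    {
      pct = $5
      sub(/%$/, "", pct)
      # Skip rows without a numeric value (for example "<unknown>").
      if (pct ~ /^[0-9]+$/ && pct + 0 >= t + 0) print $1, pct "%"
    }'
}

# Usage (kubectl top requires a metrics source such as metrics-server):
#   kubectl top nodes --no-headers | nodes_over_memory 80
```

If only one or two nodes are listed, the cause is more likely a specific workload or uneven scheduling. If most nodes are listed, the cluster as a whole may be short on memory.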
At the node level
Check the system log messages (or the output of dmesg -T) for the OOM kill message:
- If the OOM killer was invoked by a cgroup, the configured memory limits are being enforced. Adjust them as needed.
- If it was invoked by the kernel for the whole node, the node itself is running out of memory and the OOM killer is reclaiming it.
Check the kubelet logs for OOM kills.
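The distinction between cgroup-level and node-level OOM kills can be sketched as a small filter over the kernel log. This is a rough sketch, assuming the kernel logs cgroup kills with the text "Memory cgroup out of memory" and node-wide kills with "Out of memory:". The exact wording can vary between kernel versions, so verify it against your own logs. The function name classify_oom is hypothetical.

```shell
# Count OOM kill messages by type from kernel log lines on stdin.
# Assumption: cgroup kills contain "Memory cgroup out of memory",
# node-wide kills contain "Out of memory:" (wording may differ per kernel).
classify_oom() {
  awk '
    /Memory cgroup out of memory/ { cgroup++; next }
    /Out of memory:/              { node++ }
    END { printf "cgroup_oom=%d node_oom=%d\n", cgroup, node }
  '
}

# Usage:
#   dmesg -T | classify_oom
```

A high cgroup_oom count points at container limits that are too low for the workload, while a non-zero node_oom count points at the node itself running short of memory.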
Resolution
Rancher Project Resource Quotas:
Rancher allows for resource management at the Project level. Please review the documentation on how to set limits at the Project and Namespace levels.
For non-Rancher components:
Adjust the requests and limits as per the Kubernetes documentation. This can be done at several levels, for example under spec.containers[].resources in a workload manifest, or in the values.yaml of a Helm chart. Here is an example from Rancher Monitoring:
resources:
  limits:
    memory: 500Mi
    cpu: 1000m
  requests:
    memory: 100Mi
    cpu: 100m
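For workloads that are not managed through a chart, the same memory values can also be applied with kubectl set resources. The sketch below only builds and prints the command so it can be reviewed before it is run. The function name build_set_resources_cmd and the workload name my-app are hypothetical.

```shell
# Build (but do not run) a kubectl command that sets memory requests and limits.
# Arguments: resource kind, resource name, memory request, memory limit.
build_set_resources_cmd() {
  kind="$1"; name="$2"; mem_request="$3"; mem_limit="$4"
  printf 'kubectl set resources %s %s --requests=memory=%s --limits=memory=%s\n' \
    "$kind" "$name" "$mem_request" "$mem_limit"
}

# Usage (my-app is a placeholder workload name):
#   build_set_resources_cmd deployment my-app 100Mi 500Mi
```

For Helm-managed workloads, changing values.yaml is usually the better choice, because changes made directly to the workload are likely to be overwritten by the next chart upgrade.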
If you are experiencing issues with Rancher-shipped components, open a case with Rancher Support. Please collect all the data below when contacting SUSE Rancher support.
- kubectl top pods
- kubectl top nodes
- Grafana Graphs of the affected services, or graphs from any monitoring in place
- The log bundle: https://www.suse.com/support/kb/doc/?id=000020191
- The resource count of Rancher: https://www.suse.com/support/kb/doc/?id=000021310
- You might be asked by support to also collect profiles of Rancher or Fleet: https://www.suse.com/support/kb/doc/?id=000021615
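The two kubectl top outputs from the list above can be captured in files before opening a case. This is a minimal sketch. The function name collect_top_data, the default directory name, and the file names are assumptions chosen for illustration. It does not replace the log bundle or the other items in the list.

```shell
# Save the kubectl top outputs into a directory for attaching to a support case.
# Argument: output directory (hypothetical default: ./oom-support-data).
collect_top_data() {
  out_dir="${1:-./oom-support-data}"
  mkdir -p "$out_dir" || return 1
  # Capture stderr as well so that metrics errors are visible in the files.
  kubectl top pods --all-namespaces > "$out_dir/top-pods.txt" 2>&1
  kubectl top nodes > "$out_dir/top-nodes.txt" 2>&1
  echo "$out_dir"
}

# Usage:
#   collect_top_data ./oom-support-data
```

Collecting the data while the memory pressure is happening is most useful, since kubectl top only reports current usage.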
Cause
OOM kills or high memory usage might be caused by lack of resources, configuration issues or application failures.
Additional Information
An example of debugging high memory consumption for Prometheus:
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021755
- Creation Date: 25-Mar-2025
- Modified Date: 17-Apr-2025
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com