Server is in hung state, caused by Trellix anti-virus software
This document (000021639) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15 SP5
SUSE Linux Enterprise Server for SAP 15 SP5
McAfee / Trellix Endpoint Security for Linux Threat Prevention 10.7.16.27
Situation
The server hangs. It cannot be accessed at all, not even via the console, and all active sessions on the server are unresponsive. After a reboot the server works normally for a short time and then hangs again.
Resolution
Cause
Crash analysis:
CPUS: 28
LOAD AVERAGE: 81.97, 53.42, 24.47
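Although the load average is high, no CPU is doing any active work. Linux also counts tasks in uninterruptible sleep towards the load average, which explains the high figures. On all 28 CPUs the current task is swapper, the per-CPU idle task, halted in native_safe_halt. Do not confuse swapper with kswapd, which handles page reclaim and swapping: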
crash> bt -a | grep 'exception RIP' | sort | uniq -c | sort -nr
28 [exception RIP: native_safe_halt+11]
crash> bt -a | grep 'COMMAND' | awk '{print $NF}' | awk -F\/ '{print $1}' | sort | uniq -c | sort -nr
28 "swapper
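Each crash pipeline above builds a histogram in the grep | sort | uniq -c | sort -nr style. The sketch below does the same tally in Python over saved bt -a output, which can help when comparing several dumps. It makes a few assumptions: the function name histogram and the two patterns are illustrative, and the sample lines are a shortened, made-up excerpt, not data from this vmcore:

```python
import re
from collections import Counter

# Patterns for the two tallies shown above (illustrative, not part of crash).
RIP_PATTERN = r"\[exception RIP: ([^\]]+)\]"
COMMAND_PATTERN = r'COMMAND: "([^/"]+)'  # "swapper/0" -> "swapper"


def histogram(lines, pattern):
    """Count the first capture group of `pattern` over `lines`,
    most common first (like grep | sort | uniq -c | sort -nr)."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        match = regex.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


# Made-up excerpt of `bt -a` output, shortened to two CPUs.
sample = [
    'PID: 0  TASK: ...  CPU: 0  COMMAND: "swapper/0"',
    "    [exception RIP: native_safe_halt+11]",
    'PID: 0  TASK: ...  CPU: 1  COMMAND: "swapper/1"',
    "    [exception RIP: native_safe_halt+11]",
]

print(histogram(sample, RIP_PATTERN))      # [('native_safe_halt+11', 2)]
print(histogram(sample, COMMAND_PATTERN))  # [('swapper', 2)]
```

Unlike the awk -F/ split, the regex also drops the leading quote. As a result, the command is reported as swapper rather than "swapper.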
Many of the processes are stuck in uninterruptible sleep (UN state):
crash> ps -S
RU: 28
UN: 85
IN: 1375
ID: 398
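A system that still responds briefly after a reboot can be checked from /proc, which gives a rough live equivalent of crash's ps -S. The sketch below is only that: it assumes the Linux /proc/<pid>/task/<tid>/stat format, and the helper names and state-to-label mapping are illustrative. It parses the command name up to the last closing parenthesis because thread names such as "OAS Res Br<-Mgr" can contain spaces:

```python
import glob
from collections import Counter

# Map /proc state letters to the labels crash's "ps -S" uses.
# Letters not listed here are reported unchanged.
STATE_LABELS = {"R": "RU", "S": "IN", "D": "UN", "I": "ID",
                "Z": "ZO", "T": "ST", "t": "TR", "X": "DE"}


def parse_stat(text):
    """Return (tid, comm, state) from the contents of a .../stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    tid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 2]  # the state letter follows ") "
    return tid, comm, state


def task_states(proc="/proc"):
    """Count every thread on the running system by state."""
    counts = Counter()
    for path in glob.glob(f"{proc}/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                _, _, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited while /proc was being scanned
        counts[STATE_LABELS.get(state, state)] += 1
    return dict(counts)


if __name__ == "__main__":
    print(task_states())
```

A steadily growing UN count between runs would match the pattern seen in the dump.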
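All 85 UN processes are blocked on a fanotify/fsnotify event. Each one is waiting for a userspace process, the fanotify listener, to respond to the event before it can continue. Frames #2 to #4 are identical in every UN backtrace: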
crash> for UN bt | grep '#2 ' | awk '{if ($NF ~ /\[.*\]/) print $3" "$NF; else print $3;}' | sort | uniq -c | sort -nr
85 fanotify_handle_event
crash> for UN bt | grep '#3 ' | awk '{if ($NF ~ /\[.*\]/) print $3" "$NF; else print $3;}' | sort | uniq -c | sort -nr
85 fsnotify
crash> for UN bt | grep '#4 ' | awk '{if ($NF ~ /\[.*\]/) print $3" "$NF; else print $3;}' | sort | uniq -c | sort -nr
85 __fsnotify_parent
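Each awk program above selects frame #N of every backtrace and prints field 3, the function name. When the last field is a bracketed module name, it appends that field as well. The Python sketch below performs the same extraction, assuming the usual "#N [stack address] function at address [module]" layout of crash bt frame lines. The sample lines use placeholder addresses, and example_func and example_mod are hypothetical:

```python
from collections import Counter


def frame_symbol(line):
    # Mirrors: awk '{if ($NF ~ /\[.*\]/) print $3" "$NF; else print $3;}'
    # Field 3 is the function name; a trailing "[module]" field is added
    # when the frame lies in a loadable kernel module.
    fields = line.split()
    symbol, last = fields[2], fields[-1]
    if last.startswith("[") and last.endswith("]"):
        return f"{symbol} {last}"
    return symbol


def frames_at_depth(bt_lines, depth):
    """Tally the symbol at frame #depth over all backtraces, most common first."""
    tag = f"#{depth}"
    counts = Counter(
        frame_symbol(line)
        for line in bt_lines
        if line.split()[:1] == [tag]  # exact frame number, so #12 is not #2
    )
    return counts.most_common()


# Placeholder bt output for two tasks; example_func/example_mod are hypothetical.
bt_output = [
    "PID: ...  TASK: ...  CPU: ...  COMMAND: ...",
    " #2 [...] fanotify_handle_event at ...",
    " #3 [...] fsnotify at ...",
    " #4 [...] __fsnotify_parent at ...",
    " #2 [...] fanotify_handle_event at ...",
    " #9 [...] example_func at ... [example_mod]",
]

print(frames_at_depth(bt_output, 2))  # [('fanotify_handle_event', 2)]
```

Matching the first field exactly is slightly stricter than grep '#2 '. Running the same tally for successive depths shows where the UN backtraces start to diverge.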
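These events are queued on a fanotify group owned by the Trellix on-access scan (OAS) component. A search of the open files of the UN tasks for a fanotify descriptor finds it held by PID 10137 (thread "OAS Res Br<-Mgr"), which is itself in UN state. Its files entry for the fanotify descriptor is not reproduced below, but it gives the struct file address ffff8af386c27400. Following file.private_data leads to the fsnotify_group, and walking its notification_list returns 86 entries. The walk starts at a queued event rather than at the list head, so the count also includes the head embedded in fsnotify_group. That leaves 85 pending events, matching the 85 tasks in UN state: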
crash> for UN files | grep -Ei "command|fanotify" | grep -B1 "notify"
PID: 10137 TASK: ffff8af4bcdbc000 CPU: 27 COMMAND: "OAS Res Br<-Mgr"
crash> struct -x file.private_data ffff8af386c27400
private_data = 0xffff8af2ca599a00,
crash> struct -x fsnotify_group.notification_list 0xffff8af2ca599a00
notification_list = {
next = 0xffff8af2ca6bcb88,
prev = 0xffff8af2ce29f948
},
crash> list 0xffff8af2ce29f948 | wc -l
86
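The crash list command prints the start address and then follows next pointers until they lead back to it. Here the walk starts from notification_list.prev, which is the last queued event, so the list head inside struct fsnotify_group is printed as one of the entries. The toy model below sketches only that traversal. The node names stand in for kernel addresses and are not real:

```python
def walk_circular(start, next_of):
    """Follow next pointers from `start` until they return to it and
    return every node visited, start included (as crash `list` prints)."""
    visited = [start]
    current = next_of[start]
    while current != start:
        visited.append(current)
        current = next_of[current]
    return visited


# Circular list_head chain: the group's embedded head plus three queued events.
next_of = {"head": "ev1", "ev1": "ev2", "ev2": "ev3", "ev3": "head"}

walk = walk_circular("ev3", next_of)  # start at head.prev, the last event
print(walk)           # ['ev3', 'head', 'ev1', 'ev2']
print(len(walk) - 1)  # queued events: 3
```

Applied to the dump, 86 printed addresses therefore correspond to 85 queued events plus the list head.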
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021639
- Creation Date: 09-Dec-2024
- Modified Date: 10-Dec-2024
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com