System hangs while [sisips] kernel module functions are spinning on a write_lock
This document (000019787) is provided subject to the disclaimer at the end of this document.
Environment
Situation
When the system hangs, the kernel log reports soft lockups such as the following:
[508821.753092] BUG: soft lockup - CPU#0 stuck for 23s! [oraagent.bin:17436]
[508821.753220] Pid: 17436, comm: oraagent.bin Tainted: PF ENX 3.0.101-108.111-default #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[508821.753227] RIP: 0010:[<ffffffff81272a09>] [<ffffffff81272a09>] __write_lock_failed+0x9/0x20
[508821.753304] Call Trace:
[508821.753321] [<ffffffff8147d42e>] _raw_write_lock+0xe/0x10
[508821.753343] [<ffffffffa030342c>] ExAcquireResourceExclusiveLite+0xc/0x40 [sisips]
[508821.753380] [<ffffffffa030be08>] hook_open+0x198/0x290 [sisips]
[508821.753402] [<ffffffff81485abe>] system_call_fastpath+0x22/0x27
[508821.753409] [<00007f315905709d>] 0x7f315905709c
Analysis of the crash dump shows that all 16 active running tasks are running in [sisips] module context, and that 15 of them are spinning on the same rwlock_t:
crash> rwlock_t 0xffff8809fed74460
struct rwlock_t {
  raw_lock = {
    lock = 0x0
  }
}
Each of these tasks shows a stack trace like the following:
PID: 43447  TASK: ffff8802fd8f0480  CPU: 1   COMMAND: "sqlplus"
 #0 [ffff880a3ee2be40] crash_nmi_callback at ffffffff81027895
 #1 [ffff880a3ee2be50] notifier_call_chain at ffffffff81481882
 #2 [ffff880a3ee2be80] __atomic_notifier_call_chain at ffffffff814818cd
 #3 [ffff880a3ee2be90] notify_die at ffffffff8148191d
 #4 [ffff880a3ee2bec0] default_do_nmi at ffffffff8147ecd3
 #5 [ffff880a3ee2bee0] do_nmi at ffffffff8147edf8
 #6 [ffff880a3ee2bef0] restart_nmi at ffffffff8147e166
    [exception RIP: __write_lock_failed+9]
    RIP: ffffffff81272a09  RSP: ffff8803c70d9e48  RFLAGS: 00000287
    RAX: 000000001011d178  RBX: ffff8809fed74460  RCX: 000000001011d177
    RDX: ffffffffa0341ce0  RSI: 0000000000000001  RDI: ffff8809fed74460
    RBP: ffff8809fbd86800   R8: 0000000000004000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff8803c70d9ee8  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #7 [ffff8803c70d9e48] __write_lock_failed at ffffffff81272a09
 #8 [ffff8803c70d9e48] _raw_write_lock at ffffffff8147d42e
 #9 [ffff8803c70d9e50] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
#10 [ffff8803c70d9e60] hook_stat at ffffffffa03114c7 [sisips]
#11 [ffff8803c70d9f80] system_call_fastpath at ffffffff81485abe
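The lock = 0x0 value in the rwlock_t output is easier to read with the biased-counter scheme that x86 rwlocks of this kernel generation use, as we understand it: an unlocked lock holds a bias value, each reader subtracts 1, and a writer subtracts the whole bias, so a counter of 0 means one writer holds the lock and no readers do. A writer that fails backs its subtraction out and spins in __write_lock_failed until the counter returns to the full bias. The Python below is only a single-threaded illustration of that arithmetic; MODEL_BIAS and BiasedRwlockModel are hypothetical names, the real RW_LOCK_BIAS value depends on the kernel version and configuration, and each method stands in for an atomic instruction sequence.

```python
# Hypothetical bias for this model only; the kernel's RW_LOCK_BIAS value
# depends on the kernel version and configuration.
MODEL_BIAS = 0x00100000


class BiasedRwlockModel:
    """Single-threaded model of a biased-counter rwlock.

    Each method stands in for one atomic read-modify-write sequence;
    there is no real concurrency here.
    """

    def __init__(self):
        self.lock = MODEL_BIAS  # unlocked: the counter holds the full bias

    def write_trylock(self):
        self.lock -= MODEL_BIAS
        if self.lock == 0:
            return True  # no readers and no writer: write lock acquired
        self.lock += MODEL_BIAS  # back out, as the slow path does before spinning
        return False

    def write_unlock(self):
        self.lock += MODEL_BIAS

    def read_trylock(self):
        self.lock -= 1
        if self.lock >= 0:
            return True  # no writer holds the bias: read lock acquired
        self.lock += 1  # a writer holds it: back out
        return False

    def read_unlock(self):
        self.lock += 1
```

With a fresh BiasedRwlockModel(), a successful write_trylock() leaves lock at 0, matching the value seen in the dump, and every further write_trylock() or read_trylock() returns False until write_unlock() restores the bias.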
The one remaining active task is also running in [sisips] module context, but its stack does not go through _raw_write_lock():
PID: 43611  TASK: ffff8804c249c340  CPU: 9   COMMAND: "perl"
 #0 [ffff880a3ef2be40] crash_nmi_callback at ffffffff81027895
 #1 [ffff880a3ef2be50] notifier_call_chain at ffffffff81481882
 #2 [ffff880a3ef2be80] __atomic_notifier_call_chain at ffffffff814818cd
 #3 [ffff880a3ef2be90] notify_die at ffffffff8148191d
 #4 [ffff880a3ef2bec0] default_do_nmi at ffffffff8147ecd3
 #5 [ffff880a3ef2bee0] do_nmi at ffffffff8147edf8
 #6 [ffff880a3ef2bef0] restart_nmi at ffffffff8147e166
    [exception RIP: _ZN7Process12LockHashLineEPv+0x24]
    RIP: ffffffffa032a2b4  RSP: ffff880407fffe28  RFLAGS: 00000246
    RAX: 000000000000002a  RBX: ffff8809fbc909e0  RCX: 0000000000000040
    RDX: 0000000000000358  RSI: ffff880407fffe3c  RDI: ffff8809fade4380
    RBP: ffff880407fffe3c   R8: f600000000000000   R9: 000000000000aa5b
    R10: ffffffff81a27ea0  R11: ffffffffa0328580  R12: ffff8804c249c340
    R13: ffff880369a144c0  R14: ffff8809fb817a00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #7 [ffff880407fffe28] _ZN7Process12LockHashLineEPv at ffffffffa032a2b4 [sisips]
 #8 [ffff880407fffe30] _ZN7Process10findLockedEi at ffffffffa032a347 [sisips]
 #9 [ffff880407fffe50] _ZN13ProcessCommon21CreateMissingChildrenEP7ProcessP15LIST_ENTRY_LINK at ffffffffa032d6c6 [sisips]
#10 [ffff880407fffeb0] _ZN13ProcessCommon14CreateChildrenEP7Process at ffffffffa032d82e [sisips]
#11 [ffff880407ffff00] AppfireDestroyProcess at ffffffffa030aa5b [sisips]
#12 [ffff880407ffff30] hook_exit_group at ffffffffa02fd388 [sisips]
#13 [ffff880407ffff80] system_call_fastpath at ffffffff81485abe
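Several [sisips] frames in the perl trace, such as _ZN7Process12LockHashLineEPv, are C++ symbols mangled under the Itanium C++ ABI; c++filt from GNU binutils decodes them. For illustration only, here is a deliberately minimal Python decoder (demangle_simple is a hypothetical helper, not a general demangler) that covers just the forms appearing in this trace: a nested _ZN...E name built from length-prefixed identifiers, followed by parameters that are pointers (P), a few builtin types, or length-prefixed class names. Substitutions, templates, and qualifiers are not supported and raise ValueError.

```python
# Itanium C++ ABI builtin type codes (subset).
BUILTINS = {
    "v": "void", "b": "bool", "c": "char", "h": "unsigned char",
    "s": "short", "t": "unsigned short", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "x": "long long",
    "y": "unsigned long long",
}


def _source_name(sym, pos):
    """Parse a <length><identifier> component at pos; return (name, next_pos)."""
    start = pos
    while pos < len(sym) and sym[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError(f"unsupported mangling at offset {start}: {sym!r}")
    length = int(sym[start:pos])
    name = sym[pos:pos + length]
    if len(name) != length:
        raise ValueError(f"truncated identifier in {sym!r}")
    return name, pos + length


def _param_type(sym, pos):
    """Parse one parameter type: optional P prefixes, then builtin or class."""
    stars = ""
    while pos < len(sym) and sym[pos] == "P":
        stars += "*"
        pos += 1
    if pos >= len(sym):
        raise ValueError(f"truncated type in {sym!r}")
    if sym[pos] in BUILTINS:
        return BUILTINS[sym[pos]] + stars, pos + 1
    name, pos = _source_name(sym, pos)
    return name + stars, pos


def demangle_simple(sym):
    """Demangle simple _ZN...E function symbols; return other names unchanged."""
    if not sym.startswith("_ZN"):
        return sym  # e.g. plain C symbols such as hook_exit_group
    pos, parts = 3, []
    while pos < len(sym) and sym[pos] != "E":
        name, pos = _source_name(sym, pos)
        parts.append(name)
    if pos >= len(sym):
        raise ValueError(f"unterminated nested name in {sym!r}")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    while pos < len(sym):
        ptype, pos = _param_type(sym, pos)
        params.append(ptype)
    if params == ["void"]:
        params = []  # "(void)" is written as an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"
```

Applied to frames #7 to #10 (with the +offset suffix removed), it yields Process::LockHashLine(void*), Process::findLocked(int), ProcessCommon::CreateMissingChildren(Process*, LIST_ENTRY_LINK*) and ProcessCommon::CreateChildren(Process*), all reached from hook_exit_group via AppfireDestroyProcess.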
On each of the spinning tasks, all of which were interrupted by the crash NMI (see the crash_nmi_callback frames), _raw_write_lock() is called from ExAcquireResourceExclusiveLite():
crash> bt -a | grep 'NMI exception stack' -A3
--- <NMI exception stack> ---
 #7 [ffff8803c70d9e48] __write_lock_failed at ffffffff81272a09
 #8 [ffff8803c70d9e48] _raw_write_lock at ffffffff8147d42e
 #9 [ffff8803c70d9e50] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
--
--- <NMI exception stack> ---
 #7 [ffff8804c2311e48] __write_lock_failed at ffffffff81272a09
 #8 [ffff8804c2311e48] _raw_write_lock at ffffffff8147d42e
 #9 [ffff8804c2311e50] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
--
--- <NMI exception stack> ---
 #7 [ffff880446dc3e38] __write_lock_failed at ffffffff81272a09
 #8 [ffff880446dc3e38] _raw_write_lock at ffffffff8147d42e
 #9 [ffff880446dc3e40] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
--
--- <NMI exception stack> ---
 #7 [ffff8804baa93e48] __write_lock_failed at ffffffff81272a09
 #8 [ffff8804baa93e48] _raw_write_lock at ffffffff8147d42e
 #9 [ffff8804baa93e50] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
--
--- <NMI exception stack> ---
 #7 [ffff880446fede38] __write_lock_failed at ffffffff81272a09
 #8 [ffff880446fede38] _raw_write_lock at ffffffff8147d42e
 #9 [ffff880446fede40] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
--
--- <NMI exception stack> ---
 #7 [ffff8804baa99e48] __write_lock_failed at ffffffff81272a09
 #8 [ffff8804baa99e48] _raw_write_lock at ffffffff8147d42e
 #9 [ffff8804baa99e50] ExAcquireResourceExclusiveLite at ffffffffa030342c [sisips]
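Counting the waiters by hand across the full bt -a output is tedious. A minimal sketch, assuming the output has been saved with crash's output redirection (for example crash> bt -a > bt.txt) and follows the layout shown above, groups tasks by the symbol in their [exception RIP: ...] line and counts the tasks in __write_lock_failed per RDI value. Treating RDI as the lock address is an assumption based on the sqlplus frame, where RDI (and RBX) hold ffff8809fed74460, the same address examined with rwlock_t. The function names summarize_bt and report are hypothetical.

```python
import re
from collections import Counter

# Header line of each task, e.g.
# PID: 43447  TASK: ffff8802fd8f0480  CPU: 1   COMMAND: "sqlplus"
TASK_RE = re.compile(
    r'PID:\s+(\d+)\s+TASK:\s+([0-9a-f]+)\s+CPU:\s+(\d+)\s+COMMAND:\s+"([^"]*)"')
# Symbol part of "[exception RIP: __write_lock_failed+9]" (offset dropped).
EXC_RE = re.compile(r'\[exception RIP:\s+([^\]+\s]+)')
# Assumption: RDI in the exception frame carries the rwlock pointer.
RDI_RE = re.compile(r'\bRDI:\s+([0-9a-f]+)')


def summarize_bt(text):
    """Split `bt -a` text into tasks and count write-lock waiters per RDI."""
    headers = list(TASK_RE.finditer(text))
    tasks = []
    for i, m in enumerate(headers):
        # Each task's section runs until the next PID header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        section = text[m.end():end]
        exc = EXC_RE.search(section)
        rdi = RDI_RE.search(section)
        tasks.append({
            "pid": int(m.group(1)),
            "task": m.group(2),
            "cpu": int(m.group(3)),
            "comm": m.group(4),
            "exc_symbol": exc.group(1) if exc else None,
            "rdi": rdi.group(1) if rdi else None,
        })
    waiters = Counter(t["rdi"] for t in tasks
                      if t["exc_symbol"] == "__write_lock_failed")
    return tasks, waiters


def report(text):
    """Return human-readable summary lines for a saved `bt -a` output."""
    tasks, waiters = summarize_bt(text)
    lines = [f"{n} task(s) in __write_lock_failed with RDI={addr}"
             for addr, n in waiters.most_common()]
    for t in tasks:
        if t["exc_symbol"] != "__write_lock_failed":
            where = t["exc_symbol"] or "no exception frame"
            lines.append(f"PID {t['pid']} CPU {t['cpu']} {t['comm']}: {where}")
    return lines
```

For example, print("\n".join(report(open("bt.txt").read()))) prints one line per contended lock address followed by every task that is not spinning in __write_lock_failed, which makes a "15 waiters, 1 other" split like the one in this dump easy to confirm.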
[sisips] is a third-party proprietary kernel module provided by Symantec Critical System Protection:
crash> mod -t
NAME    TAINTS
sisips  PFEN
sisfim  PFEN
TAINT: (P) Proprietary module has been loaded
TAINT: (F) Module was forcibly loaded
TAINT: (N) Unsupported modules loaded
Resolution
Cause
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019787
- Creation Date: 16-Nov-2020
- Modified Date: 18-Nov-2020
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com