TSC Clocksource Switching to HPET During High I/O Load
This document (000021035) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 15 SP3
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP5
SUSE Linux Enterprise Server 12 SP5
SUSE Linux Enterprise Server 15 SP3
SUSE Linux Enterprise Server 15 SP4
SUSE Linux Enterprise Server 15 SP5
Situation
A workqueue lockup appeared first, which has been attributed to a worker becoming unresponsive in the blk_mq_run_work_fn() function:
BUG: workqueue lockup - pool cpus=436 node=7 flags=0x0 nice=0 stuck for 42s!
Showing busy workqueues and worker pools:
workqueue kblockd: flags=0x18
  pwq 873: cpus=436 node=7 flags=0x0 nice=-20 active=2/256 refcnt=3
    in-flight: 3743:blk_mq_run_work_fn
    pending: blk_mq_run_work_fn
This was followed by an RCU stall. The NMI backtraces show that CPUs 6 and 266 were stuck in __sbq_wake_up(), called from sbitmap_queue_clear(), while handling a block layer softirq:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 199-....: (1 GPs behind) idle=01a/1/0x4000000000000004 softirq=173152/173153 fqs=4912
 (t=15000 jiffies g=4183933 q=26243)
Sending NMI from CPU 199 to CPUs 6:
NMI backtrace for cpu 6
CPU: 6 PID: 0 Comm: swapper/6 Tainted: G X 5.3.18-150300.59.101-default #1 SLE15-SP3
RIP: 0010:__sbq_wake_up+0x64/0xd0
Call Trace:
 <IRQ>
 sbitmap_queue_clear+0x3b/0x70
 __blk_mq_free_request+0x7f/0xb0
 scsi_end_request+0x9d/0x120 [scsi_mod]
 scsi_io_completion+0x85/0x4f0 [scsi_mod]
 blk_done_softirq+0x8f/0xc0
 __do_softirq+0xe3/0x2da
 irq_exit+0xd5/0xe0
 call_function_single_interrupt+0xf/0x20
...
...
...
NMI backtrace for cpu 266
CPU: 266 PID: 0 Comm: swapper/266 Tainted: G X 5.3.18-150300.59.101-default #1 SLE15-SP3
RIP: 0010:__sbq_wake_up+0x27/0xd0
Call Trace:
 <IRQ>
 sbitmap_queue_clear+0x3b/0x70
 __blk_mq_free_request+0x7f/0xb0
 scsi_end_request+0x9d/0x120 [scsi_mod]
 scsi_io_completion+0x85/0x4f0 [scsi_mod]
 blk_done_softirq+0x8f/0xc0
 __do_softirq+0xe3/0x2da
 irq_exit+0xd5/0xe0
 call_function_single_interrupt+0xf/0x20
 </IRQ>
The worker remained blocked in blk_mq_run_work_fn():
workqueue kblockd: flags=0x18
  pwq 873: cpus=436 node=7 flags=0x0 nice=-20 active=2/256 refcnt=3
    in-flight: 3743:blk_mq_run_work_fn
    pending: blk_mq_run_work_fn
pool 873: cpus=436 node=7 flags=0x0 nice=-20 hung=80s workers=2 idle: 3006
Eventually, the clocksource watchdog marked the TSC clocksource as unstable because the skew was too large, and the kernel switched to HPET:
clocksource: timekeeping watchdog on CPU301: Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'hpet' wd_now: a5dab66a wd_last: ea6da5ee mask: ffffffff
clocksource: 'tsc' cs_now: 2a61230e401d7e cs_last: 2a60d0e23da03e mask: ffffffffffffffff
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (42172076095957, -113650410)<-(42172793736635, -831659344)
clocksource: Switched to clocksource hpet
The above warning is generated when the skew between the intervals measured by HPET (the watchdog clocksource) and TSC exceeds 0.0625 seconds. The HPET counter values show that the 32-bit HPET counter (mask ffffffff) wrapped around and that, at an HPET frequency of 24 MHz, 131.020 seconds had elapsed since the previous watchdog check:
0xffffffff + 1 + 0xa5dab66a - 0xea6da5ee = 3144487036
3144487036 / (24 * 10^6) = 131.020293166
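The HPET calculation above can be reproduced with a short bash sketch. It uses the masked subtraction `(now - last) & mask`, which equals `0xffffffff + 1 + now - last` when the counter wrapped exactly once and is the same arithmetic as the kernel's clocksource_delta() helper. The 24 MHz frequency is the value used in the calculation above; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: interval between two reads of the 32-bit HPET counter taken from
# the watchdog message, handling one wrap-around with a masked subtraction.
# Assumption: HPET runs at 24 MHz, as in the calculation above.
wd_now=0xa5dab66a
wd_last=0xea6da5ee
mask=0xffffffff
hpet_hz=24000000

# (now - last) & mask == 0xffffffff + 1 + now - last after a single wrap.
cycles=$(( (wd_now - wd_last) & mask ))
# Integer nanoseconds (truncated); cycles * 10^9 still fits in 64 bits.
hpet_ns=$(( cycles * 1000000000 / hpet_hz ))

printf 'HPET cycles:  %d\n' "$cycles"
printf 'HPET elapsed: %d.%09d s\n' $(( hpet_ns / 1000000000 )) $(( hpet_ns % 1000000000 ))
```

Running it prints 3144487036 cycles and an elapsed time of 131.020293166 s, matching the figures above.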
The TSC counter values show that 131.020 seconds had also elapsed since the previous check, calculated as (cs_now - cs_last) / (tsc_mhz * 10^6) with a TSC frequency of 2693.671 MHz:
0x2a61230e401d7e - 0x2a60d0e23da03e = 352925678912
352925678912 / (2693.671 * 10^6) = 131.020335784
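A similar bash sketch computes the TSC interval with integer arithmetic (2693.671 MHz is written as 2693671 kHz to stay within 64-bit shell arithmetic) and compares it with the HPET interval of 131.020293166 seconds from the calculation above, using the 0.0625 second threshold quoted earlier. It only reproduces the arithmetic on the logged counter values; it does not model the kernel's internal cycle-to-nanosecond conversion, so it should not be read as an explanation of why the watchdog fired.

```shell
#!/usr/bin/env bash
# Sketch: TSC interval from the logged counter values and its skew against
# the HPET interval, checked against the 0.0625 s threshold quoted above.
# Assumption: TSC frequency of 2693.671 MHz, as in the calculation above.
cs_now=0x2a61230e401d7e
cs_last=0x2a60d0e23da03e
tsc_khz=2693671                     # 2693.671 MHz expressed in kHz

cycles=$(( cs_now - cs_last ))      # 64-bit counter: no wrap in this interval
tsc_ns=$(( cycles * 1000000 / tsc_khz ))   # ns = cycles * 10^6 / kHz
hpet_ns=131020293166                # HPET interval from the calculation above
threshold_ns=62500000               # 0.0625 s

skew_ns=$(( tsc_ns - hpet_ns ))
if (( skew_ns < 0 )); then skew_ns=$(( -skew_ns )); fi

printf 'TSC cycles:  %d\n' "$cycles"
printf 'TSC elapsed: %d.%09d s\n' $(( tsc_ns / 1000000000 )) $(( tsc_ns % 1000000000 ))
printf 'Skew:        %d ns (threshold %d ns)\n' "$skew_ns" "$threshold_ns"
if (( skew_ns > threshold_ns )); then
    echo "skew exceeds the 0.0625 s threshold"
else
    echo "skew is within the 0.0625 s threshold"
fi
```

With the logged values, the two intervals differ by roughly 43 microseconds (42618 ns).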
A NIC transmit queue timeout was also observed:
NETDEV WATCHDOG: eth4 (mlx5_core): transmit queue 35 timed out
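When reviewing other systems for the same pattern, a small bash helper can scan a kernel log for the messages quoted above. The function name check_symptoms and the chosen patterns are illustrative assumptions, not a SUSE tool; a match only shows that a similar message was logged, not that the root cause is the same.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report which of the symptoms described in
# this TID appear in a kernel log read from stdin.
check_symptoms() {
    local names=(workqueue_lockup rcu_stall tsc_unstable switched_to_hpet tx_timeout)
    # Extended regexes derived from the log messages quoted above.
    local patterns=(
        'BUG: workqueue lockup'
        'rcu_sched self-detected stall'
        "Marking clocksource 'tsc' as unstable"
        'Switched to clocksource hpet'
        'transmit queue [0-9]+ timed out'
    )
    local log i
    log=$(cat)                      # read the whole log once
    for i in "${!names[@]}"; do
        if grep -qE -- "${patterns[i]}" <<<"$log"; then
            echo "${names[i]}: found"
        else
            echo "${names[i]}: not found"
        fi
    done
}

# Example: kernel messages from the previous boot.
# journalctl -k -b -1 | check_symptoms
```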
Resolution
SUSE version | Affected kernels | Not affected kernels |
---|---|---|
SLES 12 SP5 | 4.12.14-122.144.1, 4.12.14-122.147.1 | 4.12.14-122.139.1 and older; 4.12.14-122.150.1 and newer |
SLES 15 SP3 | 5.3.18-150300.59.101.1, 5.3.18-150300.59.106.1 | 5.3.18-150300.59.98.1 and older; 5.3.18-150300.59.109.1 and newer |
SLES 15 SP4 | 5.14.21-150400.24.33.2, 5.14.21-150400.24.38.1 | 5.14.21-150400.24.28.1 and older; 5.14.21-150400.24.41.1 and newer |
SLES 15 SP5 | 5.14.21-150500.55.39.1 and older | 5.14.21-150500.55.49.1 and newer |
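The table can be applied to a running system with a hypothetical bash helper. It assumes, as the Tainted: line in the RCU stall output above suggests, that `uname -r` prints the package release without its trailing rebuild counter and with a flavor suffix (5.3.18-150300.59.101-default for package 5.3.18-150300.59.101.1). Releases the table does not mention are reported as "not listed" rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): classify a kernel release string, as printed
# by `uname -r`, against the Resolution table of this TID.
classify_kernel() {
    local rel=${1%-[a-z]*}          # drop flavor suffix such as "-default"
    local base=${rel%.*}            # e.g. 5.3.18-150300.59
    local n=${rel##*.}              # e.g. 101
    local old_ok affected fixed a
    case "$base" in
        4.12.14-122)       old_ok=139; affected="144 147"; fixed=150 ;;
        5.3.18-150300.59)  old_ok=98;  affected="101 106"; fixed=109 ;;
        5.14.21-150400.24) old_ok=28;  affected="33 38";   fixed=41 ;;
        5.14.21-150500.55)
            old_ok=-1; affected=""; fixed=49
            # SP5: every build up to 55.39 is listed as affected.
            if (( n <= 39 )); then echo affected; return; fi ;;
        *) echo "not listed"; return ;;
    esac
    for a in $affected; do
        if [[ $n == "$a" ]]; then echo affected; return; fi
    done
    if (( n <= old_ok || n >= fixed )); then
        echo "not affected"
    else
        echo "not listed"           # between listed builds: ask SUSE Support
    fi
}

classify_kernel "$(uname -r)"
```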
Regarding SLES 15 SP2: the commits that cause the TSC clocksource to switch to HPET when CPUs become stuck in sbitmap_queue_clear() and __sbq_wake_up() have not been backported to this version, so SLES 15 SP2 is not affected by this particular issue. Note, however, that an unstable TSC clocksource can be replaced by HPET as a result of many different events and conditions. A clocksource being marked as unstable does not have a single root cause, and sometimes it is the correct outcome because the clocksource really is unstable. The root causes outlined in this TID illustrate why the specific details and circumstances of each occurrence matter.
If a similar issue occurs on SLES 15 SP2 or on other Service Packs not listed in this TID, please generate a supportconfig and contact SUSE Support for further investigation. On SLES 15 SP2, a temporary workaround is to disable the clocksource watchdog, which may generate latency spikes. To disable the clocksource watchdog, boot with the kernel parameter "tsc=nowatchdog". To set it persistently, append the parameter to the GRUB_CMDLINE_LINUX_DEFAULT line in the /etc/default/grub file, then regenerate the GRUB configuration and reboot the system:
# vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... tsc=nowatchdog"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
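After the reboot, the following bash sketch checks whether tsc=nowatchdog is on the running kernel's command line and prints the current and available clocksources from sysfs. The helper has_kernel_param is hypothetical; /proc/cmdline and /sys/devices/system/clocksource/clocksource0/ are standard Linux interfaces.

```shell
#!/usr/bin/env bash
# Sketch: confirm that tsc=nowatchdog reached the kernel command line and
# show the clocksource currently in use.
has_kernel_param() {            # hypothetical helper, not a SUSE tool
    local param=$1 cmdline=$2
    # Pad with spaces so only whole, space-separated parameters match.
    [[ " $cmdline " == *" $param "* ]]
}

if [ -r /proc/cmdline ] && has_kernel_param tsc=nowatchdog "$(cat /proc/cmdline)"; then
    echo "tsc=nowatchdog is on the kernel command line"
else
    echo "tsc=nowatchdog is not on the running kernel's command line"
fi

cs=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs/current_clocksource" ]; then
    echo "current clocksource:    $(cat "$cs/current_clocksource")"
    echo "available clocksources: $(cat "$cs/available_clocksource")"
fi
```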
Cause
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021035
- Creation Date: 04-Apr-2023
- Modified Date: 31-Jul-2024
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com