Leap second issues - June 30, 2012
This document (7010351) is provided subject to the disclaimer at the end of this document.
Environment
Situation
- printk issue (bnc#767684) - a possible, though very improbable, deadlock caused by the logging of leap second information due to bad locking.
Affected products: SLE 11 SP1 and newer
Fix available: Yes
Workaround available: Yes
Visible effects: System is deadlocked, no response
Risk: very low
- possible deadlock due to leap second processing (bnc#768632).
Affected products: SLE 11 SP1 and newer
Fix available: Yes
Workaround available: Yes
Visible effects: System is deadlocked, no response
Risk: very low
- An issue identified after the leap second can cause applications that use FUTEXes to consume 100% of CPU. The issue is present in all Linux kernel versions >= 2.6.22 and therefore affects SLE 11 SP1 and later releases.
The issue is caused by the timing of the FUTEX subsystem becoming desynchronized, so that FUTEX calls return on a time-out. The calling applications loop while looking for events, which causes 100% CPU consumption. We recommend applying the workaround even if the system currently appears to be running normally.
Affected products: SLE 11 SP1 and newer
Fix available: Yes
Workaround available: Yes
Visible effects: Applications start using 100% CPU and strace shows lots of futex() = ETIMEDOUT failures on those processes.
Risk: very high
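As a rough way to confirm the symptom described for the third issue, the strace output of a suspect process can be scanned for futex() calls that end in ETIMEDOUT. The following is only a sketch: the function name count_futex_timeouts is hypothetical, and the usage line in the comment assumes that strace and the coreutils timeout command are installed.

```shell
# Count futex() calls that timed out in strace output read from stdin.
# Hypothetical helper, not part of any SUSE tool.
count_futex_timeouts() {
    grep -c 'futex(.*ETIMEDOUT'
}

# Possible usage (assumption: <PID> is the process consuming 100% CPU):
#   timeout 5 strace -f -e trace=futex -p <PID> 2>&1 | count_futex_timeouts
```

A count in the hundreds or thousands within a few seconds would be consistent with the FUTEX issue; a small count on its own is not conclusive.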
Resolution
SLES11 SP1 x86_64 kernel >= 2.6.32.59-0.7.1
SLES11 SP2 x86_64 kernel >= 3.0.38-0.5.1
SLES11 SP1 i386 kernel >= 2.6.32.59-0.7.1
SLES11 SP2 i386 kernel >= 3.0.38-0.5.1
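To compare an installed kernel package version against the minimum versions listed above, a version-aware comparison is needed, because a plain string comparison mis-orders numbers such as 9 and 10. The sketch below assumes GNU sort with the -V option; the function names version_ge and check_kernel are hypothetical, and the rpm query in the comment assumes the kernel-default flavor.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU "sort -V" (version sort); this is an assumption about the system.
version_ge() {
    # The smaller version sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print "fixed" or "affected" for an installed version $1 and a minimum $2.
check_kernel() {
    if version_ge "$1" "$2"; then
        echo "fixed"
    else
        echo "affected"
    fi
}

# Possible usage on SLES 11 SP2 (assumption: kernel-default flavor, one kernel installed):
#   check_kernel "$(rpm -q --qf '%{VERSION}-%{RELEASE}' kernel-default)" "3.0.38-0.5.1"
```

Note that this checks the installed package, which is not necessarily the running kernel; a reboot into the updated kernel is still required.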
Workaround for the deadlock issues (first and second issue)
The workaround is to switch NTP to "slew" mode at least 24 hours prior to the leap second and to switch back on July 1st.
To achieve this, edit /etc/sysconfig/ntp and add the option "-x" to NTPD_OPTIONS so that it looks like this:
NTPD_OPTIONS="-x -g -u ntp:ntp"
Afterward, restart the NTP service with:
rcntp restart
Please make sure to do this before 23:59:59 UTC on 06/29/2012.
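The edit to /etc/sysconfig/ntp described above could be scripted roughly as follows. This is a sketch under assumptions: the function name add_slew_option is hypothetical, the file is assumed to contain a single NTPD_OPTIONS="..." line, and GNU sed with -i is assumed. A backup copy is written next to the file before it is changed.

```shell
# Add "-x" to NTPD_OPTIONS in the file given as $1, unless it is already there.
# Hypothetical helper; keeps a backup as "$1.bak".
add_slew_option() {
    conf="$1"
    # Already in slew mode: "-x" appears as a separate option inside the quotes.
    if grep -Eq '^NTPD_OPTIONS="([^"]* )?-x( [^"]*)?"' "$conf"; then
        return 0
    fi
    cp -p "$conf" "$conf.bak" &&
    sed -i 's/^NTPD_OPTIONS="/NTPD_OPTIONS="-x /' "$conf"
}

# Possible usage (as root), followed by the service restart:
#   add_slew_option /etc/sysconfig/ntp && rcntp restart
```

To switch back after the leap second, the backup can be restored or the "-x" option removed by hand, followed by another restart of the NTP service.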
Technical background:
The workaround changes the NTP service to run in "slew" mode, which avoids the adjtime() syscall. That syscall puts the kernel into the TIME_INS state; the kernel then inserts the leap second at 23:59:59 UTC and prints a notification. Because the notification is printed with printk() while xtime_lock is held, it can cause a deadlock in very rare cases under very high system load. The adjtime() syscall can occur at any time within a 24-hour window prior to the leap second, so NTP must be changed before this window begins. Once the window has passed, NTP can safely be changed back.
Workaround for the FUTEX issue (third issue)
Set the system date to the current date. The following command achieves this:
date -s "$(LC_ALL=C date)"
Technical background:
Setting the date/time triggers a call to clock_was_set() in the kernel, which was missing after the time was changed because of the leap second. If the workaround is not applied, the kernel continues running with per-CPU hrtimer bases that are out of sync with the real time. Starting a new process does not fix this. If a new process schedules an absolute hrtimer, it will wait one second less than it should. If the timer is scheduled less than one second ahead of the current real time, it will trigger immediately.
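The timer behavior described above can be illustrated with a little arithmetic. The function below is only a model of that description, not kernel code, and its name effective_wait_ms is hypothetical: it takes the requested delay of an absolute timer in milliseconds and prints the delay that would actually elapse while the hrtimer bases are off by one second.

```shell
# Model of the description above: an absolute timer fires one second early,
# or immediately if it was requested less than one second ahead.
effective_wait_ms() {
    ms=$(( $1 - 1000 ))
    if [ "$ms" -lt 0 ]; then
        ms=0
    fi
    echo "$ms"
}

effective_wait_ms 5000   # prints 4000
effective_wait_ms 300    # prints 0
```

The second case shows why short, repeated time-outs turn into a busy loop: each wait returns at once, and the application immediately waits again.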
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7010351
- Creation Date: 02-Jul-2012
- Modified Date: 12-Oct-2022
- SUSE Linux Enterprise Desktop
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
- SUSE Studio
- SUSE Manager
- SUSE Linux Enterprise Real Time Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com