Troubleshooting a supportconfig Hang
Problem
Occasionally the supportconfig will hang while gathering data. The supportconfig is just a bash script that runs system commands. Once in a while it runs a command at an inappropriate time and causes the hang itself, but this is rare. Most of the time, the supportconfig simply exposes a problem you already have with a normal system command. So how can you find out which system command supportconfig is hanging on?
Solution
If you want to attempt to skip the hanging command, simply press Ctrl-\ once or twice. If this doesn’t work, you will need to open an additional terminal, and follow the troubleshooting steps below.
When you observe a hang condition, do the following:
- Notice the last message of the supportconfig output on the screen.
- Match the message with the corresponding supportconfig file.
- Look at the last line in the file to see which system command is hanging.
- Finally, run the Binary Check Tool (chkbin) against the command to help troubleshoot the hang.
Let’s step through an example. In this case, the supportconfig hangs gathering Network Time Protocol (NTP) information.
Gathering system information
  Basic Server Health Check... Done
  RPM Database... Done
  Basic Environment... Done
  System Modules... Done
  Memory Details... Done
  Disk I/O... Done
  System Logs... Done
  YaST Files... Done
  File System List... Skipped
  Crash Info... Skipped
  NTP...
“NTP…” is the last message on the screen.
Most of the supportconfig text filenames closely match the displayed message: “RPM Database” corresponds to rpm.txt, and “Crash Info” to crash.txt. You can also grep the supportconfig script itself for the filename.
# grep -A2 'NTP...' /sbin/supportconfig
printlog "NTP..."
test $OPTION_NTP -eq 0 && { echolog Excluded; return 1; }
OF=ntp.txt
if rpm_verify $OF xntp
The OF=ntp.txt line shows that the supportconfig uses ntp.txt for its NTP information. You can also see that the OPTION_NTP variable controls whether NTP information is gathered at all: when it is 0, the whole section is excluded. If you want to bypass the hang, change OPTION_NTP=1 to OPTION_NTP=0 in /etc/supportconfig.conf, then run the supportconfig again to get a complete report without the problematic section.
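If you need to make that change on more than one server, it can be scripted. The following is a minimal sketch, assuming the option sits on its own line exactly as OPTION_NTP=1; the function name disable_option is hypothetical and is not part of supportconfig. It relies on GNU sed for the -i option.

```shell
# disable_option FILE NAME
# Rewrites the line "NAME=1" to "NAME=0" in FILE and leaves every other
# line untouched. Hypothetical helper; assumes the NAME=1 line format.
disable_option() {
    local file="$1"
    local name="$2"
    # Keep a backup next to the file before editing it in place.
    cp -p "$file" "$file.bak" || return 1
    sed -i "s/^${name}=1\$/${name}=0/" "$file"
}

# Example (as root):
#   disable_option /etc/supportconfig.conf OPTION_NTP
```

The backup copy lets you restore the original setting once the hanging command has been fixed.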
The supportconfig creates a directory in /var/log as it’s gathering information. Upon successful completion, it tars up the directory and then deletes it. Since the supportconfig hung, the directory should still be in /var/log with the format /var/log/nts_hostname_date_time.
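Locating the leftover directory can be sketched in a few lines of shell. Both function names below are hypothetical. The second function also assumes that the file supportconfig was writing when it hung is the most recently modified .txt file in the directory, which is an assumption rather than documented behavior.

```shell
# newest_nts_dir [BASE]
# Prints the most recently modified nts_* directory under BASE
# (default /var/log). Returns non-zero when there is none.
newest_nts_dir() {
    local base="${1:-/var/log}"
    local dir
    # The trailing slash makes the glob match directories only, so a
    # finished tarball is ignored. ls -t sorts newest first.
    dir=$(ls -td "$base"/nts_*/ 2>/dev/null | head -n 1)
    [ -n "$dir" ] || return 1
    printf '%s\n' "${dir%/}"
}

# newest_txt_file DIR
# Prints the most recently modified .txt file in DIR (assumed to be the
# file that was being written when the hang occurred).
newest_txt_file() {
    local file
    file=$(ls -t "$1"/*.txt 2>/dev/null | head -n 1)
    [ -n "$file" ] || return 1
    printf '%s\n' "$file"
}

# Example:
#   dir=$(newest_nts_dir) && newest_txt_file "$dir"
```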
larktop:~ # cd /var/log/nts_larktop_080205_2246/
larktop:/var/log/nts_larktop_080205_2246 # tail ntp.txt
#==[ Command ]======================================#
# /sbin/chkconfig ntp --list
ntp 0:off 1:off 2:on 3:on 4:off 5:on 6:off

#==[ Command ]======================================#
# /etc/init.d/ntp status
Checking for network time protocol daemon (NTPD): ..unused

#==[ Command ]======================================#
# /usr/sbin/ntpq -p
The last command to be executed prior to the hang is ntpq.
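If the file is long, you can pull out just the final command instead of reading the tail by eye. This is a minimal sketch, assuming every command in the file is introduced by a "#==[ Command ]" header line and written on the following line with a leading "# ", as in the ntp.txt excerpt. The function name last_command is hypothetical.

```shell
# last_command FILE
# Prints the last command recorded in a supportconfig text file.
# Returns non-zero if the file contains no command header.
last_command() {
    awk '
        # A header line means the command itself is on the next line.
        /^#==\[ Command \]/ { want = 1; next }
        # Strip the leading "# " and remember the command.
        want { sub(/^# /, ""); cmd = $0; want = 0 }
        END { if (cmd != "") print cmd; else exit 1 }
    ' "$1"
}

# Example:
#   last_command /var/log/nts_larktop_080205_2246/ntp.txt
```

Run against the excerpt shown earlier, it prints /usr/sbin/ntpq -p.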
Running chkbin against ntpq produces a report that includes the following:
#--[ Checking File Ownership ]-----------------------#
/usr/sbin/ntpq - from RPM: xntp-4.2.0a-70.14
:Shell script
#--[ Validating Unique RPMs ]------------------------#
Validating RPM: xntp-4.2.0a-70.14 [ Warning ]
S.5....T c /etc/ntp.conf
S.5....T   /usr/sbin/ntpq
The RPM validation shows that the size, md5sum and time stamps have all changed on the ntpq executable. It also says ntpq is a shell script. Something is very wrong, since ntpq is supposed to be a dynamically linked executable. The best course of action is to reinstall the xntp RPM package.
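The flag column in the validation output is standard rpm verify notation, where each letter marks an attribute that no longer matches the package database. The sketch below spells the letters out. The function name decode_verify_flags is hypothetical; dots (attribute unchanged) and question marks (test could not be performed) are skipped.

```shell
# decode_verify_flags FLAGS
# Translates an rpm verify flag string such as "S.5....T" into words.
decode_verify_flags() {
    local rest="$1"
    local char
    local out=""
    while [ -n "$rest" ]; do
        char=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}             # drop that character
        case $char in
            S) out="$out size" ;;
            M) out="$out mode" ;;
            5) out="$out digest" ;;
            D) out="$out device" ;;
            L) out="$out symlink" ;;
            U) out="$out owner" ;;
            G) out="$out group" ;;
            T) out="$out mtime" ;;
            P) out="$out capabilities" ;;
        esac
    done
    printf '%s\n' "${out# }"
}

# Example:
#   decode_verify_flags S.5....T
```

For the ntpq line above, the result is "size digest mtime", which matches the size, md5sum and time stamp changes described in the text.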