Troubleshooting the SLES10 Boot Process
Overview
When a server fails to boot, a critical situation is at hand. The purpose of this document is to provide a quick reference guide to narrow down the cause of a failed boot and get the server back up as quickly as possible. It is based on SUSE Linux Enterprise Server 10 (SLES10).
Troubleshooting Procedure
- The primary troubleshooting objective is to narrow down where in the boot process the failure occurred.
- The boot process is summarized below. For more details, refer to the Troubleshooting Table below.
- Look at the failed server’s screen for the last on-screen landmark that matches the troubleshooting table’s "On-Screen Landmarks".
- Once you determine how far in the boot process the failure occurred, look at the troubleshooting table’s associated files and troubleshooting/potential fixes.
- The two most identifiable on-screen landmarks are:
- The grub boot menu screen (Troubleshooting Table, Line 3)
- Seeing the word "done" scrolling across the screen (Troubleshooting Table, Lines 8 and 11)
- The purpose of the Boot Installed System (BIS), run level 1, and chroot Installed System (CIS) procedures is to get the server into an operational maintenance state so that further problem resolution can be completed.
- Boot Installed System (BIS) Procedure
- If this procedure works, then the problem is most likely on lines 1-6 of the troubleshooting table.
- Boot from CD1
- Select "Installation"
- Select your Language
- Accept the License Agreement
- Click "Other"
- Select "Boot Installed System"
- Click "OK"
- Boot to Run Level 1
- Run level 1 is very similar to the chroot Installed System (CIS) procedure, but the kernel does it for you. You also have access to yast and the proc file system, so run level 1 is preferred over CIS.
- Append "init 1" to the boot options line of the default boot kernel (i.e., SUSE Linux Enterprise Server 10)
- Type root’s password
- If you need network access, use yast to configure it
- chroot Installed System (CIS) Procedure
- Used mostly for failures in lines 7-14 of the troubleshooting table.
- Boot from CD1
- Select "Rescue System" and log in as root at the "Rescue login:" prompt
- Your first goal is to find and mount the root "/" partition, so you can read /etc/fstab
- Run cat /proc/partitions to find the disk devices the OS sees
- For each device, display the partition table
ls-boot:~ # parted -s /dev/sda print
Disk geometry for /dev/sda: 0kB - 2147MB
Disk label type: msdos
Number  Start   End     Size    Type      File system  Flags
1       32kB    214MB   214MB   primary   ext2         boot, type=83
2       214MB   535MB   321MB   primary   linux-swap   type=82
3       535MB   2147MB  1612MB  extended               lba, type=0f
5       535MB   1012MB  477MB   logical   reiserfs     type=83
6       1012MB  1596MB  584MB   logical   reiserfs     type=83
7       1596MB  2147MB  551MB   logical   reiserfs     type=83
- You can ignore type 82 swap and type 0f extended partitions
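The device discovery step can be sketched as a small helper. This is a minimal sketch, not part of the original procedure: the function name list_disks is hypothetical, and the rule "a name that does not end in a digit is a whole disk" is only a heuristic that fits sda-style names (it would not fit controllers that name disks like c0d0).

```shell
# Print /dev paths for whole-disk entries in a /proc/partitions-style file.
# Assumption: whole disks have names that do not end in a digit (sda, hdb).
list_disks() {
    # Data rows start with a numeric major number; the header and the
    # blank line that follow it do not, so they are skipped.
    awk '$1 ~ /^[0-9]+$/ && $4 !~ /[0-9]$/ { print "/dev/" $4 }' \
        "${1:-/proc/partitions}"
}

# Usage in the rescue system (as root):
#   for disk in $(list_disks); do parted -s "$disk" print; done
```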
- To find the root partition, you may need to just guess. For example,
- mount /dev/sda1 /mnt
- ls -l /mnt
- If the /mnt directory listing shows /etc and /root, then it’s the root partition
- Repeat these steps for each device until you find root. In this case, the root device is /dev/sda6
- mount /dev/sda6 /mnt
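The guess-and-check search for the root partition can be sketched as follows. The helper name looks_like_root is hypothetical, and the test it applies is only the one described above (the presence of etc and root directories); the partition names in the usage comment are taken from the example partition table.

```shell
# Return success if the directory looks like a mounted root file system,
# using the same test as the manual check: it contains etc and root.
looks_like_root() {
    [ -d "$1/etc" ] && [ -d "$1/root" ]
}

# Usage in the rescue system (as root). Left as a comment because it
# mounts and unmounts real partitions:
#   for part in /dev/sda1 /dev/sda5 /dev/sda6 /dev/sda7; do
#       mount "$part" /mnt || continue
#       if looks_like_root /mnt; then
#           echo "root partition: $part"
#           break            # leave it mounted on /mnt
#       fi
#       umount /mnt
#   done
```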
Boot process summary: BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login
Network configuration with yast (run level 1): yast lan > Next > Edit > Next > Next/Finish
- Mount all additional file systems relative to /mnt
- Run cat /mnt/etc/fstab
Rescue# cat /mnt/etc/fstab
/dev/sda6  /                  reiserfs  acl,user_xattr     1 1
/dev/sda1  /boot              ext2      acl,user_xattr     1 2
/dev/sda7  /usr               reiserfs  acl,user_xattr     1 2
/dev/sda5  /var               reiserfs  acl,user_xattr     1 2
/dev/sda2  swap               swap      defaults           0 0
proc       /proc              proc      defaults           0 0
sysfs      /sys               sysfs     noauto             0 0
debugfs    /sys/kernel/debug  debugfs   noauto             0 0
devpts     /dev/pts           devpts    mode=0620,gid=5    0 0
/dev/fd0   /media/floppy      auto      noauto,user,sync   0 0
- This shows the system devices and their mount points.
- Mount all additional file systems, for example:
mount /dev/sda1 /mnt/boot
mount /dev/sda5 /mnt/var
mount /dev/sda7 /mnt/usr
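Deriving the mount commands from the installed system's fstab can be sketched as below. This is a minimal sketch under stated assumptions: the function name fstab_mount_cmds is hypothetical, it only considers entries whose device starts with /dev/, and it skips the root entry, swap, and anything marked noauto. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Print one "mount DEVICE PREFIX/MOUNTPOINT" line for each additional
# file system listed in an fstab file (first argument).
# The second argument is where the root partition is mounted (default /mnt).
fstab_mount_cmds() {
    fstab=$1
    prefix=${2:-/mnt}
    awk '
        NF == 0 || $1 ~ /^#/       { next }  # blank lines and comments
        $1 !~ /^\/dev\//           { next }  # proc, sysfs, devpts, ...
        $2 == "/" || $3 == "swap"  { next }  # root is already mounted
        $4 ~ /(^|,)noauto(,|$)/    { next }  # not mounted at boot
        { print $2, $1 }
    ' "$fstab" | LC_ALL=C sort | while read -r mnt dev; do
        # Sorting by mount point puts parent directories before children.
        printf 'mount %s %s%s\n' "$dev" "$prefix" "$mnt"
    done
}

# Usage in the rescue system, after the root partition is on /mnt:
#   fstab_mount_cmds /mnt/etc/fstab /mnt        # review the commands
#   fstab_mount_cmds /mnt/etc/fstab /mnt | sh   # then run them
```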
- Rebind proc, sysfs and dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys
mount --rbind /dev /mnt/dev
- chroot to the mounted installed system. The chroot command remaps /mnt as root "/".
chroot /mnt
- If this command fails, then you need to confirm that /mnt/bin/bash and glibc on the installed system are valid.
- To return to the rescue system, type exit.
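A quick pre-check before running chroot can be sketched as follows. The function name check_chroot_ready is hypothetical, and the check only confirms that bash is present and executable and that some libc.so.* file exists under lib or lib64; it does not prove that either file is undamaged.

```shell
# Report whether the mounted installed system has the minimum pieces
# that "chroot /mnt" needs: an executable bash and a glibc library.
check_chroot_ready() {
    root=${1:-/mnt}
    if [ ! -x "$root/bin/bash" ]; then
        echo "missing or not executable: $root/bin/bash"
        return 1
    fi
    # An unmatched glob stays literal, so the -e test simply fails for it.
    for libc in "$root"/lib/libc.so.* "$root"/lib64/libc.so.*; do
        [ -e "$libc" ] && return 0
    done
    echo "no libc.so.* found under $root/lib or $root/lib64"
    return 1
}

# Usage in the rescue system:
#   check_chroot_ready /mnt && chroot /mnt
```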
Troubleshooting Table
BIS = Boot Installed System Procedure
CIS = chroot Installed System Procedure
Line | Boot Process | Associated File(s) | On-Screen Landmarks | Troubleshooting / Potential Fixes
1 | BIOS | N/A | BIOS messages | Update the firmware. Make sure a disk device is marked bootable.
2 | MBR | /boot/grub/stage1 | GRUB loading stage2… | BIS; grub-install /dev/<disk> or lilo -v
3 | GRUB | /boot/grub/stage2, /boot/grub/menu.lst | GRUB menu or grub> prompt | BIS; grub-install /dev/<disk> or lilo -v; check /boot/grub/menu.lst
4 | kernel | /boot/vmlinuz | Hardware info scrolling; RAMDISK driver initialized: | BIS; reinstall the kernel rpm
5 | initrd | /boot/initrd, /etc/sysconfig/kernel | RAMDISK: <relevant message> | BIS; mkdir -p /tmp/ramdisk; cd /tmp/ramdisk; zcat /boot/initrd | cpio -ivd; mkinitrd; lilo -v
6 | ramdisk:init | /init in /boot/initrd, /etc/sysconfig/kernel | Starting udevd; Creating devices; Loading <module_name> (there will be a "Loading" statement for each module defined in the /etc/sysconfig/kernel INITRD_MODULES variable) | BIS; mkinitrd creates the ramdisk:init file.
7 | sbin:init | /sbin/init, /etc/inittab | INIT: version 2.85 booting | init 1, then CIS. Use boot options init=/bin/bash or init=/bin/sash to bypass running /sbin/init.
8 | sbin:init:boot | /bin/bash, /etc/init.d/boot, /etc/init.d/boot.d/* | System Boot Control: Running /etc/init.d/boot; each service shows done, failed or skipped; System Boot Control: The system has been setup | init s or init 1 starts the minimum services; CIS starts no services. To step through or stop the boot process from this point on, edit /etc/sysconfig/boot and change to: PROMPT_FOR_CONFIRM="yes", RUN_PARALLEL="no", FLOW_CONTROL="yes" (Ctrl-S stops, Ctrl-Q resumes)
9 | sbin:init:boot | /etc/init.d/boot.local | System Boot Control: Running /etc/init.d/boot.local | init 1, then CIS
10 | sbin:init | /etc/inittab | INIT: Entering runlevel: 3 | init 1, then CIS
11 | sbin:init:rc | /bin/bash, /etc/init.d/rc, /etc/init.d/rc?.d/* | Master Resource Control: previous runlevel: N, switching to runlevel: 3; each service shows done, failed or skipped; Master Resource Control: runlevel 3 has been reached; Skipped services in runlevel 3: | init s or init 1, then CIS
12 | sbin:init | /etc/inittab | N/A | init 1, then CIS. init uses /etc/inittab to know how to run the login programs.
13 | sbin:init:mingetty | /etc/issue, /sbin/mingetty | Welcome to SUSE LINUX… login: | init 1 bypasses mingetty; CIS
14 | sbin:init:X | | Graphical login screen | init 1 bypasses X login; CIS
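The /etc/sysconfig/boot change from line 8 of the table can be sketched as a small helper. The function name set_sysconfig_var is hypothetical; it assumes the file uses plain VAR="value" lines and that values contain no characters that are special to sed, which holds for "yes" and "no". Back up the file before changing it.

```shell
# Set VAR="VALUE" in a sysconfig-style file: replace the existing
# assignment if there is one, otherwise append a new line.
set_sysconfig_var() {
    file=$1
    var=$2
    value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^${var}=" "$file"; then
        sed "s|^${var}=.*|${var}=\"${value}\"|" "$file" > "$tmp"
    else
        cat "$file" > "$tmp"
        printf '%s="%s"\n' "$var" "$value" >> "$tmp"
    fi
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root, from run level 1 or inside the chroot):
#   cp /etc/sysconfig/boot /etc/sysconfig/boot.orig
#   set_sysconfig_var /etc/sysconfig/boot PROMPT_FOR_CONFIRM yes
#   set_sysconfig_var /etc/sysconfig/boot RUN_PARALLEL no
#   set_sysconfig_var /etc/sysconfig/boot FLOW_CONTROL yes
```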
If you don’t know what to do next, and BIS or CIS work, you can always run
rpm -Vf </path/to/file>
for each file listed in the "Associated File(s)" column.
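Running the verification over several files at once can be sketched as a loop. The function name verify_owning_packages and the RPM environment variable are hypothetical; RPM exists only so the loop can be dry-run with another command such as echo. By default the sketch calls rpm -Vf, as in the step above.

```shell
# Run "rpm -Vf" for each path given, printing a header per file so the
# verification output can be matched to the file that triggered it.
verify_owning_packages() {
    rpm_cmd=${RPM:-rpm}   # hypothetical override, e.g. RPM=echo for a dry run
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "== $f"
            $rpm_cmd -Vf "$f"
        else
            echo "== $f (missing)"
        fi
    done
}

# Usage, with files from line 7 of the troubleshooting table:
#   verify_owning_packages /sbin/init /etc/inittab
```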