Troubleshooting the SLES10 Boot Process


Overview

When a server fails to boot, a critical situation is at hand. The purpose of this document is to provide a quick reference guide to narrow down the cause of a failed boot and get the server back up as quickly as possible. It is based on SUSE Linux Enterprise Server 10 (SLES10).

Troubleshooting Procedure

  1. The primary troubleshooting objective is to narrow down where in the boot process the failure occurred.
  2. The boot process is summarized below. For more details, refer to the Troubleshooting Table below.
  3. BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login
  4. Look at the failed server’s screen for the last on-screen landmark that matches the troubleshooting table’s "On-Screen Landmarks".
  5. Once you determine how far in the boot process the failure occurred, look at the troubleshooting table’s associated files and troubleshooting/potential fixes.
  6. The two most identifiable on-screen landmarks are:
    1. The grub boot menu screen (Troubleshooting Table, Line 3)
    2. Seeing the word "done" scrolling across the screen (Troubleshooting Table, Lines 8 and 11)
  7. The purpose of the Boot Installed System (BIS), run level 1 and chroot Installed System (CIS) procedures is to get the server into an operational maintenance state, so further problem resolution can be completed.
  8. Boot Installed System (BIS) Procedure
    1. If this procedure works, then the problem is most likely on lines 1-6 of the troubleshooting table.
    2. Boot from CD1
    3. Select "Installation"
    4. Select your Language
    5. Accept the License Agreement
    6. Click "Other"
    7. Select "Boot Installed System"
    8. Click "OK"
  9. Boot to Run Level 1
    1. Run level 1 is very similar to the chroot Installed System (CIS) procedure, but the installed system's own kernel and init set up the environment for you. You also have access to yast and the proc filesystem, so run level 1 is preferred over CIS.
    2. Append "init 1" to the boot options line of the default boot kernel (i.e., SUSE Linux Enterprise Server 10)
    3. Type root’s password
    4. If you need network access, use yast to configure it: yast lan > Next > Edit > Next > Next/Finish
  10. chroot Installed System (CIS) Procedure
    1. Used mostly in lines 7-14 of the troubleshooting table.
    2. Boot from CD1
    3. Select "Rescue System", then log in as root at the "Rescue login:" prompt
    4. Your first goal is to find and mount the root "/" partition, so we can see /etc/fstab
      1. Run cat /proc/partitions to find the disk devices the OS sees
      2. For each device, display the partition table
    ls-boot:~ # parted -s /dev/sda print
    Disk geometry for /dev/sda: 0kB - 2147MB
    Disk label type: msdos
    Number  Start   End     Size    Type      File system  Flags
    1       32kB    214MB   214MB   primary   ext2         boot, type=83
    2       214MB   535MB   321MB   primary   linux-swap   type=82
    3       535MB   2147MB  1612MB  extended               lba, type=0f
    5       535MB   1012MB  477MB   logical   reiserfs     type=83
    6       1012MB  1596MB  584MB   logical   reiserfs     type=83
    7       1596MB  2147MB  551MB   logical   reiserfs     type=83
    
      3. You can ignore type 82 swap and type 0f extended partitions
      4. To find the root partition, you may need to mount each remaining partition in turn and inspect it. For example,
        1. mount /dev/sda1 /mnt
        2. ls -l /mnt
        3. If the /mnt directory listing shows /etc and /root, then it’s the root partition
        4. Otherwise, run umount /mnt and repeat these steps for each device until you find root. In this case, the root device is /dev/sda6
        5. mount /dev/sda6 /mnt

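The trial-and-error search above can be sketched as a small script. This is only an illustration, not part of the original procedure: the function names list_partitions, looks_like_root and find_root_partition are made up here, the test for etc/fstab is slightly stricter than the directory listing check described above, and find_root_partition assumes it runs as root in the rescue system with /mnt unused.

```shell
# Print /dev/<name> for every entry in /proc/partitions-style input (stdin)
# whose name ends in a digit. This is a heuristic: it keeps partitions such
# as sda1 and drops whole disks such as sda.
list_partitions() {
    awk 'NR > 2 && $4 ~ /[0-9]$/ { print "/dev/" $4 }'
}

# Succeed if DIR looks like a root file system (has etc/fstab and root/).
looks_like_root() {
    [ -f "$1/etc/fstab" ] && [ -d "$1/root" ]
}

# Mount each candidate read-only on a scratch mount point and print the
# first one that looks like root. Swap and extended partitions simply fail
# to mount and are skipped.
find_root_partition() {
    mnt=${1:-/mnt}
    list_partitions < /proc/partitions | while read -r dev; do
        mount -o ro "$dev" "$mnt" 2>/dev/null || continue
        if looks_like_root "$mnt"; then
            umount "$mnt"
            echo "$dev"
            break
        fi
        umount "$mnt"
    done
}
```

Running find_root_partition /mnt prints the device to pass to mount, or nothing if no candidate matched.
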
    5. Mount all additional file systems relative to /mnt
      1. Run cat /mnt/etc/fstab
    Rescue# cat /mnt/etc/fstab
    /dev/sda6   /                   reiserfs   acl,user_xattr   1 1
    /dev/sda1   /boot               ext2       acl,user_xattr   1 2
    /dev/sda7   /usr                reiserfs   acl,user_xattr   1 2
    /dev/sda5   /var                reiserfs   acl,user_xattr   1 2
    /dev/sda2   swap                swap       defaults         0 0
    proc        /proc               proc       defaults         0 0
    sysfs       /sys                sysfs      noauto           0 0
    debugfs     /sys/kernel/debug   debugfs    noauto           0 0
    devpts      /dev/pts            devpts     mode=0620,gid=5  0 0
    /dev/fd0    /media/floppy       auto       noauto,user,sync 0 0
      2. This shows the system devices and their mount points.
      3. Mount all additional file systems, for example:
    mount /dev/sda1 /mnt/boot
    mount /dev/sda5 /mnt/var
    mount /dev/sda7 /mnt/usr
    
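When the installed system has many file systems, the mount commands can be derived from fstab instead of typed by hand. The following is a minimal sketch under stated assumptions: fstab_mount_cmds is a hypothetical helper name, it only considers entries whose device starts with /dev/, and it skips swap and noauto entries. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Read fstab-format text on stdin and print one mount command per real
# file system, relative to PREFIX (default /mnt). The root entry "/" is
# skipped because it is already mounted on PREFIX. Sorting by mount point
# makes parents (e.g. /usr) come before children (e.g. /usr/local).
fstab_mount_cmds() {
    prefix=${1:-/mnt}
    awk -v p="$prefix" '
        $1 ~ /^\/dev\// && $2 ~ /^\/./ && $3 != "swap" && $4 !~ /(^|,)noauto(,|$)/ {
            print $2, "mount " $1 " " p $2
        }' | LC_ALL=C sort | cut -d" " -f2-
}
```

For the fstab shown above, fstab_mount_cmds /mnt < /mnt/etc/fstab prints the mount commands for /boot, /usr and /var; piping the output to sh would execute them.
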
    6. Bind-mount proc, sysfs and dev into the installed system
    mount --rbind /proc /mnt/proc
    mount --rbind /sys /mnt/sys
    mount --rbind /dev /mnt/dev
    
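The three bind mounts can also be written as a loop that first checks each target directory exists. This is a sketch only: bind_pseudo_fs is a hypothetical name, and the RUN variable is a dry-run switch invented for this example (set RUN=echo to print the commands instead of running them).

```shell
# Bind-mount /proc, /sys and /dev into TARGET (default /mnt).
# RUN is this sketch's own dry-run knob: RUN=echo prints the commands.
bind_pseudo_fs() {
    target=${1:-/mnt}
    for fs in proc sys dev; do
        if [ ! -d "$target/$fs" ]; then
            echo "skip: $target/$fs is not a directory" >&2
            continue
        fi
        ${RUN:-} mount --rbind "/$fs" "$target/$fs" || return 1
    done
}
```

A skipped directory usually means the wrong partition is mounted on the target, which is worth checking before running chroot.
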
    7. chroot to the mounted installed system. The chroot command remaps /mnt as root "/".
    chroot /mnt
    
    8. If this command fails, then you need to confirm that /mnt/bin/bash and glibc on the installed system are valid.
    9. To return to the rescue system, type exit.
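
If chroot fails, a quick pre-check along the following lines may help narrow down whether bash or glibc is the problem. It is a rough sketch: check_chroot_ready is a hypothetical name, and the assumption that the glibc dynamic loader appears as lib/ld-*.so* or lib64/ld-*.so* is a common layout, not something the procedure above states.

```shell
# Report obvious reasons why "chroot ROOT" would fail (default ROOT=/mnt).
# Returns 0 when bash and a dynamic loader are both present.
check_chroot_ready() {
    root=${1:-/mnt}
    status=0
    if [ ! -x "$root/bin/bash" ]; then
        echo "missing or not executable: $root/bin/bash"
        status=1
    fi
    # Assumption: glibc's loader is named ld-<something>.so<something>
    loader=no
    for f in "$root"/lib/ld-*.so* "$root"/lib64/ld-*.so*; do
        [ -e "$f" ] && loader=yes
    done
    if [ "$loader" = no ]; then
        echo "no glibc dynamic loader found under $root/lib or $root/lib64"
        status=1
    fi
    return "$status"
}
```

A passing check does not prove the files are intact; it only rules out the case where they are absent.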

Troubleshooting Table

BIS = Boot Installed System Procedure
CIS = chroot Installed System Procedure

Line 1: BIOS
  Associated File(s): N/A
  On-Screen Landmarks: BIOS messages
  Troubleshooting / Potential Fixes:
    Update the firmware
    Make sure a disk device is marked bootable

Line 2: MBR
  Associated File(s): /boot/grub/stage1
  On-Screen Landmarks:
    GRUB
    loading stage2…
  Troubleshooting / Potential Fixes:
    BIS
    grub-install /dev/<disk> or lilo -v

Line 3: GRUB
  Associated File(s):
    /boot/grub/stage2
    /boot/grub/menu.lst
  On-Screen Landmarks: GRUB menu or grub> prompt
  Troubleshooting / Potential Fixes:
    BIS
    grub-install /dev/<disk> or lilo -v
    Check /boot/grub/menu.lst

Line 4: kernel
  Associated File(s): /boot/vmlinuz
  On-Screen Landmarks:
    Hardware info scrolling
    RAMDISK driver initialized:
  Troubleshooting / Potential Fixes:
    BIS
    Reinstall kernel rpm

Line 5: initrd
  Associated File(s):
    /boot/initrd
    /etc/sysconfig/kernel
  On-Screen Landmarks: RAMDISK: <relevant message>
  Troubleshooting / Potential Fixes:
    BIS
    mkdir -p /tmp/ramdisk; cd /tmp/ramdisk; zcat /boot/initrd | cpio -ivd
    mkinitrd
    lilo -v

Line 6: ramdisk:init
  Associated File(s):
    /init in /boot/initrd
    /etc/sysconfig/kernel
  On-Screen Landmarks:
    Starting udevd
    Creating devices
    Loading <module_name>
    There will be a "Loading" statement for each module defined in the /etc/sysconfig/kernel INITRD_MODULES variable.
  Troubleshooting / Potential Fixes:
    BIS
    mkinitrd creates the ramdisk:init file.

Line 7: sbin:init
  Associated File(s):
    /sbin/init
    /etc/inittab
  On-Screen Landmarks: INIT: version 2.85 booting
  Troubleshooting / Potential Fixes:
    init 1, then CIS
    Use boot options init=/bin/bash or init=/bin/sash to bypass running /sbin/init.

Line 8: sbin:init:boot
  Associated File(s):
    /bin/bash
    /etc/init.d/boot
    /etc/init.d/boot.d/*
  On-Screen Landmarks:
    System Boot Control: Running /etc/init.d/boot
    Each service shows: done, failed or skipped
    System Boot Control: The system has been setup
  Troubleshooting / Potential Fixes:
    init s or init 1 starts the minimum services
    CIS starts no services
    To step through or stop the boot process from this point on, edit /etc/sysconfig/boot and change to:
      PROMPT_FOR_CONFIRM="yes"
      RUN_PARALLEL="no"
      FLOW_CONTROL="yes"
      (Ctrl-S stops, Ctrl-Q resumes)

Line 9: sbin:init:boot
  Associated File(s): /etc/init.d/boot.local
  On-Screen Landmarks: System Boot Control: Running /etc/init.d/boot.local
  Troubleshooting / Potential Fixes: init 1, then CIS

Line 10: sbin:init
  Associated File(s): /etc/inittab
  On-Screen Landmarks: INIT: Entering runlevel: 3
  Troubleshooting / Potential Fixes: init 1, then CIS

Line 11: sbin:init:rc
  Associated File(s):
    /bin/bash
    /etc/init.d/rc
    /etc/init.d/rc?.d/*
  On-Screen Landmarks:
    Master Resource Control: previous runlevel: N, switching to runlevel: 3
    Each service shows: done, failed or skipped
    Master Resource Control: runlevel 3 has been reached
    Skipped services in runlevel 3:
  Troubleshooting / Potential Fixes: init s or init 1, then CIS

Line 12: sbin:init
  Associated File(s): /etc/inittab
  On-Screen Landmarks: N/A
  Troubleshooting / Potential Fixes:
    init 1, then CIS
    init uses /etc/inittab to know how to run the login programs.

Line 13: sbin:init:mingetty
  Associated File(s):
    /etc/issue
    /sbin/mingetty
  On-Screen Landmarks:
    Welcome to SUSE LINUX…
    login:
  Troubleshooting / Potential Fixes:
    init 1 bypasses mingetty
    CIS

Line 14: sbin:init:X
  Associated File(s): (none listed)
  On-Screen Landmarks: Graphical login screen
  Troubleshooting / Potential Fixes:
    init 1 bypasses X login
    CIS

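Matching the last message on screen to a line of the Troubleshooting Table can be sketched as a lookup function. This is an illustration under assumptions: landmark_to_line is a made-up name, the patterns are taken from the "On-Screen Landmarks" entries in the table, and lines without a distinctive text landmark (1, 12 and 14) are not covered.

```shell
# Print the Troubleshooting Table line number whose on-screen landmark
# matches MESSAGE, or "unknown" (with a non-zero status) if none matches.
# More specific patterns are listed before more general ones.
landmark_to_line() {
    case "$1" in
        *"loading stage2"*)                      echo 2 ;;
        *"grub>"*)                               echo 3 ;;
        *"RAMDISK driver initialized"*)          echo 4 ;;
        *"RAMDISK:"*)                            echo 5 ;;
        *"Starting udevd"*|*"Creating devices"*) echo 6 ;;
        *"INIT: version"*)                       echo 7 ;;
        *"Running /etc/init.d/boot.local"*)      echo 9 ;;
        *"Running /etc/init.d/boot"*)            echo 8 ;;
        *"INIT: Entering runlevel"*)             echo 10 ;;
        *"Master Resource Control"*)             echo 11 ;;
        *"login:"*)                              echo 13 ;;
        *)                                       echo unknown; return 1 ;;
    esac
}
```

For example, landmark_to_line "INIT: Entering runlevel: 3" prints 10, which points at /etc/inittab and the "init 1, then CIS" approach.
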
If you don’t know what to do next, and BIS or CIS works, you can always run

rpm -Vf </path/to/file>

for each file listed in the "Associated File(s)" column.
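
Running the verification for several files at once can be sketched as a loop. The function name verify_boot_files and the RPM override variable are this example's own inventions; RPM exists only so the loop can be dry-run on a machine without rpm.

```shell
# Run "rpm -Vf" for every existing file given as an argument and report
# files that are missing. RPM is a dry-run knob invented for this sketch
# (for example RPM="echo rpm" prints the commands instead of running them).
verify_boot_files() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            ${RPM:-rpm} -Vf "$f"
        else
            echo "missing: $f"
        fi
    done
}
```

A possible invocation is verify_boot_files /boot/grub/stage1 /boot/grub/stage2 /boot/vmlinuz /sbin/init /etc/inittab. Files that no package owns are reported as such by rpm rather than verified.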

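The /etc/sysconfig/boot change listed for line 8 of the Troubleshooting Table could be scripted roughly as follows. This is a sketch under assumptions: set_sysconfig_var and enable_boot_stepping are hypothetical helper names, the file is assumed to use plain NAME="value" lines, and a copy is saved with an .orig suffix so the change can be reverted.

```shell
# Set NAME="VALUE" in FILE, replacing an existing assignment or appending
# one. The file is rewritten in place so its permissions are kept.
set_sysconfig_var() {
    file=$1 name=$2 value=$3
    if grep -q "^$name=" "$file"; then
        sed "s|^$name=.*|$name=\"$value\"|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# Apply the three settings that make the boot scripts prompt and run serially.
enable_boot_stepping() {
    file=${1:-/etc/sysconfig/boot}
    cp "$file" "$file.orig" || return 1
    set_sysconfig_var "$file" PROMPT_FOR_CONFIRM yes
    set_sysconfig_var "$file" RUN_PARALLEL no
    set_sysconfig_var "$file" FLOW_CONTROL yes
}
```

From a CIS session, enable_boot_stepping with no argument edits /etc/sysconfig/boot of the installed system; copying the .orig file back restores the previous settings.
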
