System crash during Oracle Grid installation
This document (000019762) is provided subject to the disclaimer at the end of this document.
Environment
Situation
crash> bt
PID: 19424  TASK: ffff925e2ce04d00  CPU: 1  COMMAND: "kfod.bin"
 #0 [ffffa3840ff23a70] machine_kexec at ffffffffac86d6b1
 #1 [ffffa3840ff23ac8] __crash_kexec at ffffffffac9531a5
 #2 [ffffa3840ff23b90] crash_kexec at ffffffffac953fbd
 #3 [ffffa3840ff23ba8] oops_end at ffffffffac8354df
 #4 [ffffa3840ff23bc8] no_context at ffffffffac87de6f
 #5 [ffffa3840ff23c30] do_page_fault at ffffffffac87f0a0
 #6 [ffffa3840ff23c60] page_fault at ffffffffad20122e
    [exception RIP: lock_timer_base+0x4e]
    RIP: ffffffffac92d7ae  RSP: ffffa3840ff23d18  RFLAGS: 00010246
    RAX: 000000000001da80  RBX: 000000000ff23da9  RCX: 0000000000000000
    RDX: 0000000000023da9  RSI: ffffa3840ff23d50  RDI: ffffa3840ff23da8
    RBP: ffffa3840ff23da8   R8: 0000000000000000   R9: 0000000000000000
    R10: ffffa3840ff23ed0  R11: 0000000000000000  R12: ffffffffad9a1980
    R13: 000000000001da80  R14: ffffa3840ff23d50  R15: ffffa3840ff23d98
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa3840ff23d48] try_to_del_timer_sync at ffffffffac92ea76
 #8 [ffffa3840ff23d70] del_timer_sync at ffffffffac92eb11
 #9 [ffffa3840ff23d80] asm_do_io at ffffffffc072d779 [oracleasm]
#10 [ffffa3840ff23e28] asmfs_svc_io64 at ffffffffc072d8ff [oracleasm]
#11 [ffffa3840ff23ec8] vfs_read at ffffffffacacf6a9
#12 [ffffa3840ff23ef8] ksys_read at ffffffffacacfa31
#13 [ffffa3840ff23f38] do_syscall_64 at ffffffffac8052eb
#14 [ffffa3840ff23f50] entry_SYSCALL_64_after_hwframe at ffffffffad20008c
    RIP: 00007fd4a788ae61  RSP: 00007ffecfdb2aa8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007ffecfdb2ad0  RCX: 00007fd4a788ae61
    RDX: 0000000000000050  RSI: 00007ffecfdb2ad0  RDI: 0000000000000007
    RBP: 00007ffecfdb2fe8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000034  R11: 0000000000000246  R12: 00007fd4aea953e8
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000001b75d40
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

crash> dis -rl ffffffffac92d7ae|tail
0xffffffffac92d795 <lock_timer_base+53>:  test   $0x40000,%ebx
0xffffffffac92d79b <lock_timer_base+59>:  jne    0xffffffffac92d790 <lock_timer_base+48>
/usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 835
0xffffffffac92d79d <lock_timer_base+61>:  mov    %ebx,%edx
0xffffffffac92d79f <lock_timer_base+63>:  mov    %r13,%rax
0xffffffffac92d7a2 <lock_timer_base+66>:  and    $0x3ffff,%edx
/usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 841
0xffffffffac92d7a8 <lock_timer_base+72>:  test   $0x80000,%ebx
/usr/src/debug/kernel-default-5.3.18-24.12.1.x86_64/linux-5.3/linux-obj/../kernel/time/timer.c: 835
0xffffffffac92d7ae <lock_timer_base+78>:  mov    (%r12,%rdx,8),%rdx

The failing instruction is on line 835:
# kernel/time/timer.c
833 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
834 {
835         struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);
836
837         /*
838          * If the timer is deferrable and NO_HZ_COMMON is set then we need
839          * to use the deferrable base.
840          */
841         if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
842                 base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
843         return base;
844 }

The call chain:
->lock_timer_base(timer, &flags)
  ->get_timer_base(tf)
    ->get_timer_cpu_base()
      ->per_cpu_ptr(&timer_bases[BASE_STD], cpu)

The timer_list in question:
crash> timer_list ffffa3840ff23da8
struct timer_list {
  entry = {
    next = 0xffff925b6bdcee30,
    pprev = 0x0
  },
  expires = 18446623520082161200,
  function = 0xffffffffc0729e00,
  flags = 267533737
}

Its function pointer points to timeout_func() in the oracleasm module:
crash> sym 0xffffffffc0729e00
ffffffffc0729e00 (t) timeout_func [oracleasm]
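The register values in the backtrace can be cross-checked against the flags field of the timer_list with a little shell arithmetic. The following is a minimal sketch, assuming that the three masks seen in the disassembly (0x3ffff, 0x40000, 0x80000) correspond to the kernel's TIMER_CPUMASK, TIMER_MIGRATING and TIMER_DEFERRABLE definitions; the variable names are illustrative only.

```shell
#!/bin/sh
# Decode the timer_list flags value reported by crash (flags = 267533737).
# The mask values come from the disassembly; mapping them to the kernel
# macro names is an assumption.
flags=267533737

TIMER_CPUMASK=0x3ffff      # and  $0x3ffff,%edx
TIMER_MIGRATING=0x40000    # test $0x40000,%ebx
TIMER_DEFERRABLE=0x80000   # test $0x80000,%ebx

flags_hex=$(printf '0x%08x' "$flags")
cpu=$(( flags & $TIMER_CPUMASK ))
cpu_hex=$(printf '0x%x' "$cpu")
migrating=$(( (flags & $TIMER_MIGRATING) != 0 ))
deferrable=$(( (flags & $TIMER_DEFERRABLE) != 0 ))
# Byte offset added to R12 by "mov (%r12,%rdx,8),%rdx"
offset_hex=$(printf '0x%x' $(( cpu * 8 )))

echo "flags       = $flags_hex"
echo "cpu index   = $cpu ($cpu_hex)"
echo "migrating   = $migrating"
echo "deferrable  = $deferrable"
echo "byte offset = $offset_hex"
```

With these inputs the flags value is 0x0ff23da9, which matches the low 32 bits of RBX, and the CPU field decodes to 0x23da9 (146857), which matches RDX in the exception frame. Neither the migrating nor the deferrable bit is set. An index that large is unlikely to be a valid CPU number, so the per-CPU lookup on line 835 would read far beyond the start of the table addressed by R12. This reading is not stated in the original analysis and should be treated as an assumption.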
Resolution
A potential workaround would be:
- Create the ASM devices with oracleasm createdisk.
- Stop asmlib:
systemctl stop oracleasm
- Change the owner and group of the device:
chown grid:asmadmin /dev/sda5
- Start the Grid setup:
su grid
cd $ORACLE_HOME
./gridsetup.sh
- At this point the /dev/sda5 disk is discovered, a disk group can be created, and the ASM instance installation can continue.
- Afterwards, reboot the machine. The ASM instance should start without problems, and the /dev/oracleasm/ASMDISK1 device can then be used.
- kfod should then also run without any problems.
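The root-side steps of this workaround can be collected into a small script. This is a dry-run sketch only, assuming the disk label ASMDISK1, the device /dev/sda5, and the grid:asmadmin ownership used in this document; the DRY_RUN switch and the run() and workaround_steps() helpers are hypothetical conveniences. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Dry-run sketch of the workaround steps listed above.
# DISK_LABEL, DEVICE, GRID_USER and GRID_GROUP mirror the values used in
# this document; adjust them for the actual system.
DISK_LABEL=ASMDISK1
DEVICE=/dev/sda5
GRID_USER=grid
GRID_GROUP=asmadmin
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

workaround_steps() {
    run oracleasm createdisk "$DISK_LABEL" "$DEVICE"   # create the ASM device
    run systemctl stop oracleasm                       # stop asmlib
    run chown "$GRID_USER:$GRID_GROUP" "$DEVICE"       # hand the device to grid
    # The Grid setup itself is interactive and is left to the operator.
    echo "Next: as $GRID_USER, run ./gridsetup.sh from \$ORACLE_HOME, then reboot."
}

workaround_steps
```

Review the printed commands first, then run the script as root with DRY_RUN=0 to execute them. The Grid setup and the reboot are deliberately left as manual steps.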
Cause
Status
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019762
- Creation Date: 27-Oct-2020
- Modified Date: 22-Mar-2023
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com