System panic on cifs_reconnect() when the SMB server is down.
This document (000019780) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15 GA
SUSE Linux Enterprise Server 12 SP5
SUSE Linux Enterprise Server 12 SP4
Situation
Before the panic, the following CIFS-related errors are logged:
CIFS: VFS: \\FS0.GANDALF.LOCAL has not responded in 30 seconds. Reconnecting...
CIFS VFS: \\FS0.GANDALF.LOCAL cifs_reconnect: no target servers for DFS failover
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: cifs_reconnect+0x4a5/0xdf0 [cifs]
The panic was triggered by the cifsd task dereferencing an invalid pointer in cifs_reconnect() during CIFS TCP session reconnection. The variable tgt_list (struct dfs_cache_tgt_list) had not been initialized correctly because no cache entry was found for the given DFS referral path. Stack trace of the panicking task:
PID: 24194  TASK: ffff9644560f55c0  CPU: 6  COMMAND: "cifsd"
 #0 [ffffb3eb959afa58] machine_kexec at ffffffffae05ec22
 #1 [ffffb3eb959afaa8] __crash_kexec at ffffffffae122b2a
 #2 [ffffb3eb959afb68] crash_kexec at ffffffffae123b59
 #3 [ffffb3eb959afb80] oops_end at ffffffffae02e0a1
 #4 [ffffb3eb959afba0] no_context at ffffffffae06ed4b
 #5 [ffffb3eb959afbf0] __do_page_fault at ffffffffae06f22c
 #6 [ffffb3eb959afc58] do_page_fault at ffffffffae06f68b
 #7 [ffffb3eb959afc80] page_fault at ffffffffae8016b5
    [exception RIP: cifs_reconnect+0x4a5]
    RIP: ffffffffc05b5b65  RSP: ffffb3eb959afd38  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff963e07cac000  RCX: 0000000000000001
    RDX: fffffffffffffff8  RSI: ffffb3eb959afd70  RDI: 0000000000000286
    RBP: 00000000ffffff8d   R8: 0000000000000001   R9: 0000000000000000
    R10: 0000000104be6900  R11: 0000000000000000  R12: ffff963e07cac1c0
    R13: fffffffffffffff8  R14: ffff9648d0b64400  R15: ffffb3eb959afd58
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffb3eb959afdb0] cifs_readv_from_socket at ffffffffc05b6627 [cifs]
 #9 [ffffb3eb959afde8] cifs_read_from_socket at ffffffffc05b672d [cifs]
#10 [ffffb3eb959afe60] cifs_demultiplex_thread at ffffffffc05b6a64 [cifs]
#11 [ffffb3eb959aff10] kthread at ffffffffae0aa40d
#12 [ffffb3eb959aff50] ret_from_fork at ffffffffae800235
Disassembly leading up to the failing instruction:
crash> dis -rl ffffffffc0722be5|tail
0xffffffffc0722bc0 <cifs_reconnect+0x480>: je 0xffffffffc0722ff4 <cifs_reconnect+0x8b4>
/usr/src/debug/kernel-default-4.12.14/linux-4.12/linux-obj/../fs/cifs/connect.c: 409
0xffffffffc0722bc6 <cifs_reconnect+0x486>: testb $0x1,0xcb037(%rip) # 0xffffffffc07edc04 <cifsFYI>
0xffffffffc0722bcd <cifs_reconnect+0x48d>: je 0xffffffffc0722bdc <cifs_reconnect+0x49c>
0xffffffffc0722bcf <cifs_reconnect+0x48f>: testb $0x4,0xc23ec(%rip) # 0xffffffffc07e4fc2 <descriptor.82423+0x22>
0xffffffffc0722bd6 <cifs_reconnect+0x496>: jne 0xffffffffc07231ab <cifs_reconnect+0xa6b>
/usr/src/debug/kernel-default-4.12.14/linux-4.12/linux-obj/../fs/cifs/dfs_cache.h: 93
0xffffffffc0722bdc <cifs_reconnect+0x49c>: test %r13,%r13
0xffffffffc0722bdf <cifs_reconnect+0x49f>: je 0xffffffffc0722fd8 <cifs_reconnect+0x898>
0xffffffffc0722be5 <cifs_reconnect+0x4a5>: mov 0x0(%r13),%rbp
The failing instruction corresponds to line 93 of fs/cifs/dfs_cache.h:
#./fs/cifs/dfs_cache.h
90 static inline const char *
91 dfs_cache_get_tgt_name(const struct dfs_cache_tgt_iterator *it)
92 {
93         return it ? it->it_name : NULL;
94 }
The invalid pointer is dereferenced when reading it->it_name, the member at offset 0x0 of the iterator:
crash> struct dfs_cache_tgt_iterator -ox
struct dfs_cache_tgt_iterator {
  [0x0] char *it_name;
  [0x8] struct list_head it_list;
}

0xffffffffc0722be5 <cifs_reconnect+0x4a5>: mov 0x0(%r13),%rbp

The instruction faults because %r13 holds an invalid address (0xfffffffffffffff8). That value is not NULL, so the check on line 93 does not catch it.

#./fs/cifs/connect.c
-----------------------
390 static void reconn_inval_dfs_target(struct TCP_Server_Info *server,
391                                     struct cifs_sb_info *cifs_sb,
392                                     struct dfs_cache_tgt_list *tgt_list,
393                                     struct dfs_cache_tgt_iterator **tgt_it)
394 {
395         const char *name;
396
397         if (!cifs_sb || !cifs_sb->origin_fullpath || !tgt_list ||
398             !server->nr_targets)
399                 return;
400
401         if (!*tgt_it) {
402                 *tgt_it = dfs_cache_get_tgt_iterator(tgt_list);
403         } else {
404                 *tgt_it = dfs_cache_get_next_tgt(tgt_list, *tgt_it);
405                 if (!*tgt_it)
406                         *tgt_it = dfs_cache_get_tgt_iterator(tgt_list);
407         }
408
409         cifs_dbg(FYI, "%s: UNC: %s\n", __func__, cifs_sb->origin_fullpath);
410
411         name = dfs_cache_get_tgt_name(*tgt_it);
...
The call chain:
cifs_reconnect(struct TCP_Server_Info *server)
  reconn_inval_dfs_target(server, cifs_sb, &tgt_list, &tgt_it)
    dfs_cache_get_tgt_name(const struct dfs_cache_tgt_iterator *it)
Resolution
- SLES 15 SP1 - 4.12.14-197.56.1
- SLES 15 GA - 4.12.14-150.58.1
- SLES 12 SP5 - 4.12.14-122.37.1
- SLES 12 SP4 - 4.12.14-95.60.1
Cause
The variable tgt_list (struct dfs_cache_tgt_list) was not initialized correctly because no cache entry was found for the given DFS referral path.
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019780
- Creation Date: 11-Nov-2020
- Modified Date: 11-Nov-2020
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com