multipath/LVM boot issues with a few specific storage devices - "failed to read sysfs vpd pg80"
This document (7023205) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12 Service Pack 3
SUSE Linux Enterprise Server for SAP Applications Service Pack 3
Situation
Some systems attaching to multipath storage devices may have issues with certain versions of SLES 12.
Currently issues have been observed on Hitachi OPEN-V and HP OPEN-V devices.
However, the underlying issues may affect other devices.
See the "Cause" and "Additional Information" sections for more information.
During installation, the installer asks whether multipath support should be enabled. Even if the user chooses to enable multipath support at this point, the system may not boot with multipath working properly.
If the multipath targets contain LVM volumes, this may lead to additional issues that result in boot failures.
In cases involving LVM, symptoms of such issues will include:
The 'pvs' output shows "duplicate PV" entries and lists single-path devices rather than multipath devices.
"multipathd" reports "unknown orphan disks", and the verbose "multipath" output shows "failed to read sysfs vpd pg80".
"lsblk" reports incorrect multipath installation data.
Resolution
* Workaround for fresh OS installation:
As a temporary workaround for OS installation, add the following generic parameter to the kernel boot option line:
or alternatively for systems having issues with only Hitachi OPEN-V enclosures:
or for systems having issues only with HP OPEN-V enclosures:
After the installation is completed, it is recommended to update the kernel to:
After the kernel update, the scsi_mod boot option can be removed from the boot line.
There may be other enclosures where the same issue could be encountered.
In such instances either use the generic boot option:
or find the vendor and model identifiers of the enclosure and build a boot option of the format:
* Workaround for an already installed OS:
Use one of the device-specific workaround options, e.g.:
for Hitachi OPEN-V enclosures. You may then update to the latest maintenance kernel, i.e. 4.4.103-6.33.1 or later.
Cause
For backwards compatibility reasons, the Hitachi and HP OPEN-V devices report themselves as an older-revision SCSI device that does not support VPD pages 0x80 and 0x83, although in reality they do.
SUSE changed the udev rule processing for SCSI device identification during Service Pack 2 (with the sg3_utils update to version 1.43-12.1) to avoid unnecessary, possibly blocking device IO during udev rule processing.
The udev rule evaluation uses the device inquiry and VPD information to add properties to the udev disk objects indicating that they are multipath enabled.
The kernel exposes the device identification information in the "inquiry", "vpd_pg83" and "vpd_pg80" attributes in sysfs. To do this, it must send various SCSI commands to the devices to request this information.
In order to avoid confusing older devices, these commands will only be sent if the device is known to support them.
The kernel ascertains this by checking that the device's SCSI revision level is high enough, or by seeing that the SCSI devinfo table contains an attribute indicating that the device supports these commands.
Older kernels had an issue with the parsing of devinfo table entries that used 'wildcard' mechanics. This caused an issue for the Hitachi OPEN-V enclosures.
Subsequently, it was discovered that the entry for the HP OPEN-V enclosure did not have the proper attribute flags set.
The same issue may affect other devices that do not have a proper entry in the devinfo table (with the current kernel), or any device using a 'wildcard' format entry in the devinfo table (on older kernels).
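Whether the kernel has exposed the VPD attributes for a given disk can be checked directly in sysfs. The following is a minimal sketch, not a SUSE-provided tool: the function name vpd_status is hypothetical, and it assumes that SCSI disks appear as /sys/block/sd* with "vendor", "model", "vpd_pg80" and "vpd_pg83" entries under their "device" directory.

```python
from pathlib import Path


def _read_attr(path):
    """Return the stripped text of a sysfs attribute, or "" if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def vpd_status(sys_block="/sys/block"):
    """Report, per SCSI disk, whether the VPD attributes exist in sysfs.

    Assumption: disks are named sd* and the attributes live in <disk>/device.
    """
    status = {}
    root = Path(sys_block)
    if not root.is_dir():
        return status
    for disk in sorted(root.glob("sd*")):
        dev = disk / "device"
        status[disk.name] = {
            "vendor": _read_attr(dev / "vendor"),
            "model": _read_attr(dev / "model"),
            # Only existence is checked; the VPD attributes hold binary data.
            "vpd_pg80": (dev / "vpd_pg80").exists(),
            "vpd_pg83": (dev / "vpd_pg83").exists(),
        }
    return status


if __name__ == "__main__":
    for name, info in vpd_status().items():
        print(name, info)
```

A disk whose vendor and model match an affected enclosure but which lacks "vpd_pg80" or "vpd_pg83" is a likely candidate for the problem described in this document. The vendor and model strings printed here are also the identifiers needed for a device-specific boot option.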
Additional Information
The scsi_static_device_list table in the kernel source file drivers/scsi/scsi_devinfo.c is used to convey information about some SCSI devices to the kernel.
This is necessary because this information cannot be ascertained in a fashion that is guaranteed to be safe across all SCSI devices.
For the kernel to determine whether VPD pages 0x80 and 0x83 are provided by these devices, the devinfo flag BLIST_TRY_VPD_PAGES (0x10000000) is added to their entries.
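To illustrate how this flag value relates to a scsi_mod boot option, the sketch below formats option strings from it. This is an assumption-laden example, not the exact text of the workaround: it assumes the kernel's scsi_mod.dev_flags parameter takes the form vendor:model:flags and that scsi_mod.default_dev_flags takes a plain integer, as described in the upstream kernel parameter documentation. The function names are hypothetical, and the vendor and model strings must match what the device actually reports.

```python
# Flag value as given in this document for BLIST_TRY_VPD_PAGES.
BLIST_TRY_VPD_PAGES = 0x10000000


def dev_flags_option(vendor, model, flags=BLIST_TRY_VPD_PAGES):
    """Format a device-specific option (assumed form: vendor:model:flags)."""
    return f"scsi_mod.dev_flags={vendor}:{model}:{flags:#x}"


def default_dev_flags_option(flags=BLIST_TRY_VPD_PAGES):
    """Format a generic option that applies the flags to all devices."""
    return f"scsi_mod.default_dev_flags={flags:#x}"


if __name__ == "__main__":
    # "HITACHI" and "OPEN-V" are illustrative; verify them against sysfs.
    print(dev_flags_option("HITACHI", "OPEN-V"))
    print(default_dev_flags_option())
```

The device-specific form limits the change to one enclosure type, while the generic form affects every SCSI device on the system.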
With the sysfs approach introduced in the SLES12 SP2 sg3_utils update, a few changes and patches were necessary to make sure VPD is read correctly.
The SLES12 SP3 GA sg3_utils includes the following patch:
Assuming the kernel had already scanned the VPD pages and determined if the vpd page attribute is present, a patch to skip reading those devices (when VPD 0x83 isn't supported) was added to avoid unnecessary I/O.
Unfortunately this doesn't work for the OPEN-V devices due to an error in the devinfo parsing of older kernels. Likewise the entry for the HP OPEN-V devices contained an improper set of attributes on older kernels.
A workaround patch was added to fall back to sg_inq --page when VPD cannot be accessed. It was added in the sg3_utils update sg3_utils-1.43-16.5.1, released in September 2017.
However, with the kernel update 4.4.103-6.33.1 and the device flags built in, the kernel can be relied on to detect the VPD pages, and a future sg3_utils update will revert the workaround patch.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com