Safely making snapshots of cluster nodes in an ESXi environment
This document (000020853) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12 SP5
SUSE Linux Enterprise High Availability Extension 12 SP4
Situation
It is possible to create backups of cluster nodes in an ESXi environment using the ESXi snapshot feature.
However, this scenario must observe the VM configuration limitations that VMware has described here:
https://kb.vmware.com/s/article/2151774
The key point in the VMware documentation is that any shared storage used by the cluster must be excluded from the snapshot.
If the cluster is used for SAP HANA, VMware states the following:
"A note on VMware snapshots: in contrast to VMware clones, which are exact copies of a virtual machine, a VMware snapshot represents the state of a virtual machine at the time it was taken, and it might negatively affect the performance of the virtual machine. This is based on how long it has been in place, the number of snapshots taken, and how much the virtual machine and its guest operating system have changed since the time it was taken. When a snapshot should be taken; for instance, before installing a new SAP HANA patch, take the snapshot and do not select the option “Snapshot the virtual machine’s memory”. The general recommendation is that you shut down the SAP HANA VM prior a snapshot being created."
This is documented on page 46 of sap_hana_on_vmware_vsphere_best_practices_guide-white-paper.pdf.
Resolution
Steps to perform before initiating the snapshot:
snaphost01:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: snaphost01 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Fri Nov 11 13:29:31 2022
  * Last change: Fri Nov 11 13:29:28 2022 by root via cibadmin on snaphost01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ snaphost01 snaphost02 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd): Started snaphost01
  * very-important-resource01 (ocf::heartbeat:Dummy): Started snaphost02

snaphost01:~ # crm node standby snaphost02
snaphost01:~ # ssh snaphost02
Last login: Fri Nov 11 13:08:25 2022 from 172.16.171.11
snaphost02:~ # crm cluster stop
INFO: Cluster services stopped on snaphost02
snaphost02:~ # exit
logout
Connection to snaphost02 closed.
snaphost01:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: snaphost01 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Fri Nov 11 13:32:24 2022
  * Last change: Fri Nov 11 13:31:57 2022 by root via crm_attribute on snaphost01
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Node snaphost02: OFFLINE (standby)
  * Online: [ snaphost01 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd): Started snaphost01
  * very-important-resource01 (ocf::heartbeat:Dummy): Started snaphost01

snaphost01:~ #
The snaphost02 node is now ready for the snapshot to be taken.
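The pre-snapshot steps above can be sketched as a small helper function. This is only an illustrative sketch, not part of the SUSE or VMware documentation: it assumes the crm shell is available and that passwordless root SSH works from the node running it to the node being snapshotted. The DRY_RUN variable and the function names are hypothetical conveniences for previewing the commands.

```shell
# Sketch only: drain a cluster node before an ESXi snapshot is taken.
# Assumes crmsh is installed and root SSH access between the nodes.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

pre_snapshot() {
    node="$1"
    # Put the node into standby so its resources migrate to the other node
    run crm node standby "$node"
    # Stop the cluster stack on the node to be snapshotted
    run ssh "$node" crm cluster stop
}
```

For example, `DRY_RUN=1 pre_snapshot snaphost02` prints the two commands without executing them, which is a safe way to verify the sequence before using it.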
Steps to perform after the snapshot has been taken:
snaphost01:~ # ssh snaphost02
Last login: Fri Nov 11 13:32:08 2022 from 172.16.171.11
snaphost02:~ # crm cluster start
INFO: BEGIN Starting pacemaker(delaying start of sbd for 10s)
INFO: END Starting pacemaker(delaying start of sbd for 10s)
INFO: Cluster services started on snaphost02
snaphost02:~ # crm node online snaphost02
INFO: online node snaphost02
snaphost02:~ # exit
logout
Connection to snaphost02 closed.
snaphost01:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: snaphost01 (version 2.0.5+20201202.ba59be712-150300.4.21.1-2.0.5+20201202.ba59be712) - partition with quorum
  * Last updated: Fri Nov 11 13:38:42 2022
  * Last change: Fri Nov 11 13:38:35 2022 by root via cibadmin on snaphost02
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ snaphost01 snaphost02 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd): Started snaphost01
  * very-important-resource01 (ocf::heartbeat:Dummy): Started snaphost01

snaphost01:~ #
The snapshot has now been created, and the cluster is back online and fully functional without any hiccups.
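The post-snapshot steps can be sketched in the same style. Again, this is only a hedged sketch with the same assumptions (crmsh, root SSH between nodes, hypothetical DRY_RUN helper); in the transcript above `crm node online` is run on snaphost02 itself, while the sketch issues it locally, which is equivalent because the crm shell can manage node attributes from any cluster node.

```shell
# Sketch only: bring a node back into the cluster after the snapshot.
# Assumes crmsh is installed and root SSH access between the nodes.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

post_snapshot() {
    node="$1"
    # Start the cluster stack again on the snapshotted node
    run ssh "$node" crm cluster start
    # Take the node out of standby so it can host resources again
    run crm node online "$node"
}
```

As before, `DRY_RUN=1 post_snapshot snaphost02` previews the commands without executing them.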
The same steps can be performed for the other node.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020853
- Creation Date: 17-Nov-2022
- Modified Date: 17-Nov-2022
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com