Windows nodes in Reconciling status
This document (000021254) is provided subject to the disclaimer at the end of this document.
Environment
RKE2 custom or node-driver clusters with Windows worker nodes, where the Windows agent runs as a service.
Situation
The rke2 agent logged an error similar to the following:
{failed to extract runtime image: open C:\var\lib\rancher\rke2\data\v1.24.9-rke2r2-windows-amd64-bc939b774232\bin\containerd-shim-runhcs-v1.exe: The process cannot access the file because it is being used by another process.}
The Windows rke2 service fails to start because child processes left over from the previous run of the service were not terminated cleanly and still hold files such as containerd-shim-runhcs-v1.exe open.
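To confirm that a node matches this situation, the event log can be filtered for the specific error. The following is a minimal sketch, not an official procedure. It assumes Windows PowerShell 5.1 (Get-EventLog is not available in PowerShell 7) and that the agent logs to the Application log under the source 'rke2', as in the command shown in the Resolution section. The function names Test-Rke2FileLockMessage and Get-Rke2FileLockEvent are hypothetical helpers introduced for illustration.

```powershell
# Hypothetical helper: returns $true when a log message looks like the
# "file in use" extraction failure described in this article.
function Test-Rke2FileLockMessage {
    param([string]$Message)
    return (($Message -like '*failed to extract runtime image*') -and
            ($Message -like '*being used by another process*'))
}

# Hypothetical helper: lists recent rke2 events that match the failure.
# Assumes the event source is named 'rke2' (taken from this article).
function Get-Rke2FileLockEvent {
    param([int]$Newest = 500)
    Get-EventLog -LogName Application -Source 'rke2' -Newest $Newest |
        Where-Object { Test-Rke2FileLockMessage ($_.ReplacementStrings -join ' ') } |
        Select-Object TimeGenerated, ReplacementStrings
}
```

Running `Get-Rke2FileLockEvent | Format-Table -Wrap` on the affected node should list only the matching entries, if any exist among the newest 500 events.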
Resolution
1. Retrieve rke2 agent logs
1.2 Start a Windows PowerShell
1.3 Execute this command to retrieve the logs
Get-EventLog -LogName Application -Source 'rke2' -Newest 500 | format-table -Property TimeGenerated, ReplacementStrings -Wrap
Output should be similar to:
10/26/2023 12:55:39 PM {failed to extract runtime image: open C:\var\lib\rancher\rke2\data\v1.24.9-rke2r2-windows-amd64-bc939b774232\bin\containerd-shim-runhcs-v1.exe: The process cannot access the file because it is being used by another process.}
10/26/2023 12:55:39 PM {Extracting file bin\containerd-shim-runhcs-v1.exe to C:\var\lib\rancher\rke2\data\v1.24.9-rke2r2-windows-amd64-bc939b774232\bin\containerd-shim-runhcs-v1.exe}
10/26/2023 12:55:38 PM {Extracting file bin\calico.exe to C:\var\lib\rancher\rke2\data\v1.24.9-rke2r2-windows-amd64-bc939b774232\bin\calico.exe}
(redacted)
10/26/2023 12:55:34 PM {Starting rke2 agent v1.24.9+rke2r2 (2f4571a879954e1ea8d4560023eaf57c567df737)}
2. Retrieve the Windows RKE2 service status
The service is stuck in a loop, repeatedly trying to start and then stopping.
PS C:\> Get-Service rke2

Status   Name               DisplayName
------   ----               -----------
Start... rke2               rke2

PS C:\> Get-Service rke2

Status   Name               DisplayName
------   ----               -----------
Stopped  rke2               rke2

3. Stop the rke2 service
PS C:\> Stop-Service rke2
4. Stop or kill all the child processes
4.1 List the processes
PS C:\> Get-Process -Name calico-node,kubelet,containerd*,kube-proxy

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    266      17    27196      40260   1,492.86   5552   0 calico-node
    186      11    17548      17132       5.39   4512   0 containerd-shim-runhcs-v1
...
4.2 Kill all the processes listed by the previous command
PS C:\> Stop-Process -Name calico-node
PS C:\> Stop-Process -Name containerd-shim-runhcs-v1
4.3 Ensure that none of them are still running
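No command is given for step 4.3. One possible check is sketched below, assuming the set of leftover processes is the one matched by the Get-Process command in step 4.1. The function name Get-Rke2LeftoverProcess is a hypothetical helper, not part of rke2.

```powershell
# Hypothetical helper: returns any process still matching the names
# used in step 4.1. An empty result means nothing is left running.
function Get-Rke2LeftoverProcess {
    param(
        [string[]]$Name = @('calico-node', 'kubelet', 'containerd*', 'kube-proxy')
    )
    # -ErrorAction SilentlyContinue suppresses the error that Get-Process
    # raises when a literal (non-wildcard) name matches no process.
    Get-Process -Name $Name -ErrorAction SilentlyContinue
}

$remaining = @(Get-Rke2LeftoverProcess)
if ($remaining.Count -gt 0) {
    # Show what is still running so it can be stopped before step 5.
    $remaining | Format-Table Id, ProcessName
} else {
    Write-Output 'No leftover rke2 child processes found.'
}
```

If any process is still listed, repeat step 4.2 for it before starting the service.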
5. Start the rke2 service
PS C:\> Start-Service rke2
The rke2 agent service should now start without errors.
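Steps 3 to 5 can also be run as one sequence. The sketch below is an illustration under stated assumptions rather than a supported script: it assumes an elevated Windows PowerShell session, that the leftover processes are the ones matched in step 4.1, and an arbitrary 30-second wait. The function name Restart-Rke2AfterCleanup is hypothetical.

```powershell
# Hypothetical helper combining steps 3 to 5 of this article.
function Restart-Rke2AfterCleanup {
    param(
        # Process names from step 4.1 of this article.
        [string[]]$Name = @('calico-node', 'kubelet', 'containerd*', 'kube-proxy'),
        # Arbitrary limit chosen for this sketch.
        [int]$TimeoutSeconds = 30
    )

    # Step 3: stop the service so it does not respawn its children.
    Stop-Service -Name rke2

    # Step 4.2: stop whatever is left. -Force skips the confirmation
    # prompt that appears for processes owned by another user.
    Get-Process -Name $Name -ErrorAction SilentlyContinue | Stop-Process -Force

    # Step 4.3: wait until the processes have exited or the limit is hit.
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Process -Name $Name -ErrorAction SilentlyContinue) -and
           ((Get-Date) -lt $deadline)) {
        Start-Sleep -Seconds 1
    }

    if (Get-Process -Name $Name -ErrorAction SilentlyContinue) {
        throw 'Some rke2 child processes are still running; the service was not started.'
    }

    # Step 5: start the service and show its status.
    Start-Service -Name rke2
    Get-Service -Name rke2
}
```

Calling `Restart-Rke2AfterCleanup` with no arguments uses the process names from this article. The function refuses to start the service while any matched process is still alive, since that is the condition that produces the extraction error.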
Cause
Additional Information
- https://docs.rke2.io/install/quickstart#windows-agent-worker-node-installation
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021254
- Creation Date: 30-Oct-2023
- Modified Date: 16-Nov-2023
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com