
Starting VMs may fail after installation of 10-2023 CU when Veeam RCT is being used -SOLVED-


Dear community,

If you are directly affected by Hyper-V VMs failing to start in the scenario described below, please consider reading the article "Virtual machines failed to start after installing Oct 2023 Update (KB5031364)" on Microsoft Q&A.

This issue is unrelated to VMs failing to start on VMware ESXi after applying the current CUs. Find remediation instructions for VMware here.


Affected Hosts:
Hyper-V Hosts
OS:
Windows Server 2019 (by community)
Windows Server 2022 (by community)
Azure Stack HCI 22H2 (experienced myself)


Scenario:

Gen1 / Gen2 VMs backed up by Veeam using RCT (Resilient Change Tracking)
Not fully confirmed, but it seems that unclustered VMs are affected and clustered VMs are not. In one case this morning, this caused the unclustered Domain Controllers dedicated to the Azure Stack HCI cluster to fail.


Edit, 19 Oct 2023: both clustered and unclustered VMs are affected.

Reasons:

Microsoft has made changes that fix a long-standing performance issue with the RCT feature used by Veeam, but this fix can cause a side effect. The fix is not mentioned in the change log (Windows Update history), despite the criticality of the improvement.

stephc_msft wrote (Oct 09, 2023, 10:15 pm): "The long awaited 'RCT fix', for the RCT side of this ongoing issue, should be released in the October 2023 Windows update, ie tomorrow!"

Example from the community of performance improvements during Veeam backups using RCT:

[Image: Karl_WesterEbbinghaus_business_0-1697539893978.png]


Root cause:
Directly related to Veeam RCT and CU 10-2023; why it happens is not yet known.

Reproducible:

Yes, 100% in certain scenarios.

The Azure Stack HCI community is voluntarily working to reproduce the issue in more scenarios to understand the dependencies.

Remediation:
a. Rename or delete the MRT / RCT files (quick and dirty). Keep the files if you want Microsoft Support / Veeam Support to engage on this issue.
b. Uninstall CU 10-2023 from your cluster (I cannot recommend that, for several reasons): suspend the nodes manually, uninstall with WUSA, and make sure migration and storage replication jobs have finished before and after restarting nodes.
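Remediation a. could be scripted roughly as follows. This is only a minimal sketch under assumptions that are not part of the original post: the change-tracking files are assumed to carry the .rct and .mrt extensions and to sit under the VM storage folder, and the function name, the ".bak" suffix and the example path are hypothetical. It renames rather than deletes, so the files stay available for Microsoft Support / Veeam Support, and it defaults to a dry run.

```python
from pathlib import Path

# Assumption: the change-tracking files end in .rct / .mrt (case-insensitive).
TRACKING_SUFFIXES = {".rct", ".mrt"}


def rename_tracking_files(vm_storage_dir, backup_suffix=".bak", dry_run=True):
    """Rename tracking files below vm_storage_dir by appending backup_suffix.

    Returns a list of (old_path, new_path) pairs. With dry_run=True nothing
    is changed on disk; the list only shows what would be renamed.
    """
    renamed = []
    for path in sorted(Path(vm_storage_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TRACKING_SUFFIXES:
            continue
        target = path.with_name(path.name + backup_suffix)
        if target.exists():
            # Never overwrite an earlier backup copy.
            continue
        if not dry_run:
            path.rename(target)
        renamed.append((path, target))
    return renamed


if __name__ == "__main__":
    # Hypothetical path; point this at the folder holding the affected VM's disks.
    for old, new in rename_tracking_files(r"C:\ClusterStorage\Volume1\VMs"):
        print(f"would rename {old} -> {new}")
```

Run it once as shown to review the list, then call it with dry_run=False while the affected VM is powered off.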

Solution:

Deploy Cumulative Update 11-2023 instead of 10-2023 on affected Hyper-V hosts.

kudos: @Ernie Costa @Jaromir Kaspar @Darryl van der Peijl 


If you are affected, to avoid clutter please consult the linked article for the latest developments on this issue. It also lists an existing SR you can use as a reference case, so that Microsoft Support does not spend its precious time on duplicate cases.

Hi, it's been a week since the issue surfaced. Are you aware of any update or fix for the October patches from Microsoft?

@swong9881 according to the original thread, there is a chance of a fix in the November CU. Please follow the discussion in the linked thread. I have tried to avoid posting follow-ups here, in order to avoid fragmenting the feedback on the issue.

 

Is this helpful to you? 

best response confirmed by Karl_Wester-Ebbinghaus_business (Steel Contributor)
Solution:
Deploy Cumulative Update 11-2023 instead of 10-2023 on affected Hyper-V hosts.