Microsoft Tech Community

Windows Server 2022 - devices not booting when Secure Boot enabled (KB5022842)


The most recent Patch Tuesday update for Server 2022 - KB5022842 - causes some devices with Secure Boot enabled to fail to boot: the device reboots normally after the update, then fails at the next reboot. The Microsoft documentation claims that it's only causing issues with VMs running on ESXi 7.0 and below:

 

https://support.microsoft.com/en-gb/topic/february-14-2023-kb5022842-os-build-20348-1547-be155955-29...

https://learn.microsoft.com/en-us/windows/release-health/status-windows-server-2022#3017msgdesc

 

The second of the articles linked above states:

Resolution: This issue is resolved in VMware ESXi 7.0 U3k, released on February 21st 2023. No update from Microsoft is needed for this issue.

 

The VMware patch (https://kb.vmware.com/s/article/90947) does resolve the issue in VMware VMs, but what Microsoft appears to be ignoring is that a number of bare-metal installs of Server 2022 are also affected. See here, including the comments at the bottom:

 

https://borncity.com/win/2023/02/20/windows-server-2022-feb-2023-patchday-secure-boot-issues-also-on...

 

From my own testing and corresponding with others, it appears that most (possibly all) PowerEdge 13G servers are affected - I'm certainly aware that the PowerEdge T430, R530 and R730 are, with the latest firmware installed. I have a support case open with Dell but it is unclear whether they will fix it, as Server 2022 is not officially supported on 13G servers (although it has given no issues up until now).

 

Given that Microsoft caused the issue with a poorly-tested update, I would hope that they would issue a patch to fix it - but given that they don't appear to be acknowledging that the issue even exists on bare-metal installs that's not currently looking hopeful.

 

Have others experienced this issue? Might be useful to post the make/model of server here if so. Perhaps a thread with a list of affected machines might bring this to Microsoft's attention! The comments in the article above indicate that other brands are affected as well as Dell.

23 Replies
Well, if the issue comes from a driver/firmware bug, Microsoft can't do much to begin with - it's up to the OEM to fix it.
And if you install Windows Server 2022 on unsupported hardware, it's likely the OEM won't help you with it - that's why you should avoid unsupported configurations at all costs: if something goes wrong, you're on your own.
Not sure why you're blaming Microsoft for this - it's not like Microsoft forced you to install Windows Server 2022 on those servers...

The point is that Microsoft broke it with the last Patch Tuesday update - it worked fine up until that point. It's quite normal to run server OSs on older hardware on which it's technically not supported, and it rarely gives any issues at all. And indeed the same applied with client versions up until W11.

 

And just to note - the servers I have tested it on are unsupported only in the sense that Dell don't support running Server 2022 on them. They do meet the Microsoft requirements for Server 2022 as set out here: https://learn.microsoft.com/en-us/windows-server/get-started/hardware-requirements

It broke because there is something wrong with those specific drivers/firmware, not because of Microsoft (otherwise, Microsoft would have provided a fix).
And no, it's not normal to run an OS on unsupported hardware in a production environment. It doesn't matter if there are not many issues - one is enough. And if you cannot rely on OEM support to fix it, you're toast.
It's called risk management. Gambling is not a way to manage a production environment.

Maybe Dell will be kind enough to update those drivers/firmware for free. If not, you can replace Windows Server 2022 with an older, supported OS on this hardware, or you can upgrade to hardware that Dell supports for WS 2022.


The servers concerned are used in secondary roles - in one case for testing - so they need to be on the same version as the production systems.

As to whether there's something wrong with the specific drivers / firmware - so far as I'm aware none of the parties involved have released any details of what the cause is. Do you have inside information or are you just speculating? Without details it's not possible to say what the cause was. The fact that VMware have issued a patch only tells us that they (quite sensibly) wanted to get it resolved as quickly as possible - which may be fixing an issue in their product, or may simply be a workaround for whatever Microsoft has done. Without further information it's impossible to tell. The fact that VMware, Dell, Lenovo and HPE all seem to be affected is notable, though.
I do not have any insider information. You'll see what Dell support team can tell you about this issue.

@Alban1998 It is absolutely normal to run more recent OSes on older hardware in a combination that is technically not supported by the hardware vendor. The only parties arguing against this are those who financially benefit from customers constantly having to upgrade their hardware. Many companies cannot afford to spend tens of thousands of dollars on brand new hardware because a consultant or vendor tells them that it's too risky not to. Most companies can't afford to drink that Kool-Aid. There are effective ways to mitigate the risk associated with unsupported hardware/software combinations (failover clusters come to mind), but these mitigations don't enrich Microsoft and its hardware partners the same way trying to force people to constantly upgrade their hardware does.

 

"otherwise, Microsoft would have provided a fix". This implies Microsoft actually tests their updates or cares about their customers. They care about making money and will fix issues that cost them money. This is a pretty recent issue, so it is yet to be determined if Microsoft will fix it or not. I'm still hopeful they will, but I don't think it will be out of the kindness of their heart.

 

We experienced this issue on a Dell PowerEdge R730XD and a R430. In the case of the R730XD, this was a clean install of Windows Server 2022. After several clean installs, we were able to narrow down the issue to this particular update. We went so far as to back up the Secure Boot database, perform the update that breaks secure boot, and restore the database to what it was before that and Secure Boot still didn't work. The only thing that changed was Microsoft's update. They can blame whoever they want for this issue, but it happened as a result of their update.

Thankfully, virtualization-based security (including HVCI and Credential Guard) works just fine without Secure Boot. This is despite Microsoft's claim that these features require Secure Boot to function.

Thanks for that!

Latest I've had from Dell is that it was passed up the chain and has been looked at; they have reproduced the issue and confirmed that servers later than 13G are not affected. Because Server 2022 isn't officially supported on 13G servers, they are not currently committing to doing anything about it, but that may change depending on how many people are affected (it could be next Patch Tuesday before a lot of the machines are rebooted again and the problem shows itself). It was suggested that it was very much dependent on how much they get contacted, and I was advised to raise it with our account manager, which I will be doing as I have a call with them tomorrow anyway. If you've got affected devices under warranty and have not yet put in a support case, it sounds like it's definitely worth doing to make them aware that multiple people have the issue.

Dell are also hopeful that Microsoft will issue a patch, but didn't have any information on whether this was likely when I spoke to them.

As regards your point about stretching hardware lifecycles - yes, quite! I've just retired an R710 which was nearly 12 years old and was still working OK (running Hyper-V Server 2016 as a host for some undemanding test machines).
We are getting off topic here, this could be a topic on its own.
You missed the "production environment" thing in my reply, this is a critical item when evaluating risk.
Well if your customers are fine with losing money because of unsupported stuff, I guess that's OK. Defeats the purpose of minimizing costs in the first place though. Which itself contradicts buying brand new WS2022 licenses and CALs, when you can continue using WS2019, and using physical hardware for servers when you can use VMs instead, and so on.
Or if you are really unable to cope with regular on-premise hardware upgrade, go Azure/AWS/Google.

And the next post kinda prove my point :
"they have reproduced the issue, and confirmed that servers later than 13G are not affected. Because Server 2022 isn't officially supported on 13G servers, they are not currently committing to doing anything about it". In some companies, telling this to your boss/customer triggers a Resume-Generating Event (RGE).

As for disabling Secure Boot to bypass your issue, and claiming VBS/CG are fine without it...yeah, good luck with that.
Must admit I am at rather a loss to know why you are bothering to comment here at all - you clearly have no first-hand knowledge of the issue and it seems just want to share the benefit of your wisdom and tell us how we should set up our systems.

As I've already said, we have good reasons for setting things up as we have done (we have a mix of physical and virtual servers, so already have the 2022 CALs, and Windows Server licenses don't cost a lot - we are a charity so get significantly reduced pricing on them). You seem to be making lots of assumptions about how we should do things despite knowing absolutely nothing about our use cases!

@Alban1998 I do agree this is getting a little off topic. I also apologize if anything I say comes off as personally disrespectful to you. My intention is not to criticize you in a way that is unfair or uninvited.

 

"Well if your customers are fine with losing money because of unsupported stuff, I guess that's OK." My "customer" is the company I work for, which experienced zero downtime as a result of this issue. No downtime means no money lost. If someone is losing money over unsupported hardware, that is a failure of the people and processes that set it up, not the hardware/software combination itself.

 

Buying used servers costs about a tenth of what buying new servers costs, so who says anyone is losing money? You know who would love for people to believe this is true? Microsoft and its hardware partners. You know who it doesn't benefit? The companies whose IT staff are naïve enough to believe this and end up paying for new servers they should never have bought in the first place. Needlessly wasting your company's money to replace servers because Microsoft says you should, now that sounds like a "resume generating event".

 

"Defeats the purpose of minimizing costs in the first place tough. Which itself contradicts buying brand new WS2022 licenses and CAL, when you can continue using WS2019, and using physical hardware for servers when you can use VM instead, and so on." We have SA on all of our Microsoft licensing so we don't pay for upgrades. No additional costs related to CALs here. All of these servers are HyperV hosts. It's unclear to me why you thought otherwise. Needlessly wasting your company's money to buy license upgrades that should have been covered under SA, now that sounds like a "resume generating event".

 

"And the next post kinda prove [sic] my point": The only thing this proves is that Dell is hearing from their customers and is discussing if or how they will resolve the issue. This doesn't imply its Dell's issue to solve. Not understanding the typical behavior of (what is likely) one of your largest vendors and making bad assumptions as a result of that naïveté, now that sounds like a "resume generating event".

 

"Or if you are really unable to cope with regular on-premise hardware upgrade, go Azure/AWS/Google." Hosting the same workloads we host on-premise today in Azure would be far more expensive than hosting them the way we do. It would also be more expensive than buying new hardware. The only time it might make financial sense to host in Azure is if you have a highly variable workload that benefits from the scalability of Azure. Many companies are moving from Azure/AWS/Google back in to their own data centers specifically because the promised cost savings simply don't exist. Do you think a company like Microsoft would be pushing so hard to get people in to Azure if it wasn't financially beneficial to them? Needlessly wasting your company's money putting workloads in expensive public clouds, now that sounds like a "resume generating event".

 

"As for disabling Secure Boot to bypass your issue, and claiming VBS/CG are fine without it...yeah, good luck with that." This is working without issue on every server this has effected. Luck is not needed here.

 

The servers I manage have had nearly zero unscheduled downtime over the past decade. Because of the way we manage hardware lifecycles ("unsupported stuff"), I have saved my company tens (if not hundreds) of thousands of dollars over that time period.

 

I wasn't aware that "Resume Generating Event" was an acronym before you brought it up in your post. You know why that might be? Because it's not something I've ever had to worry about.

 

There's an important lesson here that you seem to not understand: Microsoft, Dell, and IT consultants are for profit entities that operate in their own financial best interest. Everything they do (including the guidance they provide) is in service of that goal.

 

Be careful not to let your arrogance prevent you from understanding the implications of this lesson. That could end up being a "Resume Generating Event" for you.

@DavidYorkshire It's good to know that Dell is aware of this issue and at least discussing possibly resolving it. I'll reach out to our Dell rep and make them aware of how important this is to us.

"I've just retired an R710 which was nearly 12 years old and was still working OK (running Hyper-V Server 2016 as a host for some undemanding test machines)" What an interesting coincidence! In November, we retired our fleet of R710s also running Hyper-V 2016 for about 10 years and replaced them with R730s. The servers were working great even when we replaced them. We ended up replacing them because of the licensing benefits of running our cluster in a more dense configuration. We also continue to run a Dell PowerVault MD3200 in production whose storage controllers were manufactured in 2002. The quality of enterprise class hardware really is astonishing.

@DavidYorkshire We can blame Microsoft for a lot of things - they have a long, long history of botched updates. But in the specific case you describe, and only this one, they don't look like the culprit to me - you think otherwise. That's OK.
I hope Dell will provide you a fix for this. Maybe they'll get in touch with Microsoft, and Microsoft will provide a hotfix in the end, proving me wrong. I'll gladly be OK with that.
I assumed those servers were production servers - that one is my mistake.

@Alex Rourke Looks like "RGE" triggered some very strong feelings within you - that wasn't my intention.
Risk management helps everyone - IT staff, IT managers, C-level staff, the company itself. Matching software vendor/OEM specs is also a way to protect yourself (the IT guy), which is what I wanted to say. When disaster strikes, people may look for scapegoats.

I will post a last reply because there isn't much point in continuing this thread - you may send me a PM if you wish.
You seem very serious about reducing costs for the company you work for, no matter what. You have two metrics: costs and unscheduled downtime. IT security isn't one.
You seem convinced that because you have been doing this successfully for years, it's the right way to do it, and to continue to do it.
You seem to think OEMs, software vendors and IT consultants have no ethics whatsoever, and are only here to rob you of your money.
Those are some very strong beliefs. You look like an IT superhero (or, dare I say, an IT god) to me.

I'm not that strong. I'm just a simple IT consultant. I doubt. And thus, I try to encourage my customers to minimize their technical debt, improve their IT security, reduce their TCO, stay close to the preferred architecture, implement best practices, and so on. Recommending that they put their production workloads on supported stuff is part of that, even if it costs me money (that often means I can't sell them the latest shiny hardware/software, and end up telling them to keep their existing stuff, despite being a for-profit entity).
And if everything fails in the end, they can rely on some kind of support - the very last seatbelt they can rely on. Even if I fail (and I always assume I will fail), they are not without help or solutions. Working with companies who lost everything after a disaster taught me that.

You, on the other hand, never fail nor doubt, which is why our experiences differ so much.

@Alban1998 I read your comment on "RGE" as "sounds like whatever you're doing should get you fired". I apologize for my response if that wasn't your intention. I find it difficult to read your remark any other way. I also realize that it may have been rather hypocritical for me to call you arrogant when my reply itself may have come off as arrogant. I apologize for that as well.

 

I don't think that all IT consultants lack morals or ethics (I do think this of Microsoft as a whole), I've dealt with some excellent consultants and some terrible ones. I do think that they preach the gospel of the vendors they represent all too often and trusting their judgement has gotten our company in to trouble in the past. This is not a criticism of you personally. It would be unfair of me to pass judgement on you without knowing you.

 

The one interesting difference I'm noticing is that you appear to conflate security and reliability. I do agree that having a supported hardware/software combination is an important component when it comes to reliability. In the case of how we choose to operate, we attempt to mitigate that risk by carefully testing updates and by having a robust failover and disaster recovery strategy. I respect why you would tell a client to buy new hardware to mitigate this risk. However, I'm more dubious of the assumption that using a supported software/hardware combination is as paramount to security as it is to reliability. New hardware certainly introduces new security features that I may not have access to, but the savings associated with pursuing this route means we have capital to invest in other security software/projects/consultants. Dollar for dollar, I think we get more out of those investments than we do if we were to spend that money on new servers. I know this might not be the case for everyone.

 

I imagine it is tough being a consultant in this case. I can make that choice for my company because I understand the environment well enough to do it with confidence. As a consultant, you have to put a lot of faith in your clients and have to make choices based on the fact that they are probably only going to call when something goes wrong. I don't envy the position I imagine you often find yourself in when this happens.

 

I still disagree with your hard line on "it's not a supported configuration, therefore it's your fault." But because of this conversation I do at least respect your opinion. I'm ashamed to admit that I was so jaded by this situation that I did not before.

 

Have a wonderful day.

I had the same problem. On my lab Dell R730 with ESXi 7.0.3 I applied the patch VMware-ESXi-7.0U3k-21313628-depot.zip and then I was able to boot Windows Server 2022 with KB5022842.
Yeah, that works for VMware - the issue seems to affect all VMware 7.x (and potentially below) on any hardware. I've got ESXi 7.0.3 running on a couple of R650s and the patch did fix the issue on them.

The issue with bare-metal installs of 2022 on 13G servers is not resolved, though.

We just installed a TPM on an R530 running Windows Server 2022 and while we were at it we enabled Secure Boot. But it won't boot, throwing error UEFI0073. I assume this has to do with the Feb CU, which was also just installed. What timing lol. Will disable Secure Boot for now and follow for updates.

As always, I am a little late to the party, but here goes nothing. I am running into the boot failure on bare-metal Server 2022 Standard with Hyper-V: the boot fails when KB5022842 is already present at the time Hyper-V is installed, or when the KB is installed after Hyper-V is already present.

Yes, I understand that server OSs are not meant to run on certain hardware platforms. In my case these are the ASUS Prime B760M-A D4 (BIOS 0807) and Prime B660M-A D4 (BIOS 2212), both with two 2TB SATA III HDDs in RAID 1 using Intel VMD. These are for small businesses that want server-class functionality on a budget. I have a long history of building "servers" with Hyper-V on non-server platforms without significant problems. I have done the dance with Intel and ASUS and they have both come back with "we don't test things that are not in the recommended build parameters". OK, I get that. I am not blaming MS for floating a bad update - yet.

So, Alban1998, you can kindly leave your unhelpful comments at your keyboard. For the rest of us who live and work in the real world of servicing budget-minded clients, we need these non-standard builds, which have worked in the past, to continue to work going forward. That is why we have these discussions.

When I started experiencing this issue, I thought it had something to do with the new Intel VMD RAID (a subset of Intel Virtual RAID on CPU - VROC) for non-Xeon systems, because I saw it first on the B760 chipset. Then I saw it on the B660 chipset and have heard others comment about it on the 500 series chipset. Further testing and reading led me to MS updates, specifically KB5022842. I still have a lingering suspicion that something in KB5022842 is glitchy with the IRSTe and VMD drivers as they relate to Hyper-V, but I cannot put my finger on anything. I suspect that because of where the boot failure happens: my system will POST, hand off to the OS and then hang at the blue Windows splash screen as if it is hanging on a driver.

From the article I have read, though, it seems that Secure Boot and this KB are the oil and water we are dealing with. I suspect that MS is waiting to see how much negative press is generated by this issue, and the technical feedback from various event logs being posted, before they do anything. This is going to affect a lot of small business operations that recently dropped thousands of dollars to upgrade their 2012 R2 systems only to have their "server" crash due to an MS update. The March updates are out, so I will see if there is a fix in the mix.

@Kenji McXntosh 

 

Update

 

Patch Tuesday today. I checked the Microsoft report on the issue (M365 console, Health, Windows Release Health, Server 2022). It appears that Microsoft is still claiming that this issue only affects VMs on VMware 7.x and below, and that as VMware have patched it Microsoft doesn't need to do anything:

 

"Resolution: This issue is resolved in VMware ESXi 7.0 U3k, released on February 21st 2023 [link]. No update from Microsoft is needed for this issue."

 

However, I decided to actually check in case they had quietly resolved the issue. I have two affected PowerEdge servers - a T430 and an R730. I patched them both with the latest Windows updates, rebooted, then rebooted again and went into the BIOS settings and tried turning Secure Boot back on.

 

And they work again! Boot fine, with Secure Boot turned on. So it would appear that Microsoft has done something with today's update to fix the problem.
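
For anyone wanting to sort a list of machines by where they sit relative to the two updates, here is a minimal Python sketch. It assumes the build numbers that appear in this thread: 20348.1547 for KB5022842 (from the support article URL in the first post) and 20348.1607 for the March cumulative update (from the file versions posted further down). The category names and the `classify_build` helper are made up for illustration, and the "reported as affected/working" reading is only what people in this thread observed, not a statement from Microsoft.

```python
# Build numbers quoted in this thread; how to interpret them is an assumption.
FEB_2023_CU = (20348, 1547)  # KB5022842, per the support article URL
MAR_2023_CU = (20348, 1607)  # March 2023 CU, per file versions posted in this thread


def parse_build(version: str) -> tuple:
    """Accept '10.0.20348.1547' or '20348.1547' and return (build, revision)."""
    parts = [int(p) for p in version.strip().split(".")]
    if len(parts) < 2:
        raise ValueError(f"expected at least build.revision, got {version!r}")
    return parts[-2], parts[-1]


def classify_build(version: str) -> str:
    """Place a version string relative to the Feb and Mar 2023 updates."""
    build = parse_build(version)
    if build[0] != 20348:
        # 20348 is the Windows Server 2022 build number
        return "not-server-2022"
    if build < FEB_2023_CU:
        return "before-feb-2023-cu"
    if build < MAR_2023_CU:
        # KB5022842 or later, but older than the March CU:
        # the range reported as failing with Secure Boot in this thread
        return "feb-2023-cu-range"
    return "mar-2023-cu-or-later"
```

For example, `classify_build("10.0.20348.1547")` falls into the `feb-2023-cu-range` bucket, while `classify_build("10.0.20348.1607")` falls into `mar-2023-cu-or-later`. How you collect the version strings from each server is left out on purpose.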

@DavidYorkshire I concur. My B660 system with Hyper-V already loaded, which failed to boot when KB5022842 was applied, now boots fine with that KB rolled back and KB5023705 loaded. In addition, my B760 system, which would not boot if I attempted to install Hyper-V because KB5022842 had already been applied, booted fine once KB5023705 was applied and Hyper-V was then installed.
For those that want to really geek out, I compiled a short list of what I think are the relevant changes between the Feb and Mar CU (columns: file name, file version, date, time, file size in bytes):
"stornvme.sys","10.0.20348.1547","03-Feb-2023","02:17","233,504"
"stornvme.sys","10.0.20348.1607","08-Mar-2023","23:58","230,768"
"VmComputeAgent.exe","10.0.20348.1366","03-Feb-2023","13:53","1,492,352"
"VmComputeAgent.exe","10.0.20348.1607","09-Mar-2023","07:23","1,492,352"
"vmcompute.exe","10.0.20348.1547","03-Feb-2023","07:41","4,026,384"
"vmcompute.exe","10.0.20348.1607","09-Mar-2023","05:03","4,023,664"
"vmchipset.dll","10.0.20348.1547","03-Feb-2023","07:41","1,025,360"
"vmchipset.dll","10.0.20348.1607","09-Mar-2023","05:03","1,025,408"
"hvservice.sys","10.0.20348.1311","03-Feb-2023","13:53","136,536"
"hvservice.sys","10.0.20348.1607","09-Mar-2023","05:03","136,576"
"hvloader.dll","10.0.20348.1311","03-Feb-2023","13:53","148,832"
"hvloader.dll","10.0.20348.1607","09-Mar-2023","05:03","148,864"
"hvax64.exe","10.0.20348.1547","03-Feb-2023","07:42","1,630,232"
"hvax64.exe","10.0.20348.1607","09-Mar-2023","05:03","1,627,472"
"hvix64.exe","10.0.20348.1547","03-Feb-2023","07:42","1,732,624"
"hvix64.exe","10.0.20348.1607","09-Mar-2023","05:03","1,729,864"
"kdhvcom.dll","10.0.20348.1311","03-Feb-2023","13:53","58,704"
"kdhvcom.dll","10.0.20348.1607","09-Mar-2023","05:03","58,704"
"vmms.exe","10.0.20348.1547","03-Feb-2023","07:41","14,792,064"
"vmms.exe","10.0.20348.1607","09-Mar-2023","05:03","14,787,920"
Hope this helps. :)
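
The list above is easier to compare when each file's February and March rows are paired up. The following is a rough Python sketch that does so for an excerpt of the rows; it assumes the column order is file name, version, date, time, size, and that each file appears exactly twice with the February row first, as in the post. The `summarize` helper is hypothetical, and a size or version change by itself says nothing about which file carried the fix.

```python
import csv
from io import StringIO

# Excerpt of the rows posted above (Feb row first, then Mar row, per file).
RAW = '''
"stornvme.sys","10.0.20348.1547","03-Feb-2023","02:17","233,504"
"stornvme.sys","10.0.20348.1607","08-Mar-2023","23:58","230,768"
"VmComputeAgent.exe","10.0.20348.1366","03-Feb-2023","13:53","1,492,352"
"VmComputeAgent.exe","10.0.20348.1607","09-Mar-2023","07:23","1,492,352"
"hvloader.dll","10.0.20348.1311","03-Feb-2023","13:53","148,832"
"hvloader.dll","10.0.20348.1607","09-Mar-2023","05:03","148,864"
'''


def summarize(raw: str) -> list:
    """Return (file, feb_version, mar_version, size_delta_bytes) per file."""
    by_file = {}
    for name, version, _date, _time, size in csv.reader(StringIO(raw.strip())):
        # Sizes use thousands separators, e.g. "233,504"
        by_file.setdefault(name, []).append((version, int(size.replace(",", ""))))

    summary = []
    for name, entries in by_file.items():
        (old_ver, old_size), (new_ver, new_size) = entries  # assumes exactly two rows
        summary.append((name, old_ver, new_ver, new_size - old_size))
    return summary


if __name__ == "__main__":
    for name, old_ver, new_ver, delta in summarize(RAW):
        print(f"{name}: {old_ver} -> {new_ver} ({delta:+d} bytes)")
```

Pasting the full list into `RAW` works the same way, as long as every file still has exactly one February and one March row.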