ORCHESTRATION GROUPS - are there plans to improve, as in "fix" these? We are trying to move away from WSUS for server patching. We've been using SCCM and ADRs for PC / laptop patching for years and it's great. We moved to SCCM for servers recently, with a few teething problems (maintenance windows, patch installs not starting, etc.) but we're dealing with them.
But clusters.......christ.
This weekend our DR Hyper-V cluster orchestration kicked off at 12. The group is configured for 1 member at a time, but 2 locks were assigned at the exact same time. Both failed, which hung the entire orchestration group. Today I tried to reset the state. There's no "just stop orchestration for this group" option, and per-member reset is iffy, but it appeared to work. So then I right-clicked the group, chose start orchestration, and ticked the "ignore maintenance windows" box. The first node sat "in progress" for ages with nothing actually happening. Logs indicated the task was not passing the maintenance window check, even though I told it to ignore maintenance windows.
Tried to reset state a few times again but all that happened was all 8 nodes were showing "waiting" and nothing I could do would stop it.
Eventually I just added a once-off maintenance window for "now plus a few hours" and that seemed to kick it in: one node went to "in progress". As I write, though, there's still nothing actually happening.
You have no actual native way to patch clusters with SCCM. Cluster Aware Updating was fine with WSUS, but we're trying to remove WSUS and don't want to have to run a WSUS server just for cluster patching when everything else is using SCCM. And we do NOT intend to allow our servers to all pull from Windows Update directly, so don't go there...
Orchestration groups kinda sorta work, but we had to write scripts to pause the nodes first and drain them. That isn't as easy as it sounds: drain time can exceed the max wait time of the Suspend-ClusterNode command, and using Move-VM etc. doesn't work as the orchestration account doesn't have rights.
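For anyone hitting the same drain timeout, the general idea is to start the drain and then poll with your own deadline instead of relying on the command's built-in wait. A minimal sketch of that polling loop, written in Python purely for illustration (our real scripts are not shown here). `is_drained` is a hypothetical callable that would wrap however you check that the node has nothing left running on it, and the timeout values are assumptions, not recommendations.

```python
import time


def wait_for_drain(is_drained, timeout_s, poll_s=30.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the node reports drained or our own deadline passes.

    is_drained: hypothetical zero-argument callable returning True once
                the node has finished draining (assumption: you supply it).
    timeout_s:  how long we are willing to wait, independent of any
                limit built into the command that started the drain.
    Returns True if the node drained in time, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_drained():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The pre-patch script would only let the install go ahead when this returns True, and fail loudly otherwise, so a slow drain ends up as a clear error rather than a node patched with workloads still on it.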
To make these things truly enterprise ready we'd need at minimum:
- Ability to say "stop" on a group as a whole basis
- Native cluster integration. Whether that's the orchestration feature itself recognising that it's a cluster and acting accordingly, or allowing the existing Cluster Aware Updating tool to plug into SCCM / see the patches made available in Software Center, I don't know, but we need something.
- Better logging - tracing an update process, not to mention an orchestration process, through SCCM's 4 million log files is painful. One log that says "this is what I'm doing, second by second" would be so much more useful. That could be as simple as a log file that pulls entries from the other log files into one place, BUT a lot of the language in the log files is clear as mud and doesn't actually tell you much.
- Better linkage between deployment and maintenance windows. We've found that orchestration just doesn't kick in unless the maintenance window is directly applied to the same collection that the update is actually deployed to. And we couldn't find that documented anywhere - Copilot figured it out. So for example we have updates deployed via ADR to wide collections like "Test Group", "Group 1", "Group 2" covering servers across multiple teams, and then we deploy maintenance windows to different collections that are basically "system" server groups. But orchestration doesn't see it.
- Consistent behaviour from the "ignore maintenance windows" button. Here I am 20 minutes after writing the "I've just triggered it again" bit and it's still saying it doesn't pass the maintenance window check. I suspect that's because it's seeing that the server is ALSO in the "wide" collection to which the update is deployed via the ADR (with a 2nd deployment then specifically to a collection for this orchestration group), and that wide collection has no maintenance window on it. So it's not enough that the update deploys to a collection with a maintenance window configured; it seems like maybe it must ONLY deploy to collections with maintenance windows. The local agent is not figuring out that "I am in an orchestration group, I have this patch deployed, I have a maintenance window".
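
On the logging point, here's roughly what I mean by "pull entries from the other log files into one place". This is a rough Python sketch, not a tool we actually run. It assumes the client logs use the usual CMTrace-style line layout (`<![LOG[...]LOG]!><time="..." date="..." component="...">`, with dates as MM-DD-YYYY) and it ignores the timezone offset after the time, so treat it as a starting point only. The function names are made up for illustration.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed CMTrace-style entry layout; the offset after the time is skipped.
ENTRY = re.compile(
    r'<!\[LOG\[(?P<msg>.*?)\]LOG\]!>'
    r'<time="(?P<time>\d{1,2}:\d{2}:\d{2})(?:\.(?P<frac>\d{1,6}))?[^"]*" '
    r'date="(?P<date>\d{1,2}-\d{1,2}-\d{4})" '
    r'component="(?P<component>[^"]*)"',
    re.DOTALL,  # messages can span several lines
)


def parse_log_text(text, source):
    """Return (timestamp, source, component, message) tuples from one log."""
    entries = []
    for m in ENTRY.finditer(text):
        raw = f"{m['date']} {m['time']}.{m['frac'] or '0'}"
        stamp = datetime.strptime(raw, "%m-%d-%Y %H:%M:%S.%f")
        entries.append((stamp, source, m["component"], m["msg"].strip()))
    return entries


def merge_logs(texts_by_name):
    """Merge entries from several logs into one list ordered by timestamp."""
    merged = []
    for name, text in texts_by_name.items():
        merged.extend(parse_log_text(text, name))
    merged.sort(key=lambda entry: entry[0])  # stable: ties keep input order
    return merged


def merge_log_files(paths):
    """Convenience wrapper: read each file from disk and merge them."""
    texts = {Path(p).name: Path(p).read_text(errors="replace") for p in paths}
    return merge_logs(texts)
```

Point `merge_log_files` at whichever client logs you care about and print the result to get a single second-by-second timeline. It does nothing about the wording of the entries themselves, which is the bigger problem.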