Updated 3/23/2023 to focus on the shared security intelligence feature for VDI.
Virtual Desktop Infrastructure (VDI) brings an interesting dynamic when tuning the platform. The delicate balance...
Our environment is still seeing random incidents of large Horizon instant clone pools experiencing high disk usage. This is despite our having completely removed the onboarding process from the gold image to help isolate the issue further, which seems to reinforce what you stated earlier: the issue isn't onboarding related.
We still have not built a Security Intelligence server, but we are looking to do so today. Honestly, though, I don't think that's our issue either: the problematic pool we saw yesterday has been up for a while and was also republished Sunday afternoon. If a security intelligence update (SIU) were going to crush it, wouldn't we have seen it sooner? We went through Sunday, Monday, and Tuesday without issue until 2 PM Eastern, when we saw it.
Right now my suspicion falls on the Defender GPO applied where these machines live. The policy used there was not designed for VDI; it was actually a server GPO that our Defender admins were comfortable copying, adding only VDI-specific exclusions. I'm thinking that probably wasn't the way to go. We're seeing settings in there such as:
I really suspect some scan or randomized task is running on our pools and causing the issue. We saw the problem from 2 PM until around 9 PM yesterday; then, without any change on our end, it stopped. That implies Defender runs normally until something triggers heavy disk activity, which eventually ends on its own.
Sorry if this is all over the place; we were up pretty late. In a nutshell: we're going to review that GPO and see if we can test modifying it on a dev OU/Horizon pool. I'm in favor of just removing the Defender GPO outright and testing plain vanilla settings for a bit, perhaps with only the Horizon exclusions, but that's where we're starting. We still haven't had any luck getting Microsoft support on a call, but we're hoping to push on that.
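One way to make that GPO review repeatable is to export the effective Defender preferences from a clone and compare them against a short VDI-oriented expectation list. This Python sketch assumes the export was produced with something like `Get-MpPreference | Select-Object RandomizeScheduleTaskTimes, ScanParameters, SharedSignaturesPath | ConvertTo-Json | Out-File -Encoding utf8 mpprefs.json`, and that `ScanParameters` serializes as an integer. The `EXPECTED` values are an illustrative assumption loosely based on common VDI tuning advice (randomized scheduled tasks, quick rather than full scheduled scans, a shared security intelligence location), not an official baseline; substitute your own.

```python
import json

# Hypothetical VDI-oriented expectations for a few Get-MpPreference
# properties. Illustrative subset only, not an official baseline.
EXPECTED = {
    "RandomizeScheduleTaskTimes": True,  # spread scheduled work across clones
    "ScanParameters": 1,                 # 1 = quick scan, 2 = full scan
}


def audit(prefs: dict) -> list:
    """Return one finding per setting that differs from EXPECTED."""
    findings = []
    for name, expected in EXPECTED.items():
        actual = prefs.get(name)
        if actual != expected:
            findings.append(f"{name}: expected {expected!r}, found {actual!r}")
    # A missing, null, or empty path means the clones are not using a
    # shared security intelligence location.
    if not prefs.get("SharedSignaturesPath"):
        findings.append("SharedSignaturesPath: not set")
    return findings


def load_export(path: str) -> dict:
    """Load the JSON export; utf-8-sig tolerates the BOM PowerShell may write."""
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def main(path: str) -> None:
    """Print each finding for the export stored at `path`."""
    for finding in audit(load_export(path)):
        print(finding)
```

For example, `main("mpprefs.json")` against an export from a problem clone prints each setting worth comparing against the server GPO the policy was copied from.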
Thanks in advance.
Edit: I wanted to add that, as a troubleshooting step last night, we cloned the problematic image to separate data center/ESXi host hardware and spun up 2 spare pools. Provisioning on them was BRUTAL, with the exact same behavior. Once we hit around 800 VMs, provisioning the remaining 400 was terrible, and when we looked, disk usage was high. We took the image again and this time disabled Defender outright by disabling these services: Microsoft Defender Antivirus Service, Security Center, Windows Defender Advanced Threat Protection, and Windows Security Service. After disabling them, taking a snapshot, and republishing the pool, provisioning went smooth as silk.
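Before snapping the image, it can help to confirm which of those services are actually running. This Python sketch maps the display names above to their Windows service short names and parses the `STATE` line from `sc.exe query` output. It assumes English-language `sc.exe` output, must run on the Windows image itself, and reports run state only, not startup type (`sc.exe qc` shows `START_TYPE`).

```python
import re
import subprocess
from typing import Optional

# Display names from the post mapped to their Windows service short names.
DEFENDER_SERVICES = {
    "Microsoft Defender Antivirus Service": "WinDefend",
    "Security Center": "wscsvc",
    "Windows Defender Advanced Threat Protection": "Sense",
    "Windows Security Service": "SecurityHealthService",
}

# Matches e.g. "        STATE              : 4  RUNNING" in `sc.exe query` output.
_STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def parse_state(sc_output: str) -> Optional[str]:
    """Return the state word (RUNNING, STOPPED, ...) or None if absent."""
    match = _STATE_RE.search(sc_output)
    return match.group(1) if match else None


def query_state(service: str) -> Optional[str]:
    """Run `sc.exe query <service>` (Windows only) and parse its state."""
    result = subprocess.run(
        ["sc.exe", "query", service], capture_output=True, text=True
    )
    return parse_state(result.stdout)


def report() -> None:
    """Print the state of each Defender-related service (Windows only)."""
    for display, short in DEFENDER_SERVICES.items():
        print(f"{display} ({short}): {query_state(short) or 'not found'}")
```

Calling `report()` on the gold image just before the snapshot gives a quick record of what state each service was captured in.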