Onboarding and servicing non-persistent VDI machines with Microsoft Defender ATP
Published May 05 2020 10:30 AM
Microsoft

Virtual Desktop Infrastructure (VDI) is fairly common in customer environments, especially in today’s world where many are working from home as a result of COVID-19. As such, we want to ensure that Microsoft provides protection for VDI machines, and that you understand how Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP) works within your VDI deployment. In this blog post, we’ll cover VDI, how it works with Microsoft Defender ATP, best practices, and some lessons learned.

 

When we talk about VDI, we often talk about two different deployment types: persistent and non-persistent. Let’s look at both of these types and explore how they interact with Microsoft Defender ATP onboarding.

 

Persistent VDI

 

Persistent VDI is a deployment type where the virtual machines (VMs) persist their state, meaning that a machine doesn’t lose its state or data when it is rebooted or shut down, or when a user logs off. In short, a persistent VDI machine behaves much like a physical machine in that local data is persisted across these actions (reboot, shutdown, logoff). Onboarding a persistent VDI machine into Microsoft Defender ATP is handled the same way as onboarding a physical machine, such as a desktop or laptop. Group policy, Microsoft Endpoint Manager, and other methods can be used to onboard a persistent machine. In the Microsoft Defender Security Center (https://securitycenter.windows.com), under onboarding, you select your preferred onboarding method and follow the instructions for that type. As you can see, onboarding persistent VDI machines is no different from onboarding a physical machine or a server that is a virtual machine. For the remainder of the post we will focus on non-persistent VDI.

 

Non-persistent VDI

 

Non-persistent VDI is the opposite of persistent VDI. In non-persistent VDI, the virtual machine state does NOT persist across actions, such as reboot, logoff, or shutdown. Typically, when one of these actions is performed, the virtual machine is deleted. It’s helpful to understand at a high level how the non-persistent VDI model works. This will help paint a better picture of how Microsoft Defender ATP onboarding fits in.

 

In a non-persistent deployment type, VDI pools are typically deployed off one virtual machine commonly referred to as the VDI master, golden image, or master image. For simplicity, I’ll refer to it as the VDI master here. At an extremely high level it looks something like this:

 

[Image: diagram of the VDI master and the VDI machines provisioned from it]

 

The VDI master is a virtual machine that has Windows installed, as well as software, and any customizations. No users will ever actually log on to this machine; instead, it is used by the virtualization platform as a template to deploy VDI machines. The deployment of the VDI machines from the VDI master typically happens by building the master, installing any software and customizations, then shutting it down in order to provision multiple VDI machines into groupings called “pools.” The VDI machines that are provisioned into pools are what end users log on to.

 

As you can see, non-persistent VDI is a unique scenario. Changes are only made to the VDI master, and the VDI machines that users log on to are based on this VDI master. The VDI machines do not save their state, however, and are essentially deleted after a user is done with them. This means that at any given time two things can be happening:

 

  • VDI machines are being spun up in the pool
  • VDI machines are being deleted or de-provisioned

For specifics on this, you will want to contact your virtualization platform vendor.

To protect your VDI machines, you should onboard them into Microsoft Defender ATP. But this puts you in a bit of a chicken-and-egg scenario. If you deploy the onboarding mechanism to the VDI machines after they are provisioned/spun up, there may be a lapse between the time a machine comes online and the time it is onboarded into Microsoft Defender ATP. During that gap, the VDI machine is not protected by Microsoft Defender ATP.

 

What about using a startup script via group policy? Although you can use this method, there is potential for things to go wrong here as well, for example if group policy is broken or you run into domain controller/SYSVOL issues. Configuration management tools such as Microsoft Endpoint Manager are typically not used on non-persistent VDI machines, mostly because anything that a configuration management solution installs or changes on the machine is lost when the machine is deleted/de-provisioned (so all of that work is in vain).

 

What you need is a way to onboard the VDI machines at first boot as soon as they are created. This is why you have the “VDI onboarding for non-persistent machines” option in the Microsoft Defender Security Center, as shown in the following image:

 

[Image: the “VDI onboarding for non-persistent machines” option in the Microsoft Defender Security Center]

 

Since you want the VDI machines (which are child clones of the VDI master) to be onboarded immediately at first boot, you must stage the onboarding script on the VDI master. That way, it is executed as a startup script at first boot on all of the VDI machines that are provisioned from the VDI master.

 

Note: It is important to understand that placing and configuring the VDI onboarding startup script on the VDI master merely stages the file and configures it as a startup script to be executed on the VDI machines. It is NOT intended to onboard the actual VDI master; in fact, you should not onboard a VDI master (more on this later).

 

The documentation shows that there are two different ways to configure the startup script for VDI machine onboarding:

 

  • A single entry for each machine
  • Multiple entries for each machine

This is explained in detail in the documentation, but essentially boils down to how many objects for a given VDI machine you want to see in the Microsoft Defender Security Center.

 

Let’s take a look at what is happening behind the scenes. The difference is specifically in the single entry for each machine configuration. With this method, we call the .ps1 directly as a startup script as mentioned in the documentation. Here is why: if you crack open the .ps1 onboarding script, you can see the following:

 

[Image: excerpt of the .ps1 onboarding script showing how senseGuid is generated]

 

The .ps1 is generating a unique value called senseGuid, which is based off of a concatenation of the OrgID (pulled from the WindowsDefenderATPOnboardingScript.cmd file), the “_” character, and the computer name value. If not already present, the senseGuid value is written to the registry on the VDI machine at the path noted in the $senseGuidRegPath line. Later in the .ps1, the WindowsDefenderATPOnboardingScript.cmd file is then called. When the machine starts the SENSE service, the machineID is calculated, and its conversation with the tenant for onboarding begins. The senseGuid value produced by the .ps1 in the registry on the machine ensures the same machineID is used as long as the machine DNS name stays the same. The machineID value is also calculated as part of the onboarding process, and once it is calculated it is also written to the same path in the registry on the local machine as a value called senseID.
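To make the idea concrete, here is a minimal Python sketch of how a stable identifier can be derived from the OrgID and the computer name. It is not the code from the .ps1: the function name derive_sense_guid, the sample OrgID and machine names, and the use of a SHA-256 digest to get a GUID-shaped value are all assumptions made for illustration. The only point is that the same inputs always produce the same value.

```python
import hashlib
import uuid


def derive_sense_guid(org_id: str, computer_name: str) -> str:
    """Return a GUID-shaped value derived from '<OrgID>_<computer name>'.

    Assumption: hashing is used here only to turn the concatenated string
    into 16 bytes; the real onboarding script may derive its value differently.
    """
    seed = f"{org_id}_{computer_name}"
    digest = hashlib.sha256(seed.encode("utf-8")).digest()[:16]
    return str(uuid.UUID(bytes=digest))


if __name__ == "__main__":
    # Hypothetical OrgID and machine names, for illustration only.
    print(derive_sense_guid("org-1234", "VDI-POOL-01"))
    print(derive_sense_guid("org-1234", "VDI-POOL-01"))  # same value again
    print(derive_sense_guid("org-1234", "VDI-POOL-02"))  # different machine
```

Because the value depends only on its inputs, a VDI machine that is re-provisioned under the same name ends up with the same senseGuid.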

 

There you have it; this is the logic that ensures the machine only has a single entry in the portal each time onboarding is run. If you opt for the other route, the .ps1 is not used and the .cmd is called directly as a startup script, which doesn’t have this logic; therefore, multiple entries for the same machine are populated in the portal.
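The difference between the two configurations can be modeled in a few lines of Python. This is a rough simulation, not how the service works internally: the portal is just a set of IDs, the onboard function is hypothetical, and I am assuming that without a senseGuid each onboarding produces a fresh random ID.

```python
import hashlib
import uuid


def onboard(portal: set, org_id: str, name: str, single_entry: bool) -> str:
    """Add one machine entry to a simulated portal and return its ID."""
    if single_entry:
        # Stable ID: derived from OrgID and computer name (assumed scheme).
        seed = f"{org_id}_{name}".encode("utf-8")
        machine_id = str(uuid.UUID(bytes=hashlib.sha256(seed).digest()[:16]))
    else:
        # No stable seed: every onboarding gets a new random ID (assumption).
        machine_id = str(uuid.uuid4())
    portal.add(machine_id)
    return machine_id


def entries_after_recompose(cycles: int, single_entry: bool) -> int:
    """Re-provision the same machine name several times and count entries."""
    portal = set()
    for _ in range(cycles):
        onboard(portal, "org-1234", "VDI-POOL-01", single_entry)
    return len(portal)


if __name__ == "__main__":
    print("single entry:", entries_after_recompose(5, single_entry=True))
    print("multiple entries:", entries_after_recompose(5, single_entry=False))
```

In this model, five re-provisioning cycles of the same machine name leave one entry in the single-entry configuration and five in the other.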

 

There is more to this VDI onboarding story, such as how all of this relates to the management or servicing of the VDI Master.

 

VDI Master Servicing

 

Patching and servicing of VDI machines in a pool doesn’t happen directly on those machines. If you push software or configuration changes to them, those changes are lost at logoff/reboot/shutdown of the VDI machine. The patching/servicing happens only on the VDI master. Typically, organizations re-compose their VDI pools at least once a month to incorporate the latest Microsoft updates. At a high level, there are several approaches to servicing the VDI master:

 

  1. Reuse the existing master by powering it on, patching and updating it, then shutting it back down.
  2. Build a new master from scratch via an automated process such as Microsoft Deployment Toolkit.
  3. Offline servicing (if possible).

Option 1 – Reuse the existing master

 

Reusing the existing master can have unintended consequences if you are not careful. Because you have staged the onboarding script on the VDI master (so that it is executed as a startup script at first boot on all of the VDI machines), the onboarding script also runs when you power on the VDI master itself. Remember that you should never onboard the VDI master.

 

If you power on the VDI master, it will be onboarded (which you don’t want). The problem arises if you then apply patches/service updates, shut it down, and deploy a VDI pool from it without first offboarding the VDI master and doing some cleanup. Here is the scenario and outcome:

 

  1. Suppose that you have onboarded the VDI master simply by powering it on and it is assigned a senseGuid and a senseID. Those are written to the registry (with all the other onboarding info).
  2. The VDI master is patched/serviced and shut down, and a new VDI pool is deployed from it.
  3. Once the VDI pool is deployed, the VDI machines start to boot up.
  4. The VDI machines run the onboarding (because they are configured to do so), but onboarding exits because they are already onboarded.
  5. Since the VDI machines are basically a clone (I use this term loosely here) of the VDI master, they already have the senseID and senseGuid written to their registry along with all the other onboarding information.
  6. When all the VDI machines start to report their telemetry, they are reporting it under the same senseID (or machineID as it is called in the Microsoft Defender Security Center).

Having all your VDI machines reporting telemetry to the same senseID/machineID in Microsoft Defender ATP causes a number of issues. There is no delineation between VDI machines, so if something happens on one of them, you won’t actually know which one it happened on, since they all report their telemetry to the same machineID. This is a bad scenario any way you look at it.
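The scenario above can be reproduced with a toy model in Python. Everything here is hypothetical: the registry is a plain dictionary, run_onboarding stands in for the startup script, and the ID strings are placeholders. The sketch only shows why cloning an onboarded master leaves every VDI machine with the same senseID.

```python
def run_onboarding(registry: dict, name: str) -> None:
    """Stand-in for the startup script: exits if the machine looks onboarded."""
    if "senseId" in registry:
        return  # already onboarded, nothing is regenerated
    registry["senseGuid"] = f"guid-for-{name}"
    registry["senseId"] = f"id-for-{name}"


def provision_pool(master_registry: dict, names: list) -> dict:
    """Clone the master's registry for each VDI machine and boot it."""
    reported = {}
    for name in names:
        clone = dict(master_registry)  # the clone inherits the master's state
        run_onboarding(clone, name)
        reported[name] = clone["senseId"]
    return reported


if __name__ == "__main__":
    names = ["VDI-01", "VDI-02", "VDI-03"]

    onboarded_master = {}
    run_onboarding(onboarded_master, "VDI-MASTER")  # master was powered on
    print(provision_pool(onboarded_master, names))  # one shared senseId

    clean_master = {}  # master was offboarded and cleaned before sealing
    print(provision_pool(clean_master, names))  # one senseId per machine
```

With the onboarded master, all three machines report under the master's ID; with the clean master, each machine gets its own.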

How do you avoid this scenario (and you should, at all costs)? If your organization services the same VDI master by powering it on, you need to perform a couple of extra steps.

 

Note: Each time you boot the VDI master for servicing/patching, make sure to run the offboarding script (downloadable from the Microsoft Defender Security Center). This turns off the Microsoft Defender ATP sensor and removes the onboarding information from the registry. You also need to clean out the contents of the Cyber folder, as data begins to accumulate there once the machine is onboarded. Only the system account has access to perform this action. You can use PsExec to open a cmd prompt as system, clear the folder, and then delete the senseGuid registry value:

 

PsExec.exe -s cmd.exe

cd "C:\ProgramData\Microsoft\Windows Defender Advanced Threat Protection\Cyber"

del *.* /f /s /q

exit

REG DELETE "HKLM\SOFTWARE\Microsoft\Windows Advanced Threat Protection" /v senseGuid /f
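Before you shut the master down, it can help to double-check that the cleanup actually took. The following Python sketch is one way to express that check; it is my own illustration, not part of the documented procedure. The function name preseal_problems is hypothetical, the caller is assumed to supply the registry value names found under the key above (reading the registry is left out), and treating a leftover senseId as a problem is an assumption based on the scenario described earlier.

```python
from pathlib import Path

# Values that should not survive on a master that is about to be sealed.
# Registry value names are case-insensitive on Windows, so compare lowercased.
LEFTOVER_VALUES = ("senseguid", "senseid")


def preseal_problems(registry_values: dict, cyber_dir: Path) -> list:
    """Return a list of reasons why the master is not ready to be sealed."""
    problems = []
    present = {name.lower() for name in registry_values}
    for value in LEFTOVER_VALUES:
        if value in present:
            problems.append(f"registry value still present: {value}")
    if cyber_dir.is_dir():
        leftovers = [p for p in cyber_dir.rglob("*") if p.is_file()]
        if leftovers:
            problems.append(f"{len(leftovers)} file(s) left in {cyber_dir}")
    return problems


if __name__ == "__main__":
    cyber = Path(
        r"C:\ProgramData\Microsoft\Windows Defender Advanced Threat Protection\Cyber"
    )
    # Hypothetical input: value names collected from the registry key.
    for problem in preseal_problems({"senseGuid": "example"}, cyber):
        print(problem)
```

An empty list means neither check found anything left behind; anything else is worth fixing before you deploy a pool from the master.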

 

Our documentation has been updated here to reflect this. This brings us to option number two, which is to build your VDI master from scratch via an automated process each time you recompose your VDI pools.

 

Option 2 – Build a new master from scratch

 

There are a number of good reasons to consider this method. First, the issue described above can never occur, since you don’t reuse the same master over and over. You also get other benefits:

 

  1. Using a clean/fresh install of Windows each time (no WinSxS or image cleanup via DISM required)
  2. VDI master build automation
  3. VDI master software package consistency
  4. Automated integration of Microsoft Defender Application Control policies
  5. Automated integration of Microsoft Defender ATP onboarding
  6. Automated development and testing of your organization’s vNext VDI image
  7. More agile iterations between Windows 10 versions and software
  8. ...and more.

Using automation tools such as the Microsoft Deployment Toolkit (MDT) to build your VDI master also removes a significant amount of room for human error. We have seen customers have great success using this method to deploy their VDI pools. A sample script that can be used to stage the Microsoft Defender ATP onboarding script on your VDI master during an MDT task sequence is here.

 

Option 3 – Offline Servicing

 

This option is really only available if you’re using a Microsoft-formatted virtual hard disk, such as a VHD or VHDX. DISM can be used to service these disks offline, and this has also been added to the documentation.

 

I hope this helps better explain Microsoft Defender ATP onboarding and servicing for non-persistent VDI machines. Let us know what you think by leaving a comment below. And stay tuned: we will talk about Microsoft Defender Antivirus settings in a non-persistent VDI environment next time!

 

Jesse Esquivel, Program Manager

Microsoft Defender ATP

19 Comments
Copper Contributor

Great job, @JesseEsquivel, thanks for sharing.

Have you ever seen any issues with onboarded machines going offline/closing ports after being onboarded?

We are facing such issues in a customer scenario where the VDI machines become unavailable to Citrix after being onboarded. The required ports simply close and the machines cannot be used. Once we remove the onboarding script, everything works again.

We have opened a support ticket with Microsoft but Support seems unable to help or even understand the issue.

Microsoft

Hi @fgondorf I have not seen this issue.  Onboarding a machine into MDATP simply instructs it to start sending telemetry to an MDATP tenant.  I would take a look at any host based firewalls on the machine, and I would recommend continuing to work with support as well.  

Copper Contributor

Hello @fgondorf 

I am experiencing this same issue in my environment, exactly the same behaviour as you described.

Please could you detail the steps you took to resolve the problem?

Thanks in advance.

 

Copper Contributor

@JesseEsquivel , this is awesome, thank you!  I have been struggling to find a way to onboard my Citrix VMs and this seems to be the way.  I've had to hack the script around a little because we're not running Server 2019 but I was able to use the logic to get the same outcome. 

 

Hopefully I don't run into the same issues as the others did, but at least I'm on the right track. 

Copper Contributor

Thanks @JesseEsquivel .

 

Will start testing with these 2 scenarios (single entry / multiple entry) in the lab for Server 2019.

We had done some testing before with some non-persistent Server 2016 machines (Citrix MCS machines for session virtualization) just by onboarding the master image. When Citrix MCS created different hosts based from that master they just got onboarded fine and each host was listed with the individual DNS names in the portal (without registering the VDI tag as mentioned in the MS doc). 
So I am a bit puzzled why we didn't see new (multiple) entries after rebooting these machines (since they are non persistent).
  

Copper Contributor

Wonderful article, you really rocked it big time. Once we get it in the ATP console and follow all this for non-persistent VDI, can we just use GPO for scan exclusions and things like that for normal operations? Do you guys have any Windows Defender tuning guidance for non-persistent VDI, like preferred scan times, scanning files on open, or anything else around running Windows Defender in a non-persistent VDI setup?

Microsoft

 

 

Great job, @JesseEsquivel, and thanks for the article. I noticed that you reference "Our documentation has been updated here to reflect this." That link contains an extra step that is missing in your article:

REG DELETE "HKLM\SOFTWARE\Microsoft\Windows Advanced Threat Protection" /v senseGuid /f

 

We found that if the senseGuid is not removed, it never calls the WindowsDefenderATPOnboardingScript.cmd script and the VDI servers are not onboarded.

 

Thanks

Copper Contributor

Hi @JesseEsquivel 

Nice article, really useful!

We are facing an issue where the onboarding script doesn't run on VDI machine startup. We tried putting it locally and on domain. The error we get is "Event 1130 - The system cannot open the file"

 

Powershell Execution policy is set to Unrestricted, but I see that if I run the file manually, it doesn't run unless we launch Powershell in admin mode (which I think is expected). Any thoughts?

 

 

 

Brass Contributor

Hi @JesseEsquivel, Microsoft Defender for Cloud automatically onboards VMs to Defender for Endpoint by using the MDE.Windows extension. If you enable Microsoft Defender for Cloud for a subscription that contains AVD-related hosts (Master VM and session hosts, for example) and it uses the 'automagic' onboarding (MDE.Windows extension), this will interfere with the script described above. It will both onboard the Master VM, which is not recommended, and (try to) onboard the session hosts (based on the image created from the Master VM). We are seeing all kinds of trouble now: multiple entries of the same VM, etc. What is Microsoft's recommendation for this scenario? Is it even recommended to enable Microsoft Defender for Cloud for AVD subscriptions and have an automatic onboarding that does not seem to work with AVD scenarios, or should we avoid this?

Microsoft

@Gertjan Jongeneel It seems in this scenario you will want to methodically control the onboarding for the AVD session hosts, so you probably don't want to enable Defender for Cloud for those.

Copper Contributor

@JesseEsquivel Thanks for sharing the MD ATP knowledge!

I used Microsoft Defender ATP onboarding for non-persistent VDI machines. I cloned two non-persistent VDI machines from the master image. When I checked the registry, child VM 1 and child VM 2 had different senseId values but the same senseGuid.

In this page you mentioned that the senseID is what the Microsoft Defender Security Center calls the machineID. So does that mean different machines can be distinguished as long as their senseId values are different? How should I understand senseGuid? What will happen if the same senseGuid is used by multiple child clone machines?

Brass Contributor

Speaking about Option 1, it's not that practical to have to download each time the offboarding script, since it's valid for one month only. Is there no way to make the script permanent, in order to put in place an automatic procedure to offboard the master image before sealing it?

 

Microsoft

@MaxMorsia no, the offboarding script will always have an expiration.

Iron Contributor

awesome information 

Brass Contributor

One doubt. We have this environment where the gold (master) is used to generate one worker (the non-persistent VDI), which retains the same device name at each iteration. Now, from what I understand being a non-persistent machine in any case, the correct procedure should be onboarding by automatically starting the Onboard-NonPersistentMachine.ps1 in local group policy, which in turn calls for the WindowsDefenderATPOnboardingScript.cmd script (of course we need to take care about offboarding the master as described in the article every time we make changes on it). I'm not sure anyway if the constant worker's device name can produce annoyances. Any idea?

Brass Contributor

A couple of questions and comments regarding this process.

  1. The offboarding script no longer works as Defender itself detects the psexec command as a virus threat when trying to stop the sense service. If the service cannot be stopped, the reg value senseGuid and the files from "C:\ProgramData\Microsoft\Windows Defender Advanced Threat Protection\Cyber" can't be deleted either.

  2. Please provide an improved PowerShell script to do the offboarding that actually works and has been tested. Doing QA on a recommended script to guarantee that the actual Antivirus is working is kind of important. This needs to work 100%.

  3. If indeed we need to download a fresh copy of the WindowsDefenderATPOnboardingScript.zip file every month, please provide a static URL to download it so we can automate the task with our master image update. Something like http://aka.ms/WindowsDefenderATPOnboardingScript. At this moment, this is a complete nightmare to maintain.

Also posted here
Offboarding no longer works, psexec -s cmd.exe sc stop sense is detected as a threat by Defender · I...

Copper Contributor

@JesseEsquivel thank you for the excellent explanation. I'm looking for a similar process for non-persistent Linux VMs and the documentation is silent about it. I have a Linux VM that I rebuild pretty regularly with the same hostname, and every time I do that a new, duplicate entry is created in the Defender portal, which I'm trying to avoid.

Iron Contributor

great

Copper Contributor

@JonathanPitre

 

Were you able to find a way to automate the download of the offboarding script? I could not find anything on this topic, and the issue on GitHub no longer exists.

Last update: Sep 13 2022 07:13 AM