Windows Server
Enable Nested Virtualization on Windows Server 2025
Nested virtualization allows you to run Hyper-V inside a VM, opening up incredible flexibility for testing complex infrastructure setups, demos, or learning environments, all without extra hardware.

First, ensure you're running a Hyper-V host capable of nested virtualization and have ready the Windows Server 2025 VM that you want to turn into a Hyper-V host. To get started, open a PowerShell window on your Hyper-V host and execute:

Set-VMProcessor -VMName "<Your-VM-Name>" -ExposeVirtualizationExtensions $true

Replace <Your-VM-Name> with the actual name of your VM. This command configures Hyper-V to allow nested virtualization on the target VM.

Boot up the Windows Server 2025 VM that you want to configure as a Hyper-V host. In the VM, open Server Manager and attempt to install the Hyper-V role via Add Roles and Features. Most of the time, this should work right away. However, in some cases you might hit an error stating: "Hyper-V cannot be installed because virtualization support is not enabled in the BIOS." To resolve this error, open an elevated PowerShell session inside the VM on which you want to enable Hyper-V and run:

bcdedit /set hypervisorlaunchtype auto

This command ensures the Hyper-V hypervisor starts up correctly the next time you boot. Restart your VM to apply the change. After the reboot, head back to Add Roles and Features and try installing Hyper-V again. This time, it should proceed smoothly without the BIOS virtualization error. Once Hyper-V is installed, perform a final reboot if prompted. Open Hyper-V Manager inside your VM and you're now ready to run test VMs in your nested environment!
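For convenience, here is a consolidated sketch of the host-side steps described above. The VM name is a placeholder, and the MAC address spoofing line is an optional assumption that only applies if you want nested guest VMs to bridge through the outer VM's network adapter (NAT inside the guest is the alternative):

# Run on the physical Hyper-V host; the processor setting can only be changed while the VM is off.
$vmName = "WS2025-NestedHost"   # placeholder name
if ((Get-VM -Name $vmName).State -ne 'Off') { Stop-VM -Name $vmName }
Set-VMProcessor -VMName $vmName -ExposeVirtualizationExtensions $true
# Optional: lets nested guest VMs reach the network through the outer VM's adapter
Set-VMNetworkAdapter -VMName $vmName -MacAddressSpoofing On
Start-VM -Name $vmName
# Verify the setting took effect
Get-VMProcessor -VMName $vmName | Select-Object VMName, ExposeVirtualizationExtensions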
Shielded VM Template Creation in a Hyper-V Guarded Fabric

To set up a shielded virtual machine template on a Hyper-V guarded fabric, you need to prepare a secure environment (Host Guardian Service, guarded hosts) and then create a BitLocker-protected, signed template disk. This document assumes that all Windows Server instances used are running Windows Server 2022 or Windows Server 2025.

Prerequisites and Environment Setup

Host Guardian Service (HGS): Deploy an HGS cluster (typically 3 nodes for high availability) in a separate Active Directory forest dedicated to HGS. For production, HGS should run on physical (or highly secured) servers, ideally as a three-node cluster. Ensure the HGS servers have the Host Guardian Service role installed and are up to date with software updates.

Attestation Mode (TPM-Based): Ensure that HGS is configured for TPM-trusted attestation. In TPM mode, HGS uses each host's TPM 2.0 identity (EKpub) and measured boot sequence to verify the host's health and authenticity. This requires capturing each Hyper-V host's TPM identifier and establishing a security baseline.

TPM 2.0 and Boot Measurements: On each Hyper-V host, retrieve the TPM's public endorsement key (EKpub) and add it to the HGS trust store (e.g. using Get-PlatformIdentifier on the host and Add-HgsAttestationTpmHost on HGS). HGS will also require a TPM baseline (PCR measurements of the host's firmware/boot components) and a Code Integrity (CI) policy defining allowed binaries. Generate these from a reference host and add them to HGS so that only hosts booting with the approved firmware and software can attest successfully.

Host Requirements: Each guarded host (Hyper-V host) must meet hardware/OS requirements for TPM attestation. This includes TPM 2.0, UEFI 2.3.1+ firmware with Secure Boot enabled, and support for IOMMU/SLAT (for virtualization-based security). On each host, enable the Hyper-V role and install the Host Guardian Hyper-V Support feature (available in Datacenter edition). This feature enables virtualization-based protection of code integrity (ensuring the host hypervisor only runs trusted code), which is required for TPM attestation. (Test this configuration in a lab first as VBS/CI can affect some drivers.)

Guarded Fabric Configuration: Join Hyper-V hosts to the fabric domain and configure networking so that guarded hosts can reach the HGS servers (set up DNS or DNS forwarding between the fabric domain and HGS domain). After setting up HGS and adding host attestation data, configure each Hyper-V host as a guarded host by pointing it to the HGS cluster for attestation and key retrieval (using Set-HgsClientConfiguration to specify the HGS attestation and key protection URLs and any required certificates). Once a host attests successfully, it becomes an authorized guarded host able to run shielded VMs. HGS will release the necessary decryption keys only to those hosts that pass health attestation.

Step 1: Create and Configure the Template VM (Gen 2)

Prepare a Generation 2 VM: On a Hyper-V host (it can be a regular host or even a non-guarded host for template creation), create a new Generation 2 virtual machine. Generation 2 with UEFI is required for Secure Boot and virtual TPM support. Attach a blank virtual hard disk (VHDX) for the OS. Install Windows Server on this VM using standard installation media.
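If you prefer PowerShell for this step, a minimal sketch of creating such a template VM follows; the VM name, paths, sizes, and switch name are placeholders rather than values required by the shielding process:

# Create the Generation 2 VM that will become the template (names and sizes are examples)
New-VM -Name "WS-Template" -Generation 2 -MemoryStartupBytes 4GB `
    -NewVHDPath "D:\VMs\WS-Template.vhdx" -NewVHDSizeBytes 80GB -SwitchName "LabSwitch"
# Attach the installation media and boot from it
Add-VMDvdDrive -VMName "WS-Template" -Path "D:\ISO\WindowsServer.iso"
Set-VMFirmware -VMName "WS-Template" -FirstBootDevice (Get-VMDvdDrive -VMName "WS-Template")
Start-VM -Name "WS-Template"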
Partition and File System Requirements: When installing the OS on the template VM, ensure the VHDX is initialized with a GUID Partition Table (GPT) and that the Windows setup creates the necessary partitions: there should be at least a small System/EFI boot partition (unencrypted) and the main OS partition (which will later be BitLocker-encrypted). The disk must be a basic disk (not dynamic within the guest OS) and formatted with NTFS to support BitLocker. Using the default Windows setup on a blank drive typically meets these requirements (the installer will create the EFI and OS partitions automatically on a GPT disk).

Configure the OS: Boot the VM and perform any baseline configuration needed. Do not join this VM to any domain and avoid putting sensitive data on it as it should be a generic base image. Apply the latest Windows updates and install any required drivers or software that should be part of the template OS (e.g. common management agents). Ensuring the template OS is fully updated is important for a reliable shielding process.

Enable Remote Management: Because shielded VMs can only be managed remotely (no console access), consider configuring the template to enable Remote Desktop and/or PowerShell WinRM, and ensure the firewall is configured accordingly. You may also install roles/features that many VMs will need. However, do not configure a static IP or unique machine-specific settings in this template as those will be supplied via an answer file during provisioning.

Step 2: Generalize the VM with Sysprep

Run Sysprep: In the VM, open an elevated Command Prompt and run:

C:\Windows\System32\Sysprep\Sysprep.exe /oobe /generalize /shutdown

Choose "Enter System Out-of-Box Experience (OOBE)", check "Generalize", and set the Shutdown option to "Shutdown" if using the GUI. This strips out machine-specific details and prepares the OS for first-boot specialization. The VM will shut down upon completion.

Do Not Boot After Sysprep: Leave the VM off after it shuts down. The OS on the VHDX is now in a generalized state. Do not boot this VM again (doing so will boot into OOBE and break its generalized state). At this point you have a prepared OS disk (the VHDX) ready for sealing.

(Optional) Backup the VHDX: It's a good idea to make a copy of the sysprepped VHDX at this stage. After the next step (sealing the template), the disk will be BitLocker-encrypted and cannot be easily modified. Keeping an unencrypted copy allows you to easily update the template image in the future if needed.

Step 3: Protect and Seal the Template Disk (Shielded Template Wizard)

Next, seal the template VM's OS disk using the Shielded VM Template Disk Creation process. This will encrypt the disk (preparing it for BitLocker) and produce a signed catalog so that the disk's integrity can be verified later.

Install Shielded VM Tools: On a machine with a GUI (this can be a management server or even Windows 11 with RSAT), install the Shielded VM Tools component. On Windows Server, use PowerShell:

Install-WindowsFeature RSAT-Shielded-VM-Tools -IncludeAllSubFeature

(and reboot if prompted). This provides the Template Disk Wizard (TemplateDiskWizard.exe) and PowerShell cmdlets like Protect-TemplateDisk.

Obtain a Signing Certificate: Acquire a certificate to sign the template disk's Volume Signature Catalog (VSC). For production, use a certificate issued by a trusted CA that both the fabric administrators and tenants trust (e.g. an internal PKI or a certificate from a mutually trusted authority).
The certificate's public key will be referenced later by tenants to trust this template. (For a lab or demo, you can use a self-signed cert, but this is not recommended for production.) Import the certificate into the local machine's certificate store if it's not already present.

Launch the Template Disk Wizard: Open the Template Disk Wizard (found in Administrative Tools after installing RSAT, or run TemplateDiskWizard.exe). This wizard will guide you through protecting the VHDX:

Certificate: Select the signing certificate obtained in the previous step. This certificate will be used to sign the template's catalog.

Virtual Disk: Browse to and select the generalized VHDX from Step 2 (the sysprepped OS disk).

Signature Catalog Info: Provide a friendly name and version for this template disk (e.g. Name: "WS2025-ShieldedTemplate", Version: 1.0.0.0). These labels help identify the disk and version to tenants.

Proceed to the final page and Generate. The wizard will now:

Enable BitLocker on the OS volume of the VHDX and store the BitLocker metadata on the disk (but it does not encrypt the volume yet, as encryption will finalize when a VM instance is provisioned with this disk).

Compute a cryptographic hash of the disk and create a Volume Signature Catalog (VSC) entry (which is stored in the disk's metadata) signed with your certificate. This ensures the disk's integrity can be verified; only disks matching this signed hash will be recognized as this template.

Wait for the wizard to finish (it may take some time to initialize BitLocker and sign the catalog, depending on disk size). Click Close when done.

The VHDX is now a sealed template disk. It's marked internally as a shielded template and cannot be used to boot a normal VM without going through the shielded provisioning process (attempting to boot it in an unshielded way will likely cause a blue screen). The disk's OS volume is still largely unencrypted at rest (encryption will complete when a VM is created), but it's protected by BitLocker keys that will be released only to an authorized host via HGS.

Extract the VSC File (for Tenant Use): It's recommended to extract the template's Volume Signature Catalog to a separate file. This .vsc file contains the disk's identity (hash, name, version) and the signing certificate info. Tenants will use it to authorize this template in their shielding data. Use PowerShell on the RSAT machine:

Save-VolumeSignatureCatalog -TemplateDiskPath "C:\path\WS2025-ShieldedTemplate.vhdx" -VolumeSignatureCatalogPath "C:\path\WS2025-ShieldedTemplate.vsc"

This saves the .vsc file separately. Share this .vsc with the VM owners (tenants) or have it available for the shielding data file creation in the next step.

As an alternative to the wizard, you can use PowerShell: after installing RSAT, run Protect-TemplateDisk -Path <VHDX> -Certificate <Cert> -TemplateName "<Name>" -Version <X.Y.Z.W> to seal the disk in one step. The wizard and PowerShell achieve the same result.

Step 4: Create the Shielding Data File (PDK)

A shielding data file (with extension .pdk) contains the sensitive configuration and keys required to deploy a shielded VM from the template. This includes the local administrator password, domain join credentials, RDP certificate, and the list of guardians (trust authorities) and template disk signatures the VM is allowed to use. For security, the shielding data is created by the tenant or VM owner on a secure machine outside the fabric, and is encrypted so that fabric admins cannot read the contents.
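The guardian setup that the wizard performs (described next) can also be scripted with the HgsClient cmdlets on the tenant's secure workstation. A sketch, with example names and paths; -AllowUntrustedRoot is only needed when the HGS guardian's certificates are self-signed:

# Create the tenant's owner guardian (the private key stays on this workstation)
New-HgsGuardian -Name "TenantOwner" -GenerateCertificates

# Import the hosting fabric's guardian metadata, exported from HGS with Export-HgsGuardian
Import-HgsGuardian -Path "C:\Temp\ContosoHgsGuardian.xml" -Name "Contoso HGS" -AllowUntrustedRoot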
Prerequisites for Shielding Data: Obtain the Volume Signature Catalog (.vsc) file for the template disk (from Step 3) to authorize that template. If the VM should use a trusted RDP certificate (to avoid man-in-the-middle when connecting via RDP), obtain a certificate (e.g. a wildcard certificate from the tenant's CA) to include. This is optional; if the VM will join a domain and get a computer certificate, or if you're just testing, you may skip a custom RDP certificate. Prepare an unattend answer file or have the information needed to create one (admin password, timezone, product key, etc.). Use the PowerShell function New-ShieldingDataAnswerFile to generate a proper unattend XML for shielded VMs. The unattend will configure the VM's OS on first boot (e.g. set the Administrator password, optionally join a domain, install roles, enable RDP, etc.). Ensure the unattend enables remote management (e.g. turn on RDP and firewall rules, or enable WinRM) because console access is not available for shielded VMs. Also, do not hardcode any per-VM values in the unattend that should differ for each instance; use placeholders or plan to supply those at deployment time.

Creating the .PDK file: On a secure workstation (not on a guarded host) with RSAT Shielded VM Tools installed, launch the Shielding Data File Wizard (ShieldingDataFileWizard.exe). This tool will collect the needed info and produce an encrypted PDK file.

Owner and Guardian Keys: First, set up the guardians. "Guardians" are certificates that represent who owns the VM and which fabrics (HGS instances) are authorized to run it. Typically:

The Owner Guardian is a key pair that the tenant/VM owner possesses (the private key stays with the tenant). Create an Owner guardian (if not already) via the wizard's Manage Local Guardians > Create option. This generates a key pair on your machine. Give it a name (e.g. "TenantOwner").

The Fabric Guardian(s) correspond to the HGS of the hosting fabric. Import the HGS guardian metadata file provided by the hoster (this is an XML with the HGS public key, exported via Export-HgsGuardian on the HGS server). In the wizard, use Manage Local Guardians > Import to add the hoster's guardian(s) (for example, "Contoso HGS"). For production, you might import multiple datacenter guardians; if the VM can run in multiple cloud regions, include each authorized fabric's guardian.

After adding, select all the guardian(s) that represent fabrics where this VM is allowed to run. Also select your Owner guardian (the wizard may list it separately). This establishes that the VM will be owned by your key and can only run on hosts approved by those fabric guardians.

Template Disk (VSC) Authorization: The wizard will prompt to add Volume ID Qualifiers or trusted template disks. Click Add and import the .vsc file corresponding to the template disk prepared in Step 3. You can usually choose whether the shielding data trusts only that specific version of the template or future versions as well (Equal vs. GreaterOrEqual version matching). Select the appropriate option based on whether you want to allow updates to the template without regenerating the PDK. This step ensures the secrets in the PDK will only unlock when that specific signed template disk is used.

Unattend and Certificates: Provide the answer file (Unattend.xml) for the VM's specialization. If you created one with New-ShieldingDataAnswerFile, load it here.
Otherwise, the wizard may have a simplified interface for common settings (depending on version, it may prompt for admin password, domain join info, etc.). Also, if using a custom RDP certificate, import it at this stage (so the VM will install that cert for remote desktop).

Create the PDK: Specify an output file name for the shielding data (e.g. MyVMShieldingData.pdk) and finish the wizard. It will create the .pdk file, encrypting all the provided data. The Owner guardian's private key is used to encrypt secrets, and the Fabric guardian's public key ensures that HGS (holding the corresponding private key) is needed to unlock the file. The PDK is now ready to use for provisioning shielded VMs. (You can also create PDKs via PowerShell with New-ShieldingDataFile for automation.)

Note: the PDK is encrypted such that only the combination of the owner's key and an authorized fabric's HGS can decrypt it. Fabric admins cannot read sensitive contents of the PDK, and an unauthorized or untrusted host cannot launch a VM using it. Keep the PDK file safe, as it contains the keys that will configure your VM.

Step 5: (Optional) Prepare a Shielding Helper VHDX

In some scenarios, especially if you need to convert an existing VM into a shielded VM or if you are not using SCVMM for provisioning, a Shielding Helper disk is used. The Shielding Helper is a special VHDX containing a minimal OS that helps encrypt the template disk and inject the unattend inside a VM without exposing secrets to the host. SCVMM can automate this, but if you need to do it manually or for existing VMs, prepare the helper disk as follows:

Create a Helper VM: On a Hyper-V host (not necessarily guarded), create a Gen 2 VM with a new blank VHDX (do not reuse the template disk to avoid duplicate disk IDs). Install a supported OS (Windows Server 2016 or higher; a Server Core installation is sufficient) on this VM. This VM will be temporary and its VHD will become the helper disk. Ensure you can log into it (set a password, etc.), then shut it down.

Initialize the Helper Disk: On a Hyper-V host with RSAT Shielded VM Tools, run the PowerShell cmdlet:

Initialize-VMShieldingHelperVHD -Path "C:\VMs\ShieldingHelper.vhdx"

This command should point to the VHDX of the helper VM. It injects the necessary provisioning agent and settings into the VHDX to make it a shielding helper disk. The VHDX is modified in-place (consider making a backup beforehand).

Do Not Boot the Helper VM Again: After initialization, do not start the helper VM created above. The VHDX is now a specialized helper disk. You can discard the VM's configuration. Only the VHDX file is needed going forward.

Reuse for Conversions / Non-VMM Deployments: For manually shielding an existing VM, you would attach this helper VHDX to the VM and use PowerShell (e.g. ConvertTo-ShieldedVM or a script) to encrypt the VM's OS disk using the helper. The helper boots in place of the VM's OS, uses the PDK to apply BitLocker and the unattend to the OS disk, then shuts down. The VM is then switched to boot from its now-encrypted OS disk with a virtual TPM. (Note: each initialized helper VHDX is typically one-time-use for a given VM; if you need to shield multiple VMs manually, create or copy a fresh helper disk for each to avoid BitLocker key reuse.)

Step 6: Prepare the Template Disk and PDK on the Host

Copy the VHDX and PDK: Transfer the sealed template .vhdx and the .pdk file to the Hyper-V host (or a cluster shared volume if the host is part of a Hyper-V cluster).
For example, place them in C:\ShieldedVM\Templates\ on the host. This ensures the host can read the files during VM provisioning.

Verify File Trust: (Optional) Double-check that the template disk's signature is recognized by the tenant's shielding data. The template's .vsc file (volume signature catalog) should have been used when creating the PDK, so the PDK will "trust" that specific template hash. Also verify that the HGS guardian in the PDK matches your fabric's HGS public key. These must align, or the VM provisioning will be rejected by HGS.

Note: The PDK is encrypted and cannot be opened by the fabric admin, as it's designed so that only HGS (and the VM owner) can decrypt its contents. The host will use it as-is during provisioning. Make sure you do not modify or expose the PDK's contents.

Use PowerShell to finalize the shielded VM setup and set up the key protector. For a clean process, you can use New-ShieldedVM on the guarded host:

New-ShieldedVM -Name "Finance-App1" `
    -TemplateDiskPath "C:\ShieldedVM\Templates\WS2025-ShieldedTemplate.vhdx" `
    -ShieldingDataFilePath "C:\ShieldedVM\Templates\TenantShieldingData.pdk" -Wait

This single command will create a new VM named "Finance-App1" using the specified template disk and shielding data file. It automatically configures the VM's security settings: attaches a vTPM, injects the Key Protector (from the PDK) into the VM's security settings, and attaches the shielding helper disk to boot and apply the unattend. The -Wait flag tells PowerShell to wait until provisioning is complete before returning.

Note: Ensure the VM name is unique in your Hyper-V inventory. The New-ShieldedVM cmdlet requires the GuardedFabricTools module and will fail if the host isn't a guarded host or if guardians are not properly configured. It uses the host's configured HGS connection to request keys when provisioning.

If your shielding data's unattend file included placeholders for unique settings (for example, a static IP address, custom computer name, etc.), you can supply those values with the -SpecializationValues parameter on New-ShieldedVM. This takes a hashtable mapping the placeholder keys to actual values. For instance:

$specVals = @{
    "@ComputerName@" = "Finance-App1"
    "@IP4Addr-1@"    = "10.0.0.50/24"
    "@Gateway-1@"    = "10.0.0.1"
}
New-ShieldedVM -Name "Finance-App1" -TemplateDiskPath C:\ShieldedVM\Templates\WS2025-ShieldedTemplate.vhdx `
    -ShieldingDataFilePath C:\ShieldedVM\Templates\TenantShieldingData.pdk -SpecializationValues $specVals -Wait

This would replace placeholders like @ComputerName@ in the unattend with "Finance-App1", and so on. Use this only if the unattend (inside the PDK) was set up with such tokens. In many cases, the shielding data might already contain all required settings, so specialization values are optional.

Step 7: Monitoring Provisioning and First Boot

Once the shielded VM deployment is initiated (either by WAC or PowerShell), the provisioning process begins on the guarded host. This process is automatic and involves several stages behind the scenes:

The host registers a new Key Protector for the VM (containing the VM's BitLocker key, sealed to the VM's virtual TPM and the fabric's HGS). It then contacts the HGS. HGS verifies the host's health (attestation) and, if the host is authorized and healthy, releases the key protector to the host. The VM is initially started using a temporary shielding helper OS (often a small utility VHD).
This helper OS boots inside the new VM and uses the unattend file from the PDK to configure the main OS disk. It injects the administrator password, domain or network settings, enables RDP/WinRM, and then finalizes BitLocker encryption of the VM's OS volume using the VM's vTPM. This encryption locks the OS disk so it can only be decrypted by that VM's vTPM (which in turn is only released by HGS to trusted hosts). When specialization is complete, the VM will shut down automatically. This shutdown is a signal that provisioning is finished. The helper disk is then automatically detached, and the VM is now fully shielded.

As an administrator, you should monitor this process to know when the VM is ready:

In Windows Admin Center's VM list, you may see the VM's state change (it might show as "Off" or "Stopped" after the provisioning shutdown). You may not get a detailed status in WAC during provisioning. Refresh the view to see if the VM has turned off after a few minutes.

Using PowerShell, you can query the status: run Get-ShieldedVMProvisioningStatus -VMName <Name> on the guarded host to check progress. This cmdlet shows stages or any errors during provisioning. (If the provisioning fails, the cmdlet or Hyper-V event logs will show error details. Common causes include guardian mismatches or unattend errors.)

Once the VM has shut down indicating success, you can proceed to start it normally. In WAC, select the VM and click Start (or use Start-VM -Name <Name> in PowerShell). The VM will boot its now-configured OS. On first boot, it will go through final OS specialization (the standard Sysprep specialize pass completion).

Step 8: Post-Deployment Access and Management

Your new VM is now running as a shielded VM. Key points for management:

Limited Host Access: Because it's shielded, the Hyper-V host admin cannot view the VM's console or use PowerShell Direct on this VM. In WAC (or Hyper-V Manager), if you try to connect to the VM's console, it will be blocked (you might see a black screen or an error). This is expected as shielded VMs are isolated from host interference. All management must be done through the network.

Accessing the VM: Use the credentials set in the unattend/PDK to log on to the VM via Remote Desktop (RDP) or another remote method (e.g. PowerShell Remoting). Ensure the VM is connected to a network and has an IP (via DHCP or the unattend's settings). The unattend should have enabled RDP or WinRM as configured earlier. For example, if the PDK joined the VM to a domain, you can RDP with a domain account; if not, use the local Administrator and the password from the shielding data.

Verify Shielded Status: In WAC's inventory, the VM should show as a Generation 2 VM with a TPM. You can confirm it's shielded by checking the VM's Security settings (they will show that the VM is using a Key Protector and is shielded; often the UI will have those options greyed out/enforced). You can also use PowerShell: Get-VMSecurity -VMName <Name>. It should show Shielded: True and list the Key Protector ID, etc.

Routine Management: You can manage the VM (start/stop/reset) in WAC like any other VM. Backups, replication, etc., should be done with shielded VM-compatible methods (e.g. use Hyper-V checkpoints or backup APIs, as the VM's disks are encrypted but manageable through Hyper-V). Fabric admins cannot alter the VM's settings in ways that would compromise its security (for instance, you cannot remove the vTPM or turn off shielding without the VM owner's consent).
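To tie the access guidance together, a small sketch of checking and opening an RDP session from a management machine; the VM's DNS name is an assumption about your name resolution, so substitute the IP address if the name does not resolve:

# Confirm RDP is reachable, then connect with the credentials from the shielding data
Test-NetConnection -ComputerName "Finance-App1.mylocal.net" -Port 3389
mstsc /v:Finance-App1.mylocal.net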
Further Reading:

Install HGS in a new forest | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-install-hgs-default
Guarded fabric and shielded VMs | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-and-shielded-vms-top-node
Capture TPM-mode information required by HGS | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-tpm-trusted-attestation-capturing-hardware
Guarded host prerequisites | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-guarded-host-prerequisites
Review HGS prerequisites | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-prepare-for-hgs
Create a Windows shielded VM template disk | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-create-a-shielded-vm-template
Shielded VMs for tenants - Creating shielding data to define a shielded VM | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-tenant-creates-shielding-data
Shielded VMs - Preparing a VM Shielding Helper VHD | https://learn.microsoft.com/en-us/windows-server/security/guarded-fabric-shielded-vm/guarded-fabric-vm-shielding-helper-vhd
Step-by-Step: How to work with Group Managed Service Accounts (gMSA)

Service accounts are recommended when installing applications or services in your infrastructure. A service account is a dedicated account with specific privileges that is used to run services, batch jobs, and management tasks. In most infrastructures, service accounts are typical user accounts with the "Password never expires" option set. Since these service accounts are not used regularly, administrators have to keep track of the accounts and their credentials. I have seen many occasions where engineers ran into issues due to outdated or misplaced service account credential details. The pain of it is that if you reset the password of a service account, you will need to update services, databases, and application settings to get the application or service up and running again. Apart from that, engineers also have to manage service principal names (SPNs), which help to identify a service instance uniquely.
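As a preview of where the full walkthrough goes, here is a minimal sketch of creating and installing a gMSA. It assumes an Active Directory domain, the ActiveDirectory PowerShell module, and placeholder account and group names:

# One-time per forest: create the KDS root key (backdating is a lab shortcut so the key is usable immediately)
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))

# Create the gMSA and allow a security group of servers to retrieve its password
New-ADServiceAccount -Name "svcWebApp" -DNSHostName "svcWebApp.contoso.com" `
    -PrincipalsAllowedToRetrieveManagedPassword "WebServers"

# On a server that is a member of that group: install and test the account
Install-ADServiceAccount -Identity "svcWebApp"
Test-ADServiceAccount -Identity "svcWebApp"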
Hyper-V Virtual TPMs, Certificates, VM Export and Migration

Virtual Trusted Platform Modules (vTPM) in Hyper-V allow you to run guest operating systems, such as Windows 11 or Windows Server 2025, with security features enabled. One of the challenges of vTPMs is that they rely on certificates on the local Hyper-V server. Great if you're only running the VM with the vTPM on that server, but a possible cause of issues if you want to move that VM to another server. In this article I'll show you how to manage the certificates that are associated with vTPMs so that you'll be able to export or move VMs that use them, such as Windows 11 VMs, to any prepared Hyper-V host you manage.

When a vTPM is enabled on a Generation 2 virtual machine, Hyper-V automatically generates a pair of self-signed certificates on the host where the VM resides. These certificates are specifically named:

"Shielded VM Encryption Certificate (UntrustedGuardian)(ComputerName)"
"Shielded VM Signing Certificate (UntrustedGuardian)(ComputerName)"

These certificates are stored in a unique local certificate store on the Hyper-V host named "Shielded VM Local Certificates". By default, these certificates are provisioned with a validity period of 10 years. For a vTPM-enabled virtual machine to successfully live migrate and subsequently start on a new Hyper-V host, the "Shielded VM Local Certificates" (both the Encryption and Signing certificates) from the source host must be present and trusted on all potential destination Hyper-V hosts.

Exporting vTPM related certificates

You can transfer certificates from one Hyper-V host to another using the following procedure:

On the source Hyper-V host, open mmc.exe. From the "File" menu, select "Add/Remove Snap-in..." In the "Add or Remove Snap-ins" window, select "Certificates" and click "Add." Choose "Computer account" and then "Local Computer". Navigate through the console tree to "Certificates (Local Computer) > Personal > Shielded VM Local Certificates". Select both the "Shielded VM Encryption Certificate" and the "Shielded VM Signing Certificate." Right-click the selected certificates, choose "All Tasks," and then click "Export". In the Certificate Export Wizard, on the "Export Private Key" page, select "Yes, export the private key". The certificates are unusable for their intended purpose without their associated private keys. Select "Personal Information Exchange - PKCS #12 (.PFX)" as the export file format. Select "Include all certificates in the certification path if possible". Provide a strong password to protect the PFX file. This password will be required during the import process.

To perform this process using the command line, display details of the certificates in the "Shielded VM Local Certificates" store, including their serial numbers:

certutil -store "Shielded VM Local Certificates"

Use the serial numbers to export each certificate, ensuring the private key is included.
Replace <Serial_Number_Encryption_Cert> and <Serial_Number_Signing_Cert> with the actual serial numbers, and "YourSecurePassword" with a strong password:

certutil -exportPFX -p "YourSecurePassword" "Shielded VM Local Certificates" <Serial_Number_Encryption_Cert> C:\Temp\VMEncryption.pfx
certutil -exportPFX -p "YourSecurePassword" "Shielded VM Local Certificates" <Serial_Number_Signing_Cert> C:\Temp\VMSigning.pfx

Importing vTPM related certificates

To import these certificates on a Hyper-V host that you want to migrate a vTPM-enabled VM to, perform the following steps:

Transfer the exported PFX files to all Hyper-V hosts that will serve as potential live migration targets. On each target host, open mmc.exe and add the "Certificates" snap-in for the "Computer account" (Local Computer). Navigate to "Certificates (Local Computer) > Personal." Right-click the "Personal" folder, choose "All Tasks," and then click "Import". Proceed through the Certificate Import Wizard. Ensure the certificates are placed in the "Shielded VM Local Certificates" store. After completing the wizard, verify that both the Encryption and Signing certificates now appear in the "Shielded VM Local Certificates" store on the new host.

You can accomplish the same thing using PowerShell with the following command:

Import-PfxCertificate -FilePath "C:\Backup\CertificateName.pfx" -CertStoreLocation "Cert:\LocalMachine\Shielded VM Local Certificates" -Password (ConvertTo-SecureString -String "YourPassword" -Force -AsPlainText)

Updating vTPM related certificates

Self-signed vTPM certificates automatically expire after 10 years. Resetting the key protector for a vTPM-enabled VM in Hyper-V allows you to change or renew the underlying certificates (especially if the private key changes). Here are the requirements and considerations around this process:

The VM must be in an off state to change security settings or reset the key protector. The host must have the appropriate certificates (including private keys) in the "Shielded VM Local Certificates" store; if the private key is missing, the key protector cannot be set or validated. Always back up the VM and existing certificates before resetting the key protector, as this process can make previously encrypted data inaccessible if not performed correctly. The VM must be at a supported configuration version (typically version 7.0 or higher) to support vTPM and key protector features.

To save the current key protector, retrieve it for the VM on the source Hyper-V host and save it to a file:

Get-VMKeyProtector -VMName 'VM001' | Out-File '.\VM001.kp'

To reset the key protector with a new local key protector:

Set-VMKeyProtector -VMName "<VMNAME>" -NewLocalKeyProtector

This command instructs Hyper-V to generate a new key protector using the current local certificates. After resetting, enable vTPM if needed:

Enable-VMTPM -VMName "<VMNAME>"

It is important to note that if an incorrect key protector is applied to the VM, it may fail to start. In such cases, the Set-VMKeyProtector -RestoreLastKnownGoodKeyProtector cmdlet can be used to revert to the last known working key protector.

More information: Set-VMKeyProtector: https://learn.microsoft.com/en-us/powershell/module/hyper-v/set-vmkeyprotector
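Because these certificates do eventually expire, it is worth checking their validity dates from time to time, and if you prefer to stay in PowerShell, the export step shown earlier with certutil can also be done with Export-PfxCertificate. A sketch; the password and output path are examples:

# List the vTPM-related certificates on this host and when they expire
Get-ChildItem "Cert:\LocalMachine\Shielded VM Local Certificates" |
    Select-Object Subject, Thumbprint, NotAfter

# PowerShell alternative to the certutil export shown above
$pfxPassword = ConvertTo-SecureString -String "YourSecurePassword" -Force -AsPlainText
Get-ChildItem "Cert:\LocalMachine\Shielded VM Local Certificates" | ForEach-Object {
    Export-PfxCertificate -Cert $_ -FilePath ("C:\Temp\{0}.pfx" -f $_.Thumbprint) -Password $pfxPassword
}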
Using OSConfig to manage Windows Server 2025 security baselines

OSConfig is a security configuration and compliance management tool introduced as a PowerShell module for use with Windows Server 2025. It enables you to enforce security baselines, automate compliance, and prevent configuration drift on Windows Server 2025 computers.

OSConfig has the following requirements:

Windows Server 2025 (OSConfig is not supported on earlier versions)
PowerShell version 5.1 or higher
Administrator privileges

OSConfig is available as a module from the PowerShell Gallery. You install it using the following command:

Install-Module -Name Microsoft.OSConfig -Scope AllUsers -Repository PSGallery -Force

If prompted to install or update the NuGet provider, type Y and press Enter. You can verify that the module is installed with:

Get-Module -ListAvailable -Name Microsoft.OSConfig

You can ensure that you have an up-to-date version of the module and the baselines by running the following command:

Update-Module -Name Microsoft.OSConfig

To check which OSConfig cmdlets are available, run:

Get-Command -Module Microsoft.OSConfig

Applying Security Baselines

OSConfig includes predefined security baselines tailored for different server roles: Domain Controller, Member Server, and Workgroup Member. These baselines enforce over 300 security settings, such as TLS 1.2+, SMB 3.0+, credential protections, and more. Apply the baseline that matches the server's role:

Domain Controller: Set-OSConfigDesiredConfiguration -Scenario SecurityBaseline/WS2025/DomainController -Default
Member Server: Set-OSConfigDesiredConfiguration -Scenario SecurityBaseline/WS2025/MemberServer -Default
Workgroup Member: Set-OSConfigDesiredConfiguration -Scenario SecurityBaseline/WS2025/WorkgroupMember -Default
Secured Core: Set-OSConfigDesiredConfiguration -Scenario SecuredCore -Default
Defender Antivirus: Set-OSConfigDesiredConfiguration -Scenario Defender/Antivirus -Default

To view compliance from a PowerShell session, run the following command, specifying the appropriate baseline:

Get-OSConfigDesiredConfiguration -Scenario SecurityBaseline/WS2025/MemberServer | ft Name, @{ Name = "Status"; Expression={$_.Compliance.Status} }, @{ Name = "Reason"; Expression={$_.Compliance.Reason} } -AutoSize -Wrap

Whilst this PowerShell output gets the job done, you might find it easier to parse the report by using Windows Admin Center. You can access the security baseline compliance report by connecting to the server you've configured with OSConfig and selecting the Security Baseline tab of the Security blade.

Another feature of OSConfig is drift control. It helps ensure that the system starts and remains in a known good security state. When you turn it on, OSConfig automatically corrects any system changes that deviate from the desired state. OSConfig makes the correction through a refresh task. This task runs every 4 hours by default, which you can verify with the Get-OSConfigDriftControl cmdlet. You can change how often drift control runs using the Set-OSConfigDriftControl cmdlet. For example, to set it to 45 minutes, run the command:

Set-OSConfigDriftControl -RefreshPeriod 45

Rather than just using the default included baselines, you can also customize baselines to suit your organizational needs. That's more detail than I want to cover here, but if you want to know more, check out the information available in the GitHub repo associated with OSConfig.
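When a baseline reports a long list of settings, it can help to filter the output down to just the items that need attention. A sketch building on the command above; it assumes compliant settings report a status string of "Compliant", so check the exact strings in your own output:

Get-OSConfigDesiredConfiguration -Scenario SecurityBaseline/WS2025/MemberServer |
    Where-Object { $_.Compliance.Status -ne "Compliant" } |
    ft Name, @{ Name = "Status"; Expression = { $_.Compliance.Status } },
       @{ Name = "Reason"; Expression = { $_.Compliance.Reason } } -AutoSize -Wrap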
Find out more about OSConfig at the following links:

https://learn.microsoft.com/en-us/windows-server/security/osconfig/osconfig-overview
https://learn.microsoft.com/en-us/windows-server/security/osconfig/osconfig-how-to-configure-security-baselines
How to In-Place Upgrade Windows Server 2008 R2 to Windows Server 2019

As you know, Windows Server 2008 and Windows Server 2008 R2 are out of support as of January 14th, 2020. Customers will need to upgrade their Windows Server 2008 and Windows Server 2008 R2 machines to a newer version of Windows Server or migrate these servers to Microsoft Azure.
Windows Server 2025 Hyper-V Workgroup Cluster with Certificate-Based Authentication

In this guide, we will walk through creating a 2-node or 4-node Hyper-V failover cluster where the nodes are not domain-joined, using mutual certificate-based authentication instead of NTLM or shared local accounts. Here we are going to leverage X.509 certificates for node-to-node authentication. If you don't use certificates, you can do this with NTLM, but we're avoiding that: NTLM is still supported, but the general recommendation is that you deprecate it where you can. We can't use Kerberos because our nodes won't be domain joined. It's a lot easier to do Windows Server clusters if everything is domain joined, but that's not what we're doing here, because there are scenarios where people want each cluster node to be a standalone machine (probably why you are reading this article).

Prerequisites and Environment Preparation

Before diving into configuration, ensure the following prerequisites and baseline setup:

Server OS and Roles: All cluster nodes must be running Windows Server 2025 (same edition and patch level). Install the latest updates and drivers on each node. Each node should have the Hyper-V role and Failover Clustering feature available (we will install these via PowerShell shortly).

Workgroup configuration: Nodes must be in a workgroup, and all nodes should use the same workgroup name. All nodes should share a common DNS suffix so that they can resolve each other's FQDNs. For example, if your chosen suffix is mylocal.net, ensure each server's FQDN is NodeName.mylocal.net.

Name Resolution: Provide a way for nodes to resolve each other's names (and the cluster name). If you have no internal DNS server, use the hosts file on each node to map hostnames to IPs. At minimum, add entries for each node's name (short and FQDN) and the planned cluster name (e.g. Cluster1 and Cluster1.mylocal.net) pointing to the cluster's management IP address.

Network configuration: Ensure a reliable, low-latency network links all nodes. Ideally use at least two networks or VLANs: one for management/cluster communication and one dedicated for Live Migration traffic. This improves performance and security (live migration traffic can be isolated). If using a single network, ensure it is a trusted, private network since live migration data is not encrypted by default. Assign static IPs (or DHCP reservations) on the management network for each node and decide on an unused static IP for the cluster itself. Verify that the necessary firewall rules for clustering are enabled on each node (Windows will add these when the Failover Clustering feature is installed, but if your network is classified Public, you may need to enable them or set the network location to Private).

Time synchronization: Consistent time is important for certificate trust. Configure NTP on each server (e.g. pointing to a reliable internet time source or a local NTP server) so that system clocks are in sync.

Shared storage: Prepare the shared storage that all nodes will use for Hyper-V. This can be an iSCSI target or an SMB 3.0 share accessible to all nodes. For iSCSI or SAN storage, connect each node to the iSCSI target (e.g. using the Microsoft iSCSI Initiator) and present the same LUN(s) to all nodes. Do not bring the disks online or format them on individual servers – leave them raw for the cluster to manage. For an SMB 3 file share, ensure the share is configured for continuous availability. Note: A file share witness for quorum is not supported in a workgroup cluster, so plan to use a disk witness or cloud witness instead.
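To illustrate the iSCSI option, here is a minimal sketch of connecting a node to a target. The portal address is a placeholder, and real deployments may also need CHAP authentication or MPIO, which are not shown:

# Run on each node (elevated). Start the iSCSI initiator service and make it start automatically.
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Register the target portal and connect to the discovered target
New-IscsiTargetPortal -TargetPortalAddress "10.0.0.50"
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true

# Confirm the session and that the shared disks are visible (leave them offline/raw for the cluster)
Get-IscsiSession
Get-Disk | Where-Object BusType -eq "iSCSI"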
Administrative access: You will need Administrator access to each server. While we will avoid using identical local user accounts for cluster authentication, you should still have a way to log into each node (e.g. the built-in local Administrator account on each machine). If using Remote Desktop or PowerShell Remoting for setup, ensure you can authenticate to each server (we will configure certificate-based WinRM for secure remote PowerShell). The cluster creation process can be done by running commands locally on each node to avoid passing NTLM credentials.

Obtaining and Configuring Certificates for Cluster Authentication

The core of our setup is the use of mutual certificate-based authentication between cluster nodes. Each node will need an X.509 certificate that the others trust. We will outline how to use an internal Active Directory Certificate Services (AD CS) enterprise CA to issue these certificates, and mention alternatives for test environments. We are using AD CS even though the nodes aren't domain joined. Just because the nodes aren't members of the domain doesn't mean you can't use an Enterprise CA to issue certificates; you just have to ensure the nodes are configured to trust the CA's certificates manually.

Certificate Requirements and Template Configuration

For clustering (and related features like Hyper-V live migration) to authenticate using certificates, the certificates must meet specific requirements:

Key Usage: The certificate should support digital signature and key encipherment (these are typically enabled by default for SSL certificates).

Enhanced Key Usage (EKU): It must include both Client Authentication and Server Authentication EKUs. Having both allows the certificate to be presented by a node as a client (when initiating a connection to another node) and as a server (when accepting a connection). For example, in the certificate's properties you should see Client Authentication (1.3.6.1.5.5.7.3.2) and Server Authentication (1.3.6.1.5.5.7.3.1) listed under "Enhanced Key Usage".

Subject Name and SAN: The certificate's subject or Subject Alternative Name should include the node's DNS name. It is recommended that the Subject Common Name (CN) be set to the server's fully qualified DNS name (e.g. Node1.mylocal.net). Also include the short hostname (e.g. Node1) in the Subject Alternative Name (SAN) extension (DNS entries). If you have already chosen a cluster name (e.g. Cluster1), include the cluster's DNS name in the SAN as well. This ensures that any node's certificate can be used to authenticate connections addressed to the cluster's name or the node's name. (Including the cluster name in all node certificates is optional but can facilitate management access via the cluster name over HTTPS, since whichever node responds will present a certificate that matches the cluster name in the SAN.)

Trust: All cluster nodes must trust the issuer of the certificates. If using an internal enterprise CA, this means each node should have the CA's root certificate in its Trusted Root Certification Authorities store. If you are using a standalone or third-party CA, similarly ensure the root (and any intermediate CA) is imported into each node's Trusted Root store.

Next, on your enterprise CA, create a certificate template for the cluster node certificates (or use an appropriate existing template):

Template basis: A good starting point is the built-in "Computer" or "Web Server" template. Duplicate the template so you can modify settings without affecting defaults.
General Settings: Give the new template a descriptive name (e.g. "Workgroup Cluster Node"). Set the validity period (e.g. 1 or 2 years – plan a manageable renewal schedule since these certs will need renewal in the future).

Compatibility: Ensure it's set for at least Windows Server 2016 or higher for both Certification Authority and Certificate Recipient to support modern cryptography.

Subject Name: Since our servers are not domain-joined (and thus cannot auto-enroll with their AD computer name), configure the template to allow the subject name to be supplied in the request. In the template's Subject Name tab, choose "Supply in request" (this allows us to specify the SAN and CN when we request the cert on each node). Alternatively, use the SAN field in the request – modern certificate requests will typically put the FQDN in the SAN.

Extensions: In the Extensions tab, edit Key Usage to ensure it includes Digital Signature and Key Encipherment (these should already be selected by default for Computer templates). Then edit Extended Key Usage and make sure Client Authentication and Server Authentication are present. If using a duplicated Web Server template, add the Client Authentication EKU; if using the Computer template, both EKUs should already be there. Also enable private key export if your policy requires it (though generally private keys should not be exported; here each node will have its own cert, so export is not necessary except for backup purposes).

Security: Allow the account that will be requesting the certificate to enroll. Since the nodes are not in AD, you might generate the CSR on each node and then submit it via an admin account. One approach is to use a domain-joined management PC or the CA server itself to submit the CSR, so ensure domain users (or a specific user) have Enroll permission on the template.

Publish the template: On the CA, publish the new template so it is available for issuing.

Obtaining Certificates from the Enterprise CA

Now, for each cluster node, request a certificate from the CA using the new template. To do this, on each node, create an INF file describing the certificate request. For example, Node1.inf might specify the Subject as CN=Node1.mylocal.net and include SANs for Node1.mylocal.net, Node1, Cluster1.mylocal.net, and Cluster1. Also specify in the INF that you want Client and Server Authentication EKUs (or, since the template has them by default, it might not be needed to list them explicitly). Then run:

certreq -new Node1.inf Node1.req

This generates a CSR file (Node1.req). Transfer this request to a machine where you can reach the CA (or use the CA web enrollment). Submit the request to your CA, specifying the custom template. For example:

certreq -submit -attrib "CertificateTemplate:Workgroup Cluster Node" Node1.req Node1.cer

(Or use the Certification Authority MMC to approve the pending request.) This yields Node1.cer. Finally, import the issued certificate on Node1:

certreq -accept Node1.cer

This will automatically place the certificate in the Local Machine Personal store with the private key.

Using the Certificates MMC (if the CA web portal is available): On each node, open the Certificates (Local Computer) MMC and under Personal > Certificates, initiate a New Certificate Request. Use the Active Directory Enrollment Policy if the node can reach the CA's web enrollment (even if not domain-joined, you can often authenticate with a domain user account for enrollment). Select the custom template and supply the DNS names. Complete the enrollment to obtain the certificate in the Personal store.
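To make the certreq route above concrete, the following sketch writes a request INF for Node1 and generates the CSR. The host, cluster, and file names are the examples used in this guide, and the EKU section is optional when the template already defines both EKUs:

# Write the request INF (single-quoted here-string so nothing is expanded) and create the CSR
$inf = @'
[Version]
Signature="$Windows NT$"

[NewRequest]
Subject = "CN=Node1.mylocal.net"
KeyLength = 2048
KeySpec = 1
; 0xA0 = Digital Signature + Key Encipherment
KeyUsage = 0xA0
MachineKeySet = TRUE
Exportable = FALSE
RequestType = PKCS10

[EnhancedKeyUsageExtension]
; Server Authentication and Client Authentication
OID = 1.3.6.1.5.5.7.3.1
OID = 1.3.6.1.5.5.7.3.2

[Extensions]
; Subject Alternative Name entries
2.5.29.17 = "{text}"
_continue_ = "dns=Node1.mylocal.net&"
_continue_ = "dns=Node1&"
_continue_ = "dns=Cluster1.mylocal.net&"
_continue_ = "dns=Cluster1"
'@
Set-Content -Path C:\Temp\Node1.inf -Value $inf
certreq -new C:\Temp\Node1.inf C:\Temp\Node1.req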
On a domain-joined helper system: Alternatively, use a domain-joined machine to request on behalf of the node (using the "Enroll on behalf of" feature with an Enrollment Agent certificate, or simply request and then export/import). This is more complex and usually not needed unless policy restricts direct enrollment.

After obtaining each certificate, verify on the node that it appears in Certificates (Local Computer) > Personal > Certificates. The Issued To should be the node's FQDN, and on the Details tab you should see the required EKUs and SAN entries. Also import the CA's root CA certificate into Trusted Root Certification Authorities on each node (the certreq -accept step may do this automatically if the chain is provided; if not, manually import the CA root). A quick check using the Certificates MMC or PowerShell can confirm trust. For example, to check via PowerShell:

Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.Subject -like "*Node1*"} | Select-Object Subject, EnhancedKeyUsageList, NotAfter

Make sure the EnhancedKeyUsageList shows both Client and Server Authentication and that NotAfter (expiry) is a reasonable date. Also ensure there are no errors about an untrusted issuer – the certificate status should show "This certificate is OK".

Option: Self-Signed Certificates for Testing

For a lab or proof-of-concept (where an enterprise CA is not available), you can use self-signed certificates. The key is to create a self-signed cert that includes the proper names and EKUs, and then trust that cert across all nodes. Use PowerShell New-SelfSignedCertificate with appropriate parameters. For example, on Node1:

$cert = New-SelfSignedCertificate -DnsName "Node1.mylocal.net", "Node1", "Cluster1.mylocal.net", "Cluster1" `
    -CertStoreLocation Cert:\LocalMachine\My `
    -KeyUsage DigitalSignature, KeyEncipherment `
    -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.1;1.3.6.1.5.5.7.3.2")

This creates a certificate for Node1 with the specified DNS names and both Server Authentication and Client Authentication EKUs. Repeat on Node2 (adjusting names accordingly). Alternatively, you can generate a temporary root CA certificate and then issue child certificates to each node (PowerShell's -TestRoot switch simplifies this by generating a root and end-entity cert together).

If you created individual self-signed certs per node, export each node's certificate (without the private key) and import it into the Trusted People or Trusted Root store of the other nodes. (Trusted People works for peer trust of specific certs; Trusted Root works if you created a root CA and issued from it.) For example, if Node1 and Node2 each have self-signed certs, import Node1's cert as a Trusted Root on Node2 and vice versa. This is required because self-signed certs are not automatically trusted.

Using CA-issued certs is strongly recommended for production. Self-signed certs should only be used in test environments, and if used, monitor and manually renew them before expiration (since there's no CA to do it). A lot of problems have occurred in production systems because people used self-signed certs and forgot that they expire.

Setting Up WinRM over HTTPS for Remote Management

With certificates in place, we can configure Windows Remote Management (WinRM) to use them. WinRM is the service behind PowerShell Remoting and many remote management tools. By default, WinRM uses HTTP (port 5985) and authenticates via Kerberos or NTLM. In a workgroup scenario, NTLM over HTTP would be used – we want to avoid that.
Instead, we will enable WinRM over HTTPS (port 5986) with our certificates, providing encryption and the ability to use certificate-based authentication for management sessions. Perform these steps on each cluster node:

Verify the certificate for WinRM: WinRM requires a certificate in the Local Computer Personal store that has a Server Authentication EKU and whose Subject or SAN matches the hostname. We have already enrolled such a certificate for each node. Double-check that the certificate's Issued To (CN or one of the SAN entries) exactly matches the hostname that clients will use (e.g. the FQDN). If you plan to manage via the short name, ensure the short name is in the SAN; if via FQDN, that's covered by the CN or SAN. The certificate must not be expired or revoked, and it should be issued by a CA that the clients trust (not self-signed unless the client trusts it).

Enable the HTTPS listener: Open an elevated PowerShell on the node and run:

winrm quickconfig -transport:https

This command creates a WinRM listener on TCP 5986 bound to the certificate. If it says no certificate was found, you may need to specify the certificate manually. You can do so with:

# Find the certificate thumbprint (assuming only one with Server Auth)
$thumb = (Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.EnhancedKeyUsageList -match "Server Authentication"} | Select-Object -First 1 -ExpandProperty Thumbprint)
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbprint $thumb -Force

Verify listeners with:

winrm enumerate winrm/config/listener

You should see an HTTPS listener with the hostname, listening on 5986, and the certificate's thumbprint. WinRM will automatically choose a certificate that meets the criteria (if multiple are present, it picks the one with a CN matching the machine name, so ideally use a unique cert to avoid ambiguity).

Disable unencrypted/HTTP access (optional but recommended): Since we want all remote management encrypted and to eliminate NTLM, you can disable the HTTP listener. Run:

Remove-WSManInstance -ResourceURI winrm/config/Listener -SelectorSet @{Address="*"; Transport="HTTP"}

This ensures WinRM is only listening on HTTPS. Also, you may configure the WinRM service to reject unencrypted traffic and disallow Basic authentication to prevent any fallback to insecure methods:

winrm set winrm/config/service '@{AllowUnencrypted="false"}'
winrm set winrm/config/service/auth '@{Basic="false"}'

(By default, AllowUnencrypted is false anyway when HTTPS is used, and Basic is false unless explicitly enabled.)

TrustedHosts (if needed): In a workgroup, WinRM won't automatically trust hostnames for authentication. However, when using certificate authentication, the usual TrustedHosts requirement may not apply in the same way as for NTLM/Negotiate. If you plan to authenticate with username/password over HTTPS (e.g. using Basic or default CredSSP), you will need to add the other nodes (or management station) to the TrustedHosts list on each node. This isn't needed for the cluster's internal communication (which uses certificates via clustering, not WinRM), but it might be needed for your remote PowerShell sessions depending on method. To allow all (not recommended for security), you could do:

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "*"

Or specify each host:

Set-Item WSMan:\localhost\Client\TrustedHosts -Value "Node1,Node2,Cluster1"

This setting allows the local WinRM client to talk to those remote names without Kerberos.
If you will use certificate-based authentication for WinRM (where the client presents a cert instead of a username/password), TrustedHosts is not required – certificate auth doesn't rely on host trust in the same way.

(Optional) Configure certificate authentication for admin access: One of the benefits of the HTTPS listener is that you can use certificate mapping to log in without a password. For advanced users, you can issue a client certificate for yourself (with the Client Authentication EKU), then configure each server to map that cert to a user (for example, map it to the local Administrator account). This involves creating a mapping entry in winrm/config/service/certmapping. For instance:

# Example: map a client cert by its subject to a local account
winrm create winrm/config/service/certmapping @{CertificateIssuer= "CN=YourCA"; Subject="CN=AdminUserCert"; Username="Administrator"; Password="<adminPassword>"; Enabled="true"}

Then, from your management machine, you can use that certificate to authenticate. While powerful, this goes beyond the core cluster setup, so we won't detail it further. Without this, you can still connect to the nodes using Enter-PSSession -ComputerName Node1 -UseSSL -Credential Node1\Administrator (which will prompt for the password but send it safely over the encrypted channel).

At this point, we have each node prepared with a trusted certificate and WinRM listening securely. Test the connectivity: from one node, try to start a PowerShell remote session to the other using HTTPS. For example, on Node1 run:

Test-WsMan Node2 -UseSSL
Enter-PSSession -ComputerName Node2 -UseSSL -Credential Node2\Administrator

You should connect without credential errors or warnings (you may get a certificate trust prompt if the client machine doesn't trust the server cert — make sure the CA root is in the client's trust store as well). Once you can manage nodes remotely over HTTPS, you're ready to create the cluster.

Installing the Hyper-V and Failover Clustering Roles

All cluster nodes need the Hyper-V role (for running VMs) and the Failover Clustering feature. We will use PowerShell to install these simultaneously on each server. On each node, open an elevated PowerShell (locally or via your new WinRM setup) and run:

Install-WindowsFeature -Name Failover-Clustering, Hyper-V -IncludeManagementTools -Restart

This installs the Hyper-V hypervisor, the clustering feature, and management tools (including the Failover Cluster Manager and Hyper-V Manager GUI, and PowerShell modules). The server will restart if Hyper-V was not previously enabled (we include -Restart for convenience). After the reboot, run the command on the next node (if doing it remotely, do one at a time). Alternatively, use the Server Manager GUI or Install-WindowsFeature without -Restart and reboot manually.

After all nodes are back up, verify the features:

Get-WindowsFeature -Name Hyper-V, Failover-Clustering

It should show both as Installed. Also confirm the Failover Clustering PowerShell module is available (Get-Module -ListAvailable FailoverClusters) and the Cluster service is installed (though not yet configured).

Cluster service account: Windows Server 2016+ automatically creates a local account called CLIUSR used by the cluster service for internal communication. Ensure this account was created (Computer Management > Users). We won't interact with it directly, but be aware it exists. Do not delete or disable CLIUSR – the cluster uses it alongside certificates for bootstrapping.
(All cluster node communications will now use either Kerberos or certificate auth; NTLM is not needed in WS2019+ clusters.) With the certificate groundwork done, you can move on to building the cluster.
Creating the Failover Cluster (Using DNS as the Access Point)
Here we will create the cluster and add nodes to it using PowerShell. The cluster will use a DNS name for its administrative access point (since there is no Active Directory for a traditional cluster computer object). The basic steps are:
Validate the configuration (optional but recommended).
Create the cluster (initially with one node to avoid cross-node authentication issues).
Join the additional node(s) to the cluster.
Configure cluster networking, quorum, and storage (CSV).
Validate the Configuration (Cluster Validation)
It's good practice to run the cluster validation tests to catch any misconfiguration or hardware issues before creating the cluster. Microsoft supports a cluster only if it passes validation or if any errors are acknowledged as non-critical. Run the following from one of the nodes (it will reach out to all nodes):
Test-Cluster -Node Node1.mylocal.net, Node2.mylocal.net
Replace with your actual node names (include all 2 or 4 nodes). The cmdlet runs a series of tests (network, storage, system settings). Ensure that all tests either pass or only produce warnings that you understand. For example, warnings about "no storage is shared among all nodes" are expected if you haven't yet configured iSCSI or if you are using SMB (you can skip the storage tests with -Ignore Storage if needed). If critical tests fail, resolve those issues (networking, disk visibility, etc.) before proceeding.
Create the Cluster (with the First Node)
On one node (say Node1), use the New-Cluster cmdlet to create the cluster with that node as the first member. By starting with a single node, we avoid remote authentication at cluster creation time (no need for Node1 to authenticate to Node2 yet):
New-Cluster -Name "Cluster1" -Node Node1 -StaticAddress "10.0.0.100" -AdministrativeAccessPoint DNS
Here:
-Name is the intended cluster name (this is the name clients use to connect to the cluster, e.g. for management or as a CSV namespace prefix). We use "Cluster1" as an example.
-Node Node1 specifies which server to include initially (Node1's name).
-StaticAddress sets the cluster's IP address (choose one in the same subnet that is not in use; this IP is brought online together with the cluster Name resource). In this example 10.0.0.100 is the cluster IP.
-AdministrativeAccessPoint DNS indicates we're creating a DNS-only cluster (no AD computer object). This is the default in workgroup clusters, but we specify it explicitly for clarity.
The command creates the cluster service, registers the cluster name in DNS (if DNS is configured and dynamic updates are allowed), and brings the core cluster resources online. It also creates a cluster-specific self-signed certificate for internal use if needed, but since we have our CA-issued certs in place, the cluster may use those for node authentication.
Note: If New-Cluster fails to register the cluster name in DNS (common in workgroup setups), you might need to create a manual DNS A record for "Cluster1" pointing to 10.0.0.100 on whatever DNS server the nodes use. Alternatively, add "Cluster1" to each node's hosts file (as we did in the prerequisites). This ensures that the cluster name is resolvable.
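If you do need to create the record yourself, a sketch of both options is below (the IP, cluster name, and zone are the example values used above; the DNS cmdlet must be run on the DNS server and requires the DnsServer module):
# Option 1: hosts file entry on each node (and on your management station)
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value "10.0.0.100`tCluster1"
# Option 2: static A record on the DNS server the nodes use
Add-DnsServerResourceRecordA -ZoneName "mylocal.net" -Name "Cluster1" -IPv4Address "10.0.0.100"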
The cluster will function without AD, but it still relies on DNS for name resolution of the cluster name and node names. At this point, the cluster exists with one node (Node1). You can verify this by running cluster cmdlets on Node1, for example Get-Cluster (should list "Cluster1") and Get-ClusterNode (should list Node1 as Up). In Failover Cluster Manager, you could also connect to "Cluster1" (or to Node1) and see the cluster.
Add Additional Nodes to the Cluster
Now we will add the remaining node(s) to the cluster. On each additional node, run the following (replace "Node2" with the name of that node and adjust the cluster name accordingly):
Add-ClusterNode -Cluster Cluster1 -Name Node2
Run this on Node2 itself (locally). It instructs Node2 to join the cluster named Cluster1. Because Node2 can authenticate the cluster (Node1) via the cluster's certificate and vice versa, the join should succeed without prompting for credentials. Under the hood, the cluster service on Node2 uses the certificate (and the CLIUSR account) to establish trust with Node1's cluster service. Repeat the Add-ClusterNode command on each additional node (Node3, Node4, etc.), one at a time. After each join, verify by running Get-ClusterNode on any cluster member – the new node should show up with status "Up".
If you prefer a single command from Node1 to add the others, you could use:
# Run on Node1:
Add-ClusterNode -Name Node2, Node3 -Cluster Cluster1
This attempts to add Node2 and Node3 from Node1. It may prompt for credentials or require TrustedHosts if no common authentication is present. Running Add-ClusterNode locally on each node avoids those issues by performing the action locally. Either way, at the end all nodes should be members of Cluster1.
Configure Quorum (Witness)
Quorum configuration is critical, especially with an even number of nodes. The cluster will default to Node Majority (no witness) or may try to assign a witness if it finds eligible storage. Use a witness to avoid a split-brain scenario. If you have a small shared disk (LUN) visible to both nodes, that can be a Disk Witness. Alternatively, use a Cloud Witness (Azure). To configure a disk witness, first make sure the disk is seen as Available Storage in the cluster, then run:
Get-ClusterAvailableDisk | Add-ClusterDisk
Set-ClusterQuorum -Cluster Cluster1 -NodeAndDiskMajority "<DiskResourceName>"
(Replace <DiskResourceName> with the name of the disk resource as shown by Get-ClusterResource; the snippet after this section shows how to list the disk resources.) Using Failover Cluster Manager, you can instead run the Configure Cluster Quorum wizard and select "Add a disk witness". If no shared disk is available, a Cloud Witness is an easy option (it requires an Azure Storage account and key). For a cloud witness:
Set-ClusterQuorum -Cluster Cluster1 -CloudWitness -AccountName "<StorageAccount>" -AccessKey "<Key>"
Do not use a File Share witness – as noted earlier, file share witnesses are not supported in workgroup clusters because the cluster cannot authenticate to a remote share without AD.
A 4-node cluster can sustain two node failures if properly configured. It's recommended to configure a witness for even-numbered clusters as well, to avoid a tie (a 2–2 split) if half the nodes fail or become partitioned. A disk or cloud witness is recommended (same process as above). With 4 nodes, you would typically use Node Majority + Witness. The cluster quorum wizard can automatically choose the best quorum configuration (typically it will pick Node Majority + Witness if you run the wizard and a witness is available).
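To find the disk resource name referenced in the disk-witness command above, you can list the clustered Physical Disk resources first. A quick sketch:
Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq "Physical Disk" } | Format-Table Name, State, OwnerGroup, OwnerNode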
You can verify the quorum configuration with Get-ClusterQuorum. Make sure it lists the witness you configured (if any) and that the cluster core resources show the witness online.
Add Cluster Shared Volumes (CSV) or Configure VM Storage
Next, prepare storage for the Hyper-V VMs. If you are using shared disks (block storage such as iSCSI/SAN), then after adding the disks to the cluster (they should appear under Storage > Disks in Failover Cluster Manager) you can enable Cluster Shared Volumes (CSV). CSV allows all nodes to access the NTFS/ReFS volume concurrently, simplifying VM placement and live migration. If a disk isn't formatted yet, it's easiest to format it with NTFS or ReFS while it is still in Available Storage. To add the available cluster disks as CSV volumes:
Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq "Physical Disk" -and $_.OwnerGroup.Name -eq "Available Storage" } | Add-ClusterSharedVolume
This takes each clustered disk in Available Storage and mounts it as a CSV under C:\ClusterStorage\ on all nodes. Alternatively, right-click the disk in Failover Cluster Manager and choose Add to Cluster Shared Volumes. Once done, the volume is accessible as C:\ClusterStorage\Volume1\ (and so on) on all nodes. Now this shared volume can store all VM files, and any node can run any VM using that storage.
If you are using an SMB 3 share (NAS or file server), you won't add it to cluster storage; instead, each Hyper-V host connects to the SMB share directly. Ensure each node has access credentials for the share. In a workgroup, that typically means the NAS is also in a workgroup and you've created a local user on the NAS that each node uses (via stored credentials) – this is outside the cluster's control. Each node should be able to use New-SmbMapping or simply access the UNC path. Test access from each node (e.g. dir \\NAS\HyperVShare). In Hyper-V settings, you might set the Default Virtual Hard Disk Path to the UNC path, or just specify the UNC path when creating VMs. Note: Hyper-V supports storing VMs on SMB 3.0 shares with Kerberos or certificate-based authentication, but in a workgroup you'll likely rely on a username/password for the share (which is a form of local account usage at the NAS). This doesn't affect cluster node-to-node auth, but it's a consideration for securing the NAS.
Verify Cluster Status
At this stage, run some quick checks to ensure the cluster is healthy:
Get-Cluster – should show the cluster name, IP, and core resources online.
Get-ClusterNode – all nodes should be Up.
Get-ClusterResource – should list the resources (Cluster Name, IP Address, any witness, any disks) and their state (Online). The Cluster Name resource will be of type "Distributed Network Name" since this is a DNS-only cluster.
Use Failover Cluster Manager (you can launch it on one of the nodes or from RSAT on a client) to connect to "Cluster1". Ensure you can see all nodes and storage. When prompted to connect, use <clustername> or <clusterIP> – with our certificate setup, it is usually best to connect by cluster name (make sure DNS/hosts resolves it to the cluster IP). If a certificate trust warning appears, it might be because the management station doesn't trust the cluster node's cert or you connected with a name not in the SAN. As a workaround, connect directly to a node in cluster manager (e.g. Node1), which then enumerates the cluster.
Now you have a functioning cluster ready for Hyper-V workloads, with secure authentication between nodes. Next, we configure Hyper-V specific settings like Live Migration.
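Before moving on, a consolidated health check you might run from any node (a sketch; the CSV line only applies if you added CSVs):
Get-ClusterNode | Format-Table Name, State
Get-ClusterResource | Format-Table Name, ResourceType, State, OwnerNode
Get-ClusterSharedVolume | Format-Table Name, State, OwnerNode
Get-ClusterNetwork | Format-Table Name, Role, Address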
Configuring Hyper-V for Live Migration in the Workgroup Cluster
One major benefit introduced in Windows Server 2025 is support for Live Migration in workgroup clusters (previously, live migration required Kerberos and thus a domain). In WS2025, cluster nodes use certificates to mutually authenticate live migration traffic. This allows VMs to move between hosts with no downtime even in the absence of AD. We will enable and tune live migration for our cluster.
By default, the Hyper-V role might have live migration disabled (for non-clustered hosts). In a cluster it may be auto-enabled when the Failover Clustering and Hyper-V roles are both present, but to make sure it is enabled, run:
Enable-VMMigration
This enables the host to send and receive live migrations. In PowerShell, no output means success. (In the Hyper-V Manager UI, this corresponds to ticking "Enable incoming and outgoing live migrations" in the Live Migrations settings.) In a workgroup, the only choice in the UI would be CredSSP (since Kerberos requires a domain). CredSSP means you must initiate the migration from a session where you are logged onto the source host so your credentials can be delegated. We cannot use Kerberos here, but the cluster's internal PKU2U certificate mechanism handles node-to-node auth for us when the move is orchestrated via Failover Cluster Manager. No explicit setting is needed for cluster-internal certificate usage – Windows will use it automatically for the actual live migration operation. If you were to use PowerShell, the default MigrationAuthenticationType is CredSSP for a workgroup. You can confirm (or set it explicitly, though this is not strictly required):
Set-VMHost -VirtualMachineMigrationAuthenticationType CredSSP
(This can be done on each node; it just ensures the Hyper-V service knows to use CredSSP, which aligns with our need to initiate migrations from an authenticated context.)
If your cluster nodes were domain-joined, Windows Server 2025 enables Credential Guard, which blocks CredSSP by default. In our case (workgroup), Credential Guard is not enabled by default, so CredSSP will function. Just be aware that if you ever join these servers to a domain (or they were once domain-joined before being demoted to a workgroup), you'd need to configure Kerberos constrained delegation or disable Credential Guard to use live migration.
For security and performance, do not use the management network for VM migration if you have other NICs. We will designate the dedicated network (e.g. "LMNet" or a specific subnet) for migrations. You can configure this via PowerShell or Failover Cluster Manager. Using PowerShell, run the following on each node:
# Example: allow LM only on the 10.0.1.0/24 network (where 10.0.1.5 is this node's IP on that network)
Add-VMMigrationNetwork 10.0.1.5
Set-VMHost -UseAnyNetworkForMigration $false
The Add-VMMigrationNetwork cmdlet adds the network associated with the given IP to the allowed list for migrations. The second cmdlet ensures only the designated networks are used. Alternatively, if you have the network or interface name, you can use the Hyper-V Manager UI: under each host's Hyper-V Settings > Live Migrations > Advanced Features, select Use these IP addresses for Live Migration and add the IP of the LM network interface. In a cluster, these settings are per host, so it's a good idea to configure them identically on all nodes (see the sketch below for pushing the settings to every node at once).
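Because these are per-host settings, you can also push them to every node in one pass over the HTTPS remoting configured earlier. A sketch (node names and per-node IPs are examples; add -Credential as needed):
$lmIPs = @{ "Node1" = "10.0.1.5"; "Node2" = "10.0.1.6" }   # each node's IP on the live migration subnet
foreach ($node in $lmIPs.Keys) {
    Invoke-Command -ComputerName $node -UseSSL -ScriptBlock {
        param($ip)
        Add-VMMigrationNetwork $ip                     # allow live migration only on this network
        Set-VMHost -UseAnyNetworkForMigration $false   # and on no other network
    } -ArgumentList $lmIPs[$node]
}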
Verify the network selection by running Get-VMMigrationNetwork on each node; it should list the subnet or network you allowed, and UseAnyNetworkForMigration (on the Get-VMHost output) should be False.
Windows can send VM memory over plain TCP, compress it, or use SMB Direct (if RDMA is available) for live migration. By default in newer Windows versions, compression is used, as it offers a good balance of speed without special hardware. If you have a very fast dedicated network (10 Gbps+ or RDMA), you might choose SMB to leverage SMB Multichannel/RDMA for the highest throughput. To set this:
# Options: TCPIP, Compression, SMB
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression
(Do this on each node; "Compression" is usually the default on 2022/2025 Hyper-V.) If you select SMB, ensure your cluster network is configured to allow SMB traffic and consider enabling SMB encryption if security is a concern (SMB encryption will encrypt the live migration data stream). Note that enabling SMB encryption or cluster-level encryption can disable RDMA on that traffic, so only enable it if needed, or rely on network isolation as the primary protection.
Depending on your hardware, you may allow multiple VMs to migrate at once. The default is usually 2 simultaneous live migrations. You can increase this if you have the capacity:
Set-VMHost -MaximumVirtualMachineMigrations 4 -MaximumStorageMigrations 2
Adjust the numbers as appropriate (and keep in mind that the cluster-level property (Get-Cluster).MaximumParallelMigrations might override the host setting in a cluster). This setting is also available in the Hyper-V Settings UI under Live Migrations.
With these configured, live migration is enabled. Test a live migration: create a test VM (or pick one of your existing VMs) and attempt to move it from one node to another using Failover Cluster Manager or PowerShell. In Failover Cluster Manager, under Roles, right-click a virtual machine, choose Live Migrate > Select Node… and pick another node. The VM should migrate with zero downtime. If it fails, check for error messages regarding authentication. Ensure you initiated the move from a node where you're an admin (or via cluster manager connected to the cluster with appropriate credentials). The cluster handles the mutual auth using the certificates (this is transparent – behind the scenes, the nodes use the self-created PKU2U cert or our installed certs to establish a secure connection for the VM memory transfer). Alternatively, use PowerShell:
Move-ClusterVirtualMachineRole -Name "<VM resource name>" -Node <TargetNode>
This cmdlet triggers a cluster-coordinated live migration (the cluster's Move operation will use the appropriate auth). If the migration succeeds, congratulations – you have a fully functional Hyper-V cluster without AD!
Security Best Practices Recap and Additional Hardening
Additional best practices for securing a workgroup Hyper-V cluster include:
Certificate Security: The private keys of your node certificates are powerful – protect them. They are stored in the machine store (and are likely marked non-exportable). Only admins can access them; ensure no unauthorized users are in the local Administrators group. Plan a process for certificate renewal before expiration. If using an enterprise CA, you might issue certificates with a template that allows auto-renewal via scripts, or at least track their expiry so you can re-issue and install new certs on each node in time. The Failover Cluster service auto-generates its own certificates (for CLIUSR/PKU2U) and auto-renews them, but since we provided our own, we must manage those ourselves.
Stagger renewals to avoid all nodes swapping certificates at once (the cluster should still trust both old and new certs if the CA is the same). It may be wise to overlap: install the new certs on all nodes and only then remove the old ones, so that at no point is a node presenting a cert the others don't accept (important if you change CA or template).
Trusted Root and Revocation: All nodes trust the CA – maintain the security of that CA. Do not include unnecessary trust (e.g., avoid having nodes trust public CAs they don't need). If possible, use an internal CA that is only used for these infrastructure certs. Keep CRLs (Certificate Revocation Lists) accessible if your cluster nodes need to check revocation of each other's certs (though cluster auth might not strictly require online revocation checking if the certificates are directly trusted). It's another reason to have a reasonably long-lived internal CA or offline root.
Disable NTLM: Since clustering no longer needs NTLM as of Windows Server 2019+, you can consider disabling NTLM fallback on these servers entirely for added security (via the Group Policy setting "Network Security: Restrict NTLM: Deny on this server", etc.). However, be cautious: some processes (including cluster formation in older versions, or other services) might break. In our configuration, cluster communications should use Kerberos or certificates. If these servers have no need for NTLM (no legacy apps), disabling it eliminates a whole class of attacks. Monitor the event logs (Security log events for NTLM usage) if you attempt this. Discussions in the Microsoft Tech Community indicate that by WS2022 a cluster should function with NTLM disabled, though one user observed issues when the CLIUSR password rotated while NTLM was blocked. WS2025 should further reduce any NTLM dependency.
PKU2U policy: The cluster uses the PKU2U security provider for peer authentication with certificates. There is a local security policy, "Network security: Allow PKU2U authentication requests to this computer to use online identities" – this must be enabled (which it is by default) for clustering to function properly. Some security guides recommend disabling PKU2U; do not disable it on cluster nodes (or, if your organization's baseline GPO disables it, create an exception for these servers). Disabling PKU2U will break the certificate-based node authentication and cause cluster communication failures.
Firewall: We opened WinRM over 5986. Ensure Windows Firewall has the Windows Remote Management (HTTPS-In) rule enabled. The Failover Clustering feature should have added rules for cluster heartbeats (UDP 3343, etc.) and SMB (445) if needed. Double-check that on each node the Failover Clusters group of firewall rules is enabled for the relevant profiles (if your network is Public, you might need to enable the rules for the Public profile manually, or set the network as Private). Also, for live migration, if using the SMB transport, enable the SMB-In rules. If you enabled SMB encryption, it uses the same port 445 but encrypts the payloads. (A quick firewall and event-log check appears after this list.)
Secure Live Migration Network: Ideally, the network carrying live migration is isolated (not routed outside of the cluster environment). If you want belt-and-suspenders security, you could implement IPsec encryption on live migration traffic, for example requiring IPsec (with certificates) between the cluster nodes on the LM subnet. However, this can be complex and might conflict with SMB Direct/RDMA.
A simpler approach: since certificate mutual authentication already prevents unauthorized nodes from communicating with the cluster, focus on isolating the live migration traffic, and optionally turn on SMB encryption for live migration (when using the SMB transport) so that the VM memory stream stays encrypted even if someone taps the network. At minimum, treat the LM network as sensitive, as it carries VM memory contents in clear text if not otherwise encrypted.
Secure WinRM/management access: We configured WinRM for HTTPS. Make sure to limit who can log in via WinRM. By default, members of the Administrators group have access. Do not add unnecessary users to Administrators. You can also use Local Group Policy to restrict the WinRM service to only allow certain users or certificate mappings. Since this is a workgroup, there's no central AD group; you might create a local group for "Remote Management Users" and configure WSMan to allow members of that group (and only put specific admin accounts in it). Also consider PowerShell Just Enough Administration (JEA) if you want to delegate specific tasks without full admin rights, though that's advanced.
Hyper-V host security: Apply standard Hyper-V best practices: enable Secure Boot for Gen 2 VMs, keep the host OS minimal (consider Windows Server Core for a smaller attack surface, if feasible), and ensure only trusted administrators can create or manage VMs. Since this cluster is not in a domain, you won't have AD group-based access control; do ensure each node has a unique, strong local admin password (note that Windows LAPS needs AD or Microsoft Entra to store the passwords, so in a pure workgroup you'll need another mechanism or a manual rotation process).
Monitor cluster events: Monitor the System event log for any cluster-related errors (clustering logs events if authentication fails or if there are connectivity issues). Also monitor the FailoverClustering event log channel. Any errors about "unable to authenticate" or "No logon servers", etc., would indicate certificate or connectivity problems (the sketch after this list shows one way to check).
Test failover and failback: After configuration, test that VMs can fail over properly. Shut down one node and ensure its VMs move to another node automatically. When the node comes back, you can live migrate them back. This gives confidence that the cluster's certificate-based auth holds up under real failover conditions.
Consider Management Tools: Tools like Windows Admin Center (WAC) can manage Hyper-V clusters. WAC can be configured to use the certificate for connecting to the nodes (it will prompt to trust the certificate if self-signed). Using WAC or Failover Cluster Manager with our setup might require launching the console from a machine that trusts the cluster's cert and using the cluster DNS name. Always ensure management traffic is also encrypted (WAC uses HTTPS, and our WinRM is HTTPS, so it is).
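To help with the firewall and monitoring items above, here is a quick per-node health-check sketch (the rule group and log names are as they typically appear on current builds; adjust if yours differ):
# Confirm the clustering and WinRM-over-HTTPS firewall rules are enabled
Get-NetFirewallRule -DisplayGroup "Failover Clusters" | Format-Table DisplayName, Enabled, Profile
Get-NetFirewallRule -DisplayName "Windows Remote Management (HTTPS-In)" -ErrorAction SilentlyContinue | Format-Table DisplayName, Enabled, Profile
# Look for recent clustering warnings or errors (authentication and connectivity problems surface here)
Get-WinEvent -LogName "Microsoft-Windows-FailoverClustering/Operational" -MaxEvents 100 | Where-Object { $_.LevelDisplayName -in "Error", "Warning" } | Format-Table TimeCreated, Id, Message -AutoSize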
GPU Partitioning in Windows Server 2025 Hyper-V
GPU Partitioning (GPU-P) is a feature in Windows Server 2025 Hyper-V that allows multiple virtual machines to share a single physical GPU by dividing it into isolated fractions. Each VM is allocated a dedicated portion of the GPU's resources (memory, compute, encoders, etc.) instead of using the entire GPU. This is achieved via Single-Root I/O Virtualization (SR-IOV), which provides hardware-enforced isolation between GPU partitions, ensuring each VM can access only its assigned GPU fraction with predictable performance and security. In contrast, GPU Passthrough (also known as Discrete Device Assignment, DDA) assigns a whole physical GPU exclusively to one VM. With DDA, the VM gets full control of the GPU, but no other VM can use that GPU at the same time. GPU-P's ability to time-slice or partition the GPU allows higher utilization and VM density for graphics or compute workloads, whereas DDA offers maximum performance for a single VM at the cost of flexibility. GPU-P is ideal when you want to share a GPU among multiple VMs, such as for VDI desktops or AI inference tasks that only need a portion of a GPU's power. DDA (passthrough) is preferred when a workload needs the full GPU (e.g. large model training) or when the GPU doesn't support partitioning. Another major difference is mobility: GPU-P supports live VM mobility and failover clustering, meaning a VM using a GPU partition can move or restart on another host with minimal downtime. DDA-backed VMs cannot live-migrate. If you need to move a DDA VM, it must be powered off and then started on a target host (in clustering, a DDA VM will be restarted on a node with an available GPU upon failover, since live migration isn't supported). Additionally, you cannot mix modes on the same device: a physical GPU can be either partitioned for GPU-P or passed through via DDA, but not both simultaneously.
Supported GPU Hardware and Driver Requirements
GPU Partitioning in Windows Server 2025 is supported on select GPU hardware that provides SR-IOV or similar virtualization capabilities, along with appropriate drivers. Only specific GPUs support GPU-P, and you won't be able to configure it on a consumer gaming GPU like an RTX 5090. In addition to the GPU itself, certain platform features are required:
Modern CPU with IOMMU: The host processors must support Intel VT-d or AMD-Vi with DMA remapping (IOMMU). This is crucial for mapping device memory securely between the host and the VMs. Older processors lacking these enhancements may not fully support live migration of GPU partitions.
BIOS Settings: Ensure that Intel VT-d/AMD-Vi and SR-IOV are enabled in each host's UEFI/BIOS. These options may be under virtualization or PCIe settings. Without SR-IOV enabled at the firmware level, the OS will not recognize the GPU as partitionable (in Windows Admin Center it might show the status "Paravirtualization", indicating the driver is capable but the platform isn't).
Host GPU Drivers: Use vendor-provided drivers that support GPU virtualization. For NVIDIA, this means installing the NVIDIA virtual GPU (vGPU) driver on the Windows Server 2025 host (the driver package that supports GPU-P). Check the GPU vendor's documentation for installation specifics. After installing, you can verify the GPU's status via PowerShell or WAC.
Guest VM Drivers: The guest VMs also need appropriate GPU drivers installed (within the VM's OS) to make use of the virtual GPU.
For instance, if using Windows 11 or Windows Server 2025 as a guest, install the GPU driver inside the VM (often the same data-center driver or a guest-compatible subset from the vGPU package) so that the GPU is usable for DirectX/OpenGL or CUDA in that VM. Linux guests (Ubuntu 18.04/20.04/22.04 are supported) likewise need the Linux driver installed. Guest OS support for GPU-P in WS2025 covers Windows 10/11, Windows Server 2019+, and certain Ubuntu LTS versions.
After hardware setup and driver installation, it's important to verify that the host recognizes the GPU as "partitionable." You can use Windows Admin Center or PowerShell for this. In WAC's GPU tab, check the "Assigned status" of the GPU – it should show "Partitioned" if everything is configured correctly (if it shows "Ready for DDA assignment", the partitioning driver isn't active, and if it shows "Not assignable", the GPU/driver doesn't support either method). In PowerShell, you can run:
Get-VMHostPartitionableGpu | FL Name, ValidPartitionCounts, PartitionCount
This lists each GPU device's identifier and the partition counts it supports. For example, an NVIDIA A40 might return ValidPartitionCounts : {16, 8, 4, 2 …}, indicating the GPU can be split into 2, 4, 8, or 16 partitions, and also show the current PartitionCount setting (by default it may equal the maximum or the currently configured value). If no GPUs are listed, or the list is empty, the GPU is not recognized as partitionable (check the drivers/BIOS). If the GPU is listed but ValidPartitionCounts is blank or shows only "1", it may not support SR-IOV and can only be used via DDA.
Enabling and Configuring GPU Partitioning
Once the hardware and drivers are ready, enabling GPU Partitioning involves configuring how the GPU will be divided and ensuring all Hyper-V hosts (especially in a cluster) have a consistent setup. Each physical GPU must be configured with a partition count (how many partitions to create on that GPU). You cannot define an arbitrary number – it must be one of the supported counts reported by the hardware/driver. The default might be the maximum supported (e.g., 16). To set a specific partition count, use PowerShell on each host:
Decide on a partition count that suits your workloads. Fewer partitions means each VM gets more GPU resources (more VRAM and compute per partition), whereas more partitions means you can assign the GPU to more VMs concurrently (each getting a smaller slice). For AI/ML, you might choose a moderate number – e.g. split a 24 GB GPU into 4 partitions of ~6 GB each for inference tasks.
Run the Set-VMHostPartitionableGpu cmdlet. Provide the GPU's device ID (from the Name field of the earlier Get-VMHostPartitionableGpu output) and the desired -PartitionCount. For example:
Set-VMHostPartitionableGpu -Name "<GPU-device-ID>" -PartitionCount 4
This configures the GPU to be divided into 4 partitions. Repeat this for each GPU device if the host has multiple GPUs (specifying -Name accordingly for each).
Verify the setting by running:
Get-VMHostPartitionableGpu | FL Name,PartitionCount
It should now show the PartitionCount set to your chosen value (e.g., PartitionCount : 4 for each listed GPU).
If you are in a clustered environment, apply the same partition count on every host in the cluster for all identical GPUs. Consistency is critical: a VM using a "quarter GPU" partition can only fail over to another host that also has its GPU split into quarters (a sketch for applying the same count to every node at once follows below).
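If the hosts are reachable over PowerShell remoting, one way to keep the partition count consistent is to push it to every node in a single pass. A sketch (node names and the count of 4 are examples; add -Credential or -UseSSL as appropriate for your environment):
$nodes = "Node1", "Node2"
Invoke-Command -ComputerName $nodes -ScriptBlock {
    # Apply the same partition count to every partitionable GPU on this host
    Get-VMHostPartitionableGpu | ForEach-Object { Set-VMHostPartitionableGpu -Name $_.Name -PartitionCount 4 }
    Get-VMHostPartitionableGpu | Select-Object Name, PartitionCount
}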
Windows Admin Center will actually enforce this by warning you if you try to set mismatched counts on different nodes. You can also configure the partition count via the WAC GUI. In WAC's GPU partitions tool, select the GPU (or a set of homogeneous GPUs across hosts) and choose Configure partition count. WAC presents a dropdown of valid partition counts (as reported by the GPU). Selecting a number shows a tooltip with how much VRAM each partition would have (e.g., selecting 8 partitions on a 16 GB card might show ~2 GB per partition). WAC helps ensure you apply the change to all similar GPUs in the cluster together. After applying, it updates the partition count on each host automatically.
After this step, the physical GPUs on the host (or cluster) are partitioned into the configured number of virtual GPUs and are ready to be assigned to VMs. From the host's perspective, each partition appears as a shareable resource. (Note: you cannot assign more partitions to VMs than the number configured.)
Assigning GPU Partitions to Virtual Machines
With the GPU partitioned at the host level, the next step is to attach a GPU partition to a VM. This is analogous to plugging a virtual GPU device into the VM. Each VM can have at most one GPU partition device attached, so choose the VM that needs GPU acceleration and assign one partition to it. There are two main ways to do this: using PowerShell commands or using the Windows Admin Center UI. Below are the instructions for each method.
Using PowerShell: To add a GPU partition to the VM (with the VM powered off), use the Add-VMGpuPartitionAdapter cmdlet. For example:
Add-VMGpuPartitionAdapter -VMName "<VMName>"
This allocates one of the available GPU partitions on the host to the specified VM. (There is no parameter to specify which partition or GPU – Hyper-V auto-selects an available partition from a compatible GPU. If no partition is free, or the host GPUs aren't partitioned, the cmdlet returns an error.) You can check that the VM has a GPU partition attached by running:
Get-VMGpuPartitionAdapter -VMName "<VMName>" | FL InstancePath,PartitionId
This shows details like the GPU device instance path and a PartitionId for the VM's GPU device. If you see an entry with an instance path (matching the GPU's PCI ID) and a PartitionId, the partition is successfully attached. Power on the VM. On boot, the VM's OS will detect a new display adapter. In Windows guests, you should see a GPU in Device Manager (it may appear as a GPU with a specific model, or as a virtual GPU device name). Install the appropriate GPU driver inside the VM if it is not already installed, so that the VM can fully utilize the GPU (for example, install the NVIDIA drivers in the guest to get CUDA, DirectX, etc. working). Once the driver is active in the guest, the VM will be able to leverage the GPU partition for AI/ML computations or graphics rendering.
Using Windows Admin Center: Open Windows Admin Center and navigate to your Hyper-V cluster or host, then go to the GPUs extension. Ensure you have added the GPUs extension v2.8.0 or later to WAC. In the GPU Partitions tab, you'll see a list of the physical GPUs and any existing partitions. Click "+ Assign partition". This opens an assignment wizard. Select the VM: first choose the host server where the target VM currently resides (WAC will list all servers in the cluster), then select the VM from that host to assign a partition to.
(If a VM is greyed out in the list, it likely already has a GPU partition assigned or is incompatible.) Select Partition Size (VRAM): choose the partition size from the dropdown. WAC lists options that correspond to the partition counts you configured. For example, if the GPU is split into 4, you might see an option like "25% of GPU (≈4 GB)" or similar. Ensure this matches the partition count you set. You cannot assign more memory than a partition contains. Offline Action (HA option): if the VM is clustered and you want it to be highly available, check the option "Configure offline action to force shutdown" (if presented in the UI). Proceed to assign. WAC will automatically shut down the VM (if it was running), attach a GPU partition to it, and then power the VM back on. After a brief moment, the VM should come online with the GPU partition attached. In the WAC GPU partitions list, you will now see an entry showing the VM name under the GPU partition it's using.
At this point, the VM is running with a virtual GPU. You can repeat the process for other VMs, up to the number of partitions available. Each physical GPU can only support a fixed number of active partitions equal to the PartitionCount set. If you attempt to assign more VMs than partitions, the additional VMs will not get a GPU (or the Add command will fail). Also note that a given VM can only occupy one partition on one GPU – you cannot span a single VM across multiple GPU partitions or across multiple GPUs with GPU-P.
GPU Partitioning in Clustered Environments (Failover Clustering)
One of the major benefits introduced with Windows Server 2025 is that GPU partitions can be used in Failover Clustering scenarios for high availability. This means you can have a Hyper-V cluster where VMs with virtual GPUs are clustered roles, capable of moving between hosts either through live migration (planned) or failover (unplanned). To utilize GPU-P in a cluster, you must pay special attention to configuration consistency and understand the current limitations:
Use Windows Server 2025 Datacenter: As mentioned, clustering features (like failover) for GPU partitions are supported only on the Datacenter edition.
Homogeneous GPU Configuration: All hosts in the cluster should have identical GPU hardware and partitioning setup. Failover and live migration with GPU-P do not support mixing GPU models or partition sizes in a GPU-P cluster. Each host should have the same GPU model, and the partition count configured (e.g., 4, 8, etc.) must be the same on every host. This uniformity ensures that a VM expecting a certain size of partition will find an equivalent on any other node.
Windows Server 2025 introduces support for live migrating VMs that have a GPU partition attached. However, there are important caveats:
Hardware support: Live migration with GPU-P requires that the hosts' CPUs and chipsets fully support isolating DMA and device state. In practice, as noted, you need Intel VT-d or AMD-Vi enabled, with CPUs that ideally support "DMA bit tracking." If this is in place, Hyper-V will attempt to live migrate the VM normally. During such a migration, the GPU's state is not seamlessly copied like regular memory; instead, Windows falls back to a slower migration process to preserve integrity. Specifically, when migrating a VM using GPU-P, Hyper-V automatically uses TCP/IP with compression (even if you have faster methods like RDMA configured). This is because device state transfer is more complex.
The migration will still succeed, but you may notice higher CPU usage on the host and a longer migration time than usual.
Cross-node compatibility: Ensure that the GPU driver versions on all hosts are the same, and that each host has an available partition for the VM. If a VM is running and you trigger a live migration, Hyper-V will find a target where the VM can get an identical partition. If none are free, the migration will not proceed (or the VM may have to be restarted elsewhere as a failover).
Failover (Unplanned Moves): If a host crashes or goes down, a clustered VM with a GPU partition will be automatically restarted on another node, much like any HA VM. The key difference is that the VM cannot save its state, so it will be a cold start on the new node, attaching to a new GPU partition there. When the VM comes up on the new node, it requests a GPU partition, and Hyper-V allocates one if available. If the target node has no free partition (say all were assigned to other VMs), the VM might start but not get a GPU (and Windows would likely log an error that the virtual GPU could not start). Administrators should monitor for this and consider anti-affinity rules to avoid packing too many GPU VMs on one host if full automatic failover is required (a sketch follows below).
To learn more about GPU-P on Windows Server 2025, consult the documentation on Learn: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/gpu-partitioning
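As mentioned in the failover discussion, anti-affinity can keep GPU VMs spread across hosts so that a single node failure doesn't strand more VMs than there are free partitions. One long-standing way to express this is with AntiAffinityClassNames on the VM cluster groups – a sketch (the VM role names and the class name are examples):
# Tag the GPU VMs with the same anti-affinity class so the cluster tries to place them on different nodes
$class = New-Object System.Collections.Specialized.StringCollection
$class.Add("GPU-P VMs") | Out-Null
foreach ($role in "GPU-VM1", "GPU-VM2") {
    (Get-ClusterGroup -Name $role).AntiAffinityClassNames = $class
}
Get-ClusterGroup | Select-Object Name, AntiAffinityClassNames   # verify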
How to run a Windows 11 VM on Hyper-V
Happy new year everyone! Last month, before the holidays, I wanted to run a Windows 11 VM on Hyper-V to run a few tests on Windows containers in a different environment than my local machine. However, it took me some time to get that VM up and running, simply because I had forgotten about the new hardware requirements for Windows 11 and that I had to configure them before installing the new OS in the VM. This blog post is my contribution so you don't have to go through the same!
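In short, a Generation 2 VM needs a vTPM, Secure Boot, at least 4 GB of RAM, 2 vCPUs and a 64 GB disk to pass the Windows 11 setup checks. A sketch of one way to pre-configure all of that from PowerShell on the Hyper-V host (the VM name, VHDX path, and ISO path are examples):
New-VM -Name "Win11" -Generation 2 -MemoryStartupBytes 4GB -NewVHDPath "C:\VMs\Win11\Win11.vhdx" -NewVHDSizeBytes 64GB
Set-VMProcessor -VMName "Win11" -Count 2
Set-VMKeyProtector -VMName "Win11" -NewLocalKeyProtector   # required before the vTPM can be enabled
Enable-VMTPM -VMName "Win11"
Set-VMFirmware -VMName "Win11" -EnableSecureBoot On -SecureBootTemplate "MicrosoftWindows"
Add-VMDvdDrive -VMName "Win11" -Path "C:\ISO\Win11.iso"    # attach the installation media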