Blog Post

Windows OS Platform Blog
10 MIN READ

Hotpatching on Windows

Mehmet_Iyigun's avatar
Mehmet_Iyigun
Icon for Microsoft rankMicrosoft
Nov 20, 2021

Introduction

A core priority of the Windows Kernel team is to keep the operating system, applications, and users secure. Like many operating systems, Windows has a large codebase, a driver ecosystem, and a complex set of dependencies. Every day, many malicious actors attempt to find vulnerabilities. To fix these vulnerabilities, Microsoft has historically combined a group of security fixes into what is known as a security patch.

 

Updates on Windows

Traditionally, security patches have been deployed on the second Tuesday of every month, known as Patch Tuesday. These patches are developed by feature teams as a fix for various security vulnerabilities in the OS. By providing these security patches, we aim to make the Windows OS more secure and eliminate the opportunity of malicious actors to exploit vulnerabilities. Within each patch, both user mode (application) and kernel mode (system) binaries can be updated, and typically this requires a reboot.

 

Some scenarios require continuous or near-continuous availability. For example, the instances of Windows Server that power the Azure fleet are required to be highly available. However, we also require these operating system instances to be secure. While technologies like Kernel Soft Reboot and VM preserving host updates already exist to minimize VM downtime while changing major OS releases, security patches are applied frequently enough that even this technique impacts downtime.

 

Why do updates require rebooting?

Usually, many binaries from all over the system are accessed and changed when a patch is applied. The reason a reboot is almost always required is because a binary that must be updated is usually actively mapped in one or more processes so its code may be currently executing. Certain kernel and user-mode binaries, like win32k.sys or ntdll.dll, are always loaded into memory and some others, like Explorer.exe, are loaded when there is an active user session. When binaries such as these are patched as part of an update, a restart is required for the patch to be successfully installed. When an update targets the NT kernel or additional core components, a restart is always required because it is not possible to unload those binaries while their code is executing. Traditionally, even if one fix within the entire patch required a reboot, and all other patches didn’t require a reboot, the machine would still be required to reboot to successfully install the patch.

 

Current security issues with delayed patching

Security patches are intended to be applied to the Windows OS as soon as they are released from Microsoft. Often, users and system administrators will delay the installation of a patch because of the reboot that is frequently required upon completing the installation. This delay in patching, while seemingly convenient, is actually a security issue. The FireEye Mandiant Threat Intelligence report shows that in 2018 and 2019 the exploitation of 42% of vulnerabilities occurred after a patch was already released. Furthermore, internal MSRC data shows that in the year 2020, around 75% of public proof-of-concept vulnerability were exploited after a patch has been already released. By limiting or eliminating the time between when a patch is issued to when it is applied, there is a substantial opportunity to reduce the total number of exploited vulnerabilities.

 

What is Hotpatching?

Hotpatching is the capability of an Operating system to “on-the-fly” modify some code that may be currently executed by another entity (application or driver). The hotpatching process should be invisible to the application, library or driver that is executing the code. This implies that the hotpatch engine must respect some constraints, which will be explained later in this post. Hotpatching allows the OS to install security patches without requiring a reboot, ensuring a level of increased security without sacrificing the availability of the machine. By utilizing techniques in the Windows Kernel, updates can be applied without a direct impact to the user. In Server scenarios, hotpatching allows administrators to update their guest VMs without the need of rebooting the VMs, leading to reduced downtime. Hotpatching is one of the first techniques geared to bringing users a reboot-less security update future.

 

While hotpatch is a new feature for our customers, it has been in use in Azure Host OS for a while. Internal Azure administrators have been providing rebootless security updates to Azure Host machines for long enough to collect data and improve hotpatching itself. Hotpaching is a battle-tested method of updating binaries on a system without the need to reboot.

 

The Hotpatch architecture

Hotpatch is implemented in various parts of the NT kernel, Secure Kernel and Ntdll module. Before peeking at the engine’s architecture, we should explain how the system is able to dynamically patch a binary.

 

Hotpatching works at the function level, which means that functions are individually patched and not individual files or components. Function level hotpatching works by redirecting all invocations of an un-patched function belonging to a base image to a patched function belonging to a hotpatch image. Many types of binaries can be patched using this technique, including usermode executables (EXEs and DLLs), system drivers, and even the Hypervisor and Secure Kernel binaries. Note that hotpatch images are considered cumulative, which means that each hotpatch image includes the changes from all other previous hotpatch images targeting the same base image. Multiple hotpatch images can be applied to the same base image and can be rolled back in a similar manner. The latest version of Hotpatch supports both x64 and ARM64 architectures, including 32-bit code running under WOW64.

 

Patch images, shown in Figure 1, are standard PE (Portable Executable) images, but they contain special information. In particular, the Hotpatch Table (indexed by the Image load configuration directory) contains all the information that describes the patch image, like the expected engine version, the size of the patch table, patch sequence number, and an array of compatible base image descriptors.

 

Figure 1. Hotpatch image format.

 

 

Each patch image is designed for a specific base image. The compatible base image is identified through a checksum and a time-date stamp. The patch engine will refuse to apply the patch if the base image does not have the same checksum and time-date stamp of any descriptors. In this case the patch will be added to an internal list and applied only when the correct base image is loaded later (this procedure is called “Deferred application”.)

 

The operations that are performed by the engine for applying a patch are described by an array of hotpatch descriptors. A hotpatch descriptor tells the engine what type of patch each record specifies (function patch, global symbol patch, indirect call, CFG call target and so on...). It is composed of a header and one or more hotpatch records. Each record specifies the patch’s parameters that depend on the type of the descriptor, like the source and target function’s RVA, and the original opcodes bytes.

 

The Hotpatch engine

The Hotpatch engine is implemented in various parts of the operating system, mostly in the NT and Secure kernel. The engine, as introduced in the previous paragraph, supports different kinds of images: Hypervisor, Secure Kernel and its modules, NT Kernel drivers and User-mode processes. The hotpatch engine requires the Secure Kernel to be running.

 

For applying a patch to an image, the NT kernel takes several steps that start in the MiLoadHotPatch internal function, which temporarily maps the patch image in the system address space and performs the initial analysis with the goal to search and verify the hotpatch information contained in the PE data structures (showed in Figure 1). After the checksum and timestamp of the target image for which the patch has been designed are located, the NT kernel determines whether the corresponding base image is loaded in the system (the base image can also be a secure image, like the Hypervisor or the Secure Kernel, so this step also needs to invoke the secure kernel).

When a compatible image is detected, the NT kernel begins to apply the patch to the target base image using a procedure that is a bit different depending on the type of the base image (user-mode library or process, kernel driver or a secure image). In general, the hotpatch engine maps the patch image in the same address space as the base image (as showed in Figure 2): for user-mode patches, the patch image will be mapped in each process that has the base image loaded.

 

Note that the hotpatch engine also supports session drivers. A session driver is a driver that lives in a kernel-mode address space that is tied to the user logon session (note that the session address space is generated by one particular root page table entry, which is switched on demand by the Memory manager depending on the active session). This means that a particular session can have a driver mapped which does not exist in another session. The Hotpatch engine is able to attach to all sessions in the system thanks to the “HotPatch” process created in phase 1 of the NT Kernel initialization. This minimal process has the characteristic to not belong to any session. The hotpatch engine can thus use that process to temporarily attach to any session in the system and perform the patch application only to the sessions where the driver is currently loaded.

 

Figure 2. Various address spaces supported by hotpatching on Windows.

 

Once the hotpatch image is mapped, the patch engine within the kernel starts to apply the patch by performing Backward patch application as described by the hotpatch records:

  • Patches all callees of patched functions in the patch image to jump to the corresponding functions in the base image. The reason for this is to ensure that all the unpatched code executes from the original base image. For example, if function A calls function B in the original base image and the patch image patches function A but not function B, then the patch engine will update function B in the patch image to jump to function B in the original base image.
  • Patches the necessary references to global variables in hotpatch functions to point to the corresponding global variables in the original base image.
  • Patches the necessary import address table (IAT) references in the hotpatch image by copying the corresponding IAT entries from the original base image.

It then performs the Forward patch application by patching the necessary functions in the original base image to jump to the corresponding functions in the patch image. Once this is done for any given function in the original base image, all new invocations of that function will execute the new patched function code from the hotpatch image. Once the hotpatched function returns, it will return to the caller of the original function.

 

The described procedure, which, for kernel drivers, is executed by the Secure Kernel, has been highly simplified. Note that the hotpatching process requires proper synchronization: no processor should be able to execute original instructions while undergoing a patch application. Note that the Secure Kernel is able also to interact with Hyperguard. This allows protected Patchguard images to be correctly patched.

 

The Hotpatch Address Table (HPAT)

When applying a patch to a function, the Hotpatch engine should be able to store the trampoline needed for transferring the code execution from the base to the patched function. The trampoline can’t be stored in the old un-patched function for various reasons: currently running code may hit invalid instructions and there is also no guarantee that enough space exists in the old function’s code. Furthermore, the patch engine supports both the application and the revert (undo) of a patch, which means that the original replaced bytes would have to be stored somewhere. Trampoline code to transfer execution to the target function is placed in the Hotpatch Address table code page (abbreviated as HPAT).

 

When the system initially boots, the Windows loader determines the size of the HPAT area, which is composed of a combination of data and code pages (to support ARM64 and scenarios where Retpoline is enabled on x64). When HotPatch is enabled, each boot driver is loaded in memory by reserving the HPAT pages at the end of PE image (before the Retpoline code page. Further information about Retpoline on Windows are available here: Mitigating Spectre variant 2 with Retpoline on Windows - Microsoft Tech Community). Note that the term “reserved” means that no actual physical memory is consumed. This is handled similarly for user-mode binaries.

 

When a patch is applied to a base image, the HPAT pages for both the base and the patch images are mapped to valid physical pages. When a function is patched for the first time, the patch engine allocates an HPAT entry for it and fills the code and data slot with the trampoline code and the target address. Subsequent patches for a function only update the target address. Only a single instruction is replaced in the prologue of the original function’s code. The overwritten opcode is saved in the Undo table to be replaced if the patch is reverted. Figure 3 summarizes this process:

 

Figure 3. Code flow for a hotpatched function.

 

Windows Server 2022 - New Hotpatch features

The upcoming Windows Server 2022 release includes the following improvements which make hotpatching applicable to a wider set of changes:

  • Patch images can now import new functions from other binaries.
  • Hotpatch engine now support ARM64 as well.
  • The patch engine now supports a patch callback, exported in the patch image through the “__PatchMainCallout__” function. The callback allows the patch image to perform initialization steps (like allocating memory, initializing new globals and so on....) after one or both the phases of the patch application (described previously) completed.
  • HotPatch is compatible with Retpoline. A new Retpoline dispatch function (internally called “__guard_retpoline_jump_hpat”) is invoked from the HPAT code entry and can safely transfer the code execution to the target patch function without being vulnerable to Spectre v2 side channel attacks.

 

Conclusion

Hotpatch is a powerful feature used by the Azure Fleet and Windows Server Azure Edition to eliminate downtime when applying security patches or even adding small features to the OS. Although some limitations in the functions being patched still exist (for example function signatures can never be changed), most of them has been addressed in the new version of the Engine.

 

How can you get access to the hotpatch feature?

Hotpatch-based security updates are available to customers running Windows Server 2019 and Windows Server 2022 Azure Edition images in the Azure cloud within the automanage framework. Documentation is provided on this page. We are working on bringing hotpatch-based security updates to a wider set of Windows customers.

 

Andrea Allievi & Hotpatch Team.

 

Updated Nov 20, 2021
Version 2.0
  • When the OS is finally rebooted, then the standard patch to the on-disk binary is applied, and the hotpatch is no longer used  (till then next hotpatch is needed for that binary)

  • That is not entirely true (or at least needs additional clarification). Let's take the Windows Server scenario where security updates are delivered via hotpatches and coldpatches, targeting N reboots per year. Let's say N = 2 and we're aiming to reboot in Jan and July. Once a machine receives its Jan security update (via coldpatch) and reboots, subsequent monthly security updates (until July) will be delivered via hotpatch. Take ntoskrnl.exe. The Feb hotpatch for ntoskrnl.exe applies to the "base binary" (coldpatch) delivered in Jan. The March hotpatch for ntoskrnl.exe includes the fixes in the Feb hotpatch (it's cumulative) and also applies to the same base binary, and so on. If this machine happens to reboot in April e.g. after receiving the April hotpatch package, it will load up the base binary from Janary and apply the April hotpatch to it. So, we always load the base binary and apply the "current" hotpatch to it (if one exists). When the July security update arrives, the machine will be rebooted and the July version of all binaries will be the new base binaries to which subsequent hotpatches will apply. So, if the machine reboots before receiving the August hotpatch package, it will not apply any hotpatches. 

     

    This scheme is important to control the test matrix. Each hotpatch is cumulative and applies to only one base binary. 

     

    I don't know the full technical details of 0patch, but I don't see how it can reliably work given the defences we have implemented in Windows against unauthorized code patching in kernel-mode (like PatchGuard and especially HVCI in recent versions of Windows).

     

  • Ivan Dretvic's avatar
    Ivan Dretvic
    Copper Contributor

    Thanks Steph for the clarity as its important to note that it will be replaced by the official patch at time of reboot. It looks like an integrated approach to patching similar to what these guys do: https://0patch.com

  • robhearn's avatar
    robhearn
    Copper Contributor

    According to 0patch's FAQ, they don't patch the kernel for the specific reason you mention...

    (source: Can you micropatch all types of vulnerabilities? – Help Center (zendesk.com))

     

    We currently cannot directly* micropatch:

    • Vulnerabilities in Windows kernel; this is due to the fact that Windows have an active self-defense mechanism for preventing kernel code modification called PatchGuard (formally named Kernel Patch Protection), which our micropatches would trigger and cause the computer to reboot. We believe PatchGuard adds negligible security value and there are known ways of disabling/bypassing it, but we have for now decided not to do that. It is possible that we'll revisit this decision in the future and advise users to disable PatchGuard (e.g., for end-of-support Windows 7 and Windows Server 2008 R2 servers with 0patch installed) in order to apply kernel micropatches, should these become necessary.
    • Vulnerabilities in .NET code; .NET applications are not compiled to native machine code but to managed code, which gets interpreted by .NET Framework's Common Language Runtime upon execution. We can, however, micropatch .NET Framework's native executables.
    • Vulnerabilities in Java code; while we can micropatch the Java Runtime Interpreter (e.g., java.exe), we can't micropatch Java applications (typically delivered as class files in a JAR archive) because they, too, are not in native machine code.
    • Vulnerabilities in scripted code; it would be relatively simple, technically, to modify scripted code in memory right before it got executed, but in contrast to binary files, which practically never get manually modified, scripted code is often modified by users or admins - and that makes it hard to identify a vulnerable version and pinpoint the vulnerability within. This is why we decided not to support micropatching of scripted code.
  • Dear Mehmet_Iyigun thanks for your great blog explaining this cool tech. I hope it is coming to Windows Clients anytime soon. Do you have insights on the naming of the hotpatch category name in the update catalog?

  • Omer_Sha's avatar
    Omer_Sha
    Copper Contributor

    Hi Mehmet_Iyigun ,

    Thank you for this article. It becomes very relevant with MS intention to enable hotpatching by default on Windows 11 24H2 & Server 2025.

    A couple of quick questions if possible:

    1. Hotpatching is available since Windows Server 2K3 Days . What are the main differences between newly introduced hotpatching and back then?

    2. We'd like to run some tests on a hotpatched system, to perform early validation of our SW correctness. What would be the best approach to setup such a test environment?

    Is there a way to get "real-patched" NTDLL that demonstartes coldpatch and hotpatch side-by-side for testing purposes?