A core priority of the Windows Kernel team is to keep the operating system, applications, and users secure. Like many operating systems, Windows has a large codebase, a driver ecosystem, and a complex set of dependencies. Every day, many malicious actors attempt to find vulnerabilities. To fix these vulnerabilities, Microsoft has historically combined a group of security fixes into what is known as a security patch.
Traditionally, security patches have been deployed on the second Tuesday of every month, known as Patch Tuesday. These patches are developed by feature teams to fix various security vulnerabilities in the OS. By providing these security patches, we aim to make the Windows OS more secure and eliminate the opportunity for malicious actors to exploit vulnerabilities. Each patch can update both user-mode (application) and kernel-mode (system) binaries, and this typically requires a reboot.
Some scenarios require continuous or near-continuous availability. For example, the instances of Windows Server that power the Azure fleet are required to be highly available. However, we also require these operating system instances to be secure. While technologies like Kernel Soft Reboot and VM-preserving host updates already exist to minimize VM downtime while changing major OS releases, security patches are applied frequently enough that even these techniques still cause noticeable downtime.
Usually, many binaries from all over the system are accessed and changed when a patch is applied. A reboot is almost always required because a binary that must be updated is usually actively mapped in one or more processes, so its code may be currently executing. Certain kernel and user-mode binaries, like win32k.sys or ntdll.dll, are always loaded into memory, and some others, like Explorer.exe, are loaded whenever there is an active user session. When binaries such as these are patched as part of an update, a restart is required for the patch to be successfully installed. When an update targets the NT kernel or other core components, a restart is always required because it is not possible to unload those binaries while their code is executing. Traditionally, even if only one fix within the entire patch required a reboot and all the other fixes didn’t, the machine would still have to reboot to successfully install the patch.
Security patches are intended to be applied to the Windows OS as soon as they are released by Microsoft. Often, users and system administrators delay the installation of a patch because of the reboot that is frequently required to complete the installation. This delay in patching, while seemingly convenient, is actually a security issue. The FireEye Mandiant Threat Intelligence report shows that in 2018 and 2019 the exploitation of 42% of vulnerabilities occurred after a patch was already released. Furthermore, internal MSRC data shows that in 2020 around 75% of vulnerabilities with a public proof of concept were exploited after a patch had already been released. By limiting or eliminating the time between when a patch is issued and when it is applied, there is a substantial opportunity to reduce the total number of exploited vulnerabilities.
Hotpatching is the capability of an operating system to modify, “on the fly”, code that may be currently executing in another entity (an application or a driver). The hotpatching process should be invisible to the application, library or driver that is executing the code. This implies that the hotpatch engine must respect some constraints, which will be explained later in this post. Hotpatching allows the OS to install security patches without requiring a reboot, increasing security without sacrificing the availability of the machine. By utilizing techniques in the Windows Kernel, updates can be applied without a direct impact on the user. In Server scenarios, hotpatching allows administrators to update their guest VMs without rebooting them, leading to reduced downtime. Hotpatching is one of the first techniques geared toward bringing users a reboot-less security update future.
While hotpatch is a new feature for our customers, it has been in use in the Azure Host OS for a while. Internal Azure administrators have been providing rebootless security updates to Azure Host machines for long enough to collect data and improve hotpatching itself. Hotpatching is a battle-tested method of updating binaries on a system without the need to reboot.
Hotpatch is implemented in various parts of the NT kernel, Secure Kernel and Ntdll module. Before peeking at the engine’s architecture, we should explain how the system is able to dynamically patch a binary.
Hotpatching works at the function level, which means that individual functions are patched rather than whole files or components. Function-level hotpatching works by redirecting all invocations of an un-patched function belonging to a base image to a patched function belonging to a hotpatch image. Many types of binaries can be patched using this technique, including user-mode executables (EXEs and DLLs), system drivers, and even the Hypervisor and Secure Kernel binaries. Note that hotpatch images are considered cumulative, which means that each hotpatch image includes the changes from all previous hotpatch images targeting the same base image. Multiple hotpatch images can be applied to the same base image and can be rolled back in a similar manner. The latest version of Hotpatch supports both x64 and ARM64 architectures, including 32-bit code running under WOW64.
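To make the redirection and the cumulative property concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions, not how the engine is implemented: the BaseImage class, its methods and the sample functions are hypothetical names invented for illustration, and a dictionary lookup stands in for the real redirection of machine code.

```python
from typing import Callable, Dict, List

Function = Callable[..., int]


class BaseImage:
    """Toy model of a loaded base image whose calls can be redirected."""

    def __init__(self, functions: Dict[str, Function]) -> None:
        self.original: Dict[str, Function] = dict(functions)  # never modified
        self.redirects: Dict[str, Function] = {}               # active redirections
        self.applied: List[int] = []                           # patch sequence numbers
        self._history: List[Dict[str, Function]] = []

    def call(self, name: str, *args: int) -> int:
        # A patched function wins; otherwise the base image's code runs.
        return self.redirects.get(name, self.original[name])(*args)

    def apply_patch(self, sequence: int, patched: Dict[str, Function]) -> None:
        # Patch images are cumulative, so the newest one replaces the whole
        # redirection set instead of being merged with the previous one.
        self._history.append(self.redirects)
        self.redirects = dict(patched)
        self.applied.append(sequence)

    def rollback(self) -> None:
        # Return to whatever was active before the latest patch image.
        self.redirects = self._history.pop()
        self.applied.pop()


def add_u8(a: int, b: int) -> int:
    return a + b                      # hypothetical bug: result can exceed 255


def add_u8_fixed(a: int, b: int) -> int:
    return min(a + b, 255)


def scale(x: int) -> int:
    return x * 2                      # hypothetical bug: accepts negative input


def scale_fixed(x: int) -> int:
    return max(x, 0) * 2


image = BaseImage({"add_u8": add_u8, "scale": scale})
image.apply_patch(1, {"add_u8": add_u8_fixed})
# Patch 2 is cumulative: it carries the fix from patch 1 plus a new one.
image.apply_patch(2, {"add_u8": add_u8_fixed, "scale": scale_fixed})
```

Callers keep invoking the same function name throughout; only the code that ends up running changes. Rolling back patch 2 in this model leaves the fix from patch 1 active.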
Patch images, shown in Figure 1, are standard PE (Portable Executable) images, but they contain special information. In particular, the Hotpatch Table (indexed by the Image load configuration directory) contains all the information that describes the patch image, like the expected engine version, the size of the patch table, patch sequence number, and an array of compatible base image descriptors.
Figure 1. Hotpatch image format.
Each patch image is designed for a specific base image. The compatible base image is identified through a checksum and a time-date stamp. The patch engine will refuse to apply the patch if the checksum and time-date stamp of the base image do not match those of any descriptor. In this case the patch will be added to an internal list and applied only when the correct base image is loaded later (this procedure is called “Deferred application”).
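The matching and deferral logic described above can be sketched as follows. This is a simplified model under assumptions: PatchImage, PatchEngine and their fields are hypothetical names, and the real engine tracks far more state than a pair of lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ImageId = Tuple[int, int]   # (checksum, time-date stamp)


@dataclass
class PatchImage:
    name: str
    compatible_bases: List[ImageId]   # one entry per base image descriptor


@dataclass
class PatchEngine:
    loaded: Dict[ImageId, str] = field(default_factory=dict)      # id -> base name
    applied: List[Tuple[str, str]] = field(default_factory=list)  # (patch, base)
    deferred: List[PatchImage] = field(default_factory=list)

    def load_patch(self, patch: PatchImage) -> bool:
        """Apply the patch if a compatible base is loaded, else defer it."""
        for image_id in patch.compatible_bases:
            if image_id in self.loaded:
                self.applied.append((patch.name, self.loaded[image_id]))
                return True
        self.deferred.append(patch)   # no match yet: keep it for later
        return False

    def load_base(self, name: str, image_id: ImageId) -> None:
        """Record a newly loaded base image and apply any waiting patches."""
        self.loaded[image_id] = name
        waiting = []
        for patch in self.deferred:
            if image_id in patch.compatible_bases:
                self.applied.append((patch.name, name))
            else:
                waiting.append(patch)
        self.deferred = waiting
```

Both the checksum and the time-date stamp must match, which is why the pair is used as a single key in this sketch.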
The operations that the engine performs to apply a patch are described by an array of hotpatch descriptors. A hotpatch descriptor tells the engine what type of patch each record specifies (function patch, global symbol patch, indirect call, CFG call target and so on). It is composed of a header and one or more hotpatch records. Each record specifies the patch’s parameters, which depend on the type of the descriptor, like the source and target function’s RVA and the original opcode bytes.
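As a rough illustration, a descriptor and its records could be modeled as below. The field names and layout are assumptions made for this sketch and do not reflect the actual on-disk format. The mismatched_records helper is likewise hypothetical: it shows one plausible use of the saved original opcode bytes, namely checking that the base image contains the expected code, and it treats an RVA as a plain offset into a byte buffer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class PatchType(Enum):
    # The patch types named in the text; the real set may be larger.
    FUNCTION = "function patch"
    GLOBAL_SYMBOL = "global symbol patch"
    INDIRECT_CALL = "indirect call"
    CFG_CALL_TARGET = "CFG call target"


@dataclass(frozen=True)
class HotpatchRecord:
    source_rva: int          # location to patch in the base image
    target_rva: int          # corresponding location in the patch image
    original_bytes: bytes    # opcode bytes expected at source_rva


@dataclass(frozen=True)
class HotpatchDescriptor:
    patch_type: PatchType                 # header: type shared by all records
    records: Tuple[HotpatchRecord, ...]   # one or more records


def mismatched_records(base: bytes, descriptor: HotpatchDescriptor) -> List[HotpatchRecord]:
    """Return the records whose saved bytes differ from the base image."""
    bad = []
    for record in descriptor.records:
        end = record.source_rva + len(record.original_bytes)
        if base[record.source_rva:end] != record.original_bytes:
            bad.append(record)
    return bad
```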
The Hotpatch engine is implemented in various parts of the operating system, mostly in the NT and Secure kernel. The engine, as introduced in the previous paragraph, supports different kinds of images: Hypervisor, Secure Kernel and its modules, NT Kernel drivers and User-mode processes. The hotpatch engine requires the Secure Kernel to be running.
To apply a patch to an image, the NT kernel takes several steps that start in the MiLoadHotPatch internal function, which temporarily maps the patch image in the system address space and performs the initial analysis to locate and verify the hotpatch information contained in the PE data structures (shown in Figure 1). After the checksum and timestamp of the target image for which the patch has been designed are located, the NT kernel determines whether the corresponding base image is loaded in the system (the base image can also be a secure image, like the Hypervisor or the Secure Kernel, so this step also needs to invoke the Secure Kernel).
When a compatible image is detected, the NT kernel begins to apply the patch to the target base image using a procedure that differs slightly depending on the type of the base image (user-mode library or process, kernel driver or secure image). In general, the hotpatch engine maps the patch image in the same address space as the base image (as shown in Figure 2): for user-mode patches, the patch image will be mapped in each process that has the base image loaded.
Note that the hotpatch engine also supports session drivers. A session driver is a driver that lives in a kernel-mode address space that is tied to the user logon session (the session address space is generated by one particular root page table entry, which is switched on demand by the Memory manager depending on the active session). This means that a particular session can have a driver mapped which does not exist in another session. The Hotpatch engine is able to attach to all sessions in the system thanks to the “HotPatch” process created in phase 1 of the NT Kernel initialization. This minimal process does not belong to any session. The hotpatch engine can thus use that process to temporarily attach to any session in the system and apply the patch only in the sessions where the driver is currently loaded.
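The per-session walk can be pictured with the following sketch. It is a deliberately simplified model: HotPatchProcess and patch_session_driver are hypothetical names, a session is reduced to the set of driver names mapped in it, and attaching is reduced to recording a session ID.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional, Set


class HotPatchProcess:
    """Stand-in for the minimal process that belongs to no session."""

    def __init__(self) -> None:
        self.attached_session: Optional[int] = None   # None: not attached

    @contextmanager
    def attach(self, session_id: int) -> Iterator[None]:
        # Temporarily switch into the given session's address space.
        self.attached_session = session_id
        try:
            yield
        finally:
            self.attached_session = None               # always detach again


def patch_session_driver(
    process: HotPatchProcess, sessions: Dict[int, Set[str]], driver: str
) -> List[int]:
    """Return the IDs of the sessions in which the driver was patched."""
    patched = []
    for session_id, drivers in sorted(sessions.items()):
        with process.attach(session_id):
            if driver in drivers:          # skip sessions without the driver
                patched.append(session_id)
    return patched
```

For example, with sessions 0 and 2 having the driver mapped and session 1 not, only sessions 0 and 2 are patched, and the process ends up detached from every session.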
Figure 2. Various address spaces supported by hotpatching on Windows.
Once the hotpatch image is mapped, the patch engine within the kernel starts to apply the patch by performing Backward patch application as described by the hotpatch records:
It then performs the Forward patch application by patching the necessary functions in the original base image to jump to the corresponding functions in the patch image. Once this is done for any given function in the original base image, all new invocations of that function will execute the new patched function code from the hotpatch image. Once the hotpatched function returns, it will return to the caller of the original function.
The described procedure, which for kernel drivers is executed by the Secure Kernel, has been highly simplified. Note that the hotpatching process requires proper synchronization: no processor should be able to execute the original instructions while they are being patched. Note also that the Secure Kernel is able to interact with Hyperguard. This allows images protected by Patchguard to be correctly patched.
When applying a patch to a function, the Hotpatch engine should be able to store the trampoline needed for transferring the code execution from the base to the patched function. The trampoline can’t be stored in the old un-patched function for various reasons: currently running code may hit invalid instructions and there is also no guarantee that enough space exists in the old function’s code. Furthermore, the patch engine supports both the application and the revert (undo) of a patch, which means that the original replaced bytes would have to be stored somewhere. Trampoline code to transfer execution to the target function is placed in the Hotpatch Address table code page (abbreviated as HPAT).
When the system initially boots, the Windows loader determines the size of the HPAT area, which is composed of a combination of data and code pages (to support ARM64 and scenarios where Retpoline is enabled on x64). When HotPatch is enabled, each boot driver is loaded in memory by reserving the HPAT pages at the end of the PE image, before the Retpoline code page. (Further information about Retpoline on Windows is available here: Mitigating Spectre variant 2 with Retpoline on Windows - Microsoft Tech Community.) Note that the term “reserved” means that no actual physical memory is consumed. This is handled similarly for user-mode binaries.
When a patch is applied to a base image, the HPAT pages for both the base and the patch images are mapped to valid physical pages. When a function is patched for the first time, the patch engine allocates an HPAT entry for it and fills the code and data slot with the trampoline code and the target address. Subsequent patches for a function only update the target address. Only a single instruction is replaced in the prologue of the original function’s code. The overwritten opcode is saved in the Undo table to be replaced if the patch is reverted. Figure 3 summarizes this process:
Figure 3. Code flow for a hotpatched function.
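The bookkeeping behind this flow can be sketched in a few lines of Python. This is an illustrative model only: PatchedImage and its methods are hypothetical, instructions are represented as strings, and an HPAT entry is reduced to its data slot (the target address), with the trampoline code implied by the rewritten prologue.

```python
from typing import Dict


class PatchedImage:
    """Toy model of the HPAT and Undo table for one base image."""

    def __init__(self, prologues: Dict[int, str]) -> None:
        self.prologue = dict(prologues)   # function RVA -> first instruction
        self.hpat: Dict[int, int] = {}    # function RVA -> target address
        self.undo: Dict[int, str] = {}    # function RVA -> overwritten opcode

    def patch_function(self, source_rva: int, target: int) -> None:
        if source_rva not in self.hpat:
            # First patch: save the original instruction and replace it with
            # a jump through the function's HPAT entry.
            self.undo[source_rva] = self.prologue[source_rva]
            self.prologue[source_rva] = f"jmp hpat[{source_rva:#x}]"
        # First and subsequent patches: only the target address changes.
        self.hpat[source_rva] = target

    def revert(self, source_rva: int) -> None:
        # Put the saved instruction back and release the HPAT entry.
        self.prologue[source_rva] = self.undo.pop(source_rva)
        del self.hpat[source_rva]

    def resolve(self, source_rva: int) -> int:
        """Address where a call to source_rva ends up executing."""
        return self.hpat.get(source_rva, source_rva)
```

Because the prologue is rewritten only once, applying a newer patch to an already patched function touches the data slot and nothing else in this model, which mirrors why subsequent patches only update the target address.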
The upcoming Windows Server 2022 release includes the following improvements which make hotpatching applicable to a wider set of changes:
Hotpatch is a powerful feature used by the Azure Fleet and Windows Server Azure Edition to eliminate downtime when applying security patches or even adding small features to the OS. Although some limitations on the functions being patched still exist (for example, function signatures can never be changed), most of them have been addressed in the new version of the engine.
Hotpatch-based security updates are available to customers running Windows Server 2019 and Windows Server 2022 Azure Edition images in the Azure cloud within the automanage framework. Documentation is provided on this page. We are working on bringing hotpatch-based security updates to a wider set of Windows customers.
Andrea Allievi & Hotpatch Team.