Mitigating Spectre variant 2 with Retpoline on Windows

Mehmet_Iyigun · ‎Dec 05 2018

Updated May 14, 2019: We're happy to announce that today we've updated Retpoline cloud configuration to enable it for all supported devices!* In addition, with the May 14 Patch Tuesday update, we've removed the dependence on cloud configuration such that even those customers who may not be receiving cloud configuration updates can experience Retpoline performance gains.

*Note: Retpoline is enabled by default on devices running Windows 10, version 1809 and Windows Server 2019 or newer and which meet the following conditions:

- Spectre Variant 2 (CVE-2017-5715) mitigation is enabled.
  - For Client SKUs, Spectre Variant 2 mitigation is enabled by default.
  - For Server SKUs, Spectre Variant 2 mitigation is disabled by default. To realize the benefits of Retpoline, IT Admins can enable it on servers following this guidance.
- Supported microcode/firmware updates are applied to the machine.

Updated March 1, 2019: The post below outlines the performance benefits of using Retpoline against the Spectre variant 2 (CVE-2017-5715) attack—as observed with 64-bit Windows Insider Preview Builds 18272 and later. ~~While Retpoline is currently disabled by default on production Windows 10 client devices~~, we have backported the OS modifications needed to support Retpoline so that it can be used with Windows 10, version 1809 and have those modifications in the March 1, 2019 update (KB4482887).

~~Over the coming months, we will enable Retpoline as part of phased rollout via cloud configuration.~~ Due to the complexity of the implementation and changes involved, we are only enabling Retpoline performance benefits for Windows 10, version 1809 and later releases.

Updated March 5, 2019: ~~While the phased rollout is in progress, customers who would like to manually enable Retpoline on their machines can do so with the following registry configuration updates:~~

On Client SKUs:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 0x400
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 0x400
Reboot

On Server SKUs:

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 0x400
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 0x401
Reboot

Note: The above registry configurations are for customers running with default mitigation settings. In particular, for Server SKUs, these settings will enable Spectre variant 2 mitigations (which are enabled by default on Client SKUs). If it's desirable to enable additional security mitigations on top of Retpoline, then the feature settings values for those features need to be bitwise OR'd into FeatureSettingsOverride and FeatureSettingsOverrideMask.

Example: Feature settings values for enabling SSBD (speculative store bypass) system wide:
FeatureSettingsOverride = 0x8 and FeatureSettingsOverrideMask = 0
To add Retpoline, feature settings value for Retpoline (0x400) should be bitwise OR'd:
FeatureSettingsOverride = 0x408 and FeatureSettingsOverrideMask = 0x400
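
The OR-combination in the example above can be checked with a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical labels, and only the numeric values are taken from this post.

```python
# Feature-settings values quoted in this post. The names are illustrative labels,
# not identifiers used by Windows.
# Each pair is (FeatureSettingsOverride, FeatureSettingsOverrideMask).
RETPOLINE = (0x400, 0x400)
SSBD_SYSTEM_WIDE = (0x8, 0x0)


def combine(*features):
    """Bitwise-OR the override and mask values of several features."""
    override = mask = 0
    for feature_override, feature_mask in features:
        override |= feature_override
        mask |= feature_mask
    return override, mask


override, mask = combine(SSBD_SYSTEM_WIDE, RETPOLINE)
print(f"FeatureSettingsOverride     = {override:#x}")  # 0x408
print(f"FeatureSettingsOverrideMask = {mask:#x}")      # 0x400
```

The point of OR-ing rather than overwriting is that bits already set for another mitigation survive when the Retpoline bit is added.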

The Get-SpeculationControlSettings PowerShell cmdlet can be used to verify Retpoline status. Here’s an example output showing Retpoline and import optimization enabled:

Speculation control settings for CVE-2017-5715 [branch target injection] 
 
Hardware support for branch target injection mitigation is present: True  
Windows OS support for branch target injection mitigation is present: True 
Windows OS support for branch target injection mitigation is enabled: True 
… 
BTIKernelRetpolineEnabled           : True 
BTIKernelImportOptimizationEnabled  : True 
...

Since Retpoline is a performance optimization for the Spectre Variant 2 mitigation, it requires hardware and OS support for branch target injection mitigation to be present and enabled. Skylake and later generations of Intel processors are not compatible with Retpoline, so only Import Optimization will be enabled on these processors.

In January 2018, Microsoft released an advisory and security updates related to a newly discovered class of hardware vulnerabilities involving speculative execution side channels (known as Spectre and Meltdown) that affect AMD, ARM, and Intel CPUs to varying degrees. If you haven’t had a chance to learn about these issues, we recommend watching The Case of Spectre and Meltdown by the team at TU Graz from BlueHat Israel, or reading the blog post by Jann Horn (@tehjh) of Google Project Zero.

We have also had multiple posts detailing the internals of our implementation to handle these side-channel attacks.

For today’s post, we have kernel developers Andrea Allievi and Chris Kleynhans describing our design and implementation of retpoline for Windows which improves performance of Spectre variant 2 mitigations (CVE-2017-5715) to noise-level for most scenarios. These improvements are available today in Windows Insider Builds (builds 18272 or newer, x64-only).

Introduction

At a high level, the Spectre variant 2 attack exploits indirect branches to steal secrets located in higher privilege contexts (e.g. kernel-mode vs user-mode). Indirect branches are instructions where the target of the branch is not contained in the instruction itself, such as when the destination address is stored in a CPU register.

Describing the full Spectre attack is outside the scope of this article. Details are in the links above or in this whitepaper from Intel.

Our original mitigations for Spectre variant 2 made use of new capabilities exposed by CPU microcode updates to restrict indirect branch speculation when executing within kernel mode (IBRS and IBPB). While this was an effective mitigation from a security standpoint, it resulted in a larger performance degradation than we’d like on certain processors and workloads.

For this reason, starting in early 2018, we investigated alternatives and found promise in an approach developed by Google called retpoline. A full description of retpoline can be found here, but in short, retpoline works by replacing all indirect calls or jumps in kernel-mode binaries with an indirect branch sequence that has safe speculation behavior.

This sequence, shown below in Figure 1, effects a safe control transfer to the target address by performing a function call, modifying the return address and then returning.

RP0:  call RP2                 ; push address of RP1 onto the stack and jump to RP2
RP1:  int 3                    ; breakpoint to capture speculation
RP2:  mov [rsp], <Jump Target> ; overwrite return address on the stack to desired target
RP3:  ret                      ; return

While this construct is not as fast as a regular indirect call or jump, it has the side effect of preventing the processor from unsafe speculative execution. This proves to be much faster than running all of kernel mode code with branch speculation restricted (IBRS set to 1). However, this construct is only safe to use on processors where the RET instruction does not speculate based on the contents of the indirect branch predictor. Those processors are all AMD processors as well as Intel processors codenamed Broadwell and earlier according to Intel’s whitepaper. Retpoline is not applicable to Skylake and later processors from Intel.
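
To make the control flow of the RP0..RP3 sequence concrete, here is a toy Python model. It is a deliberately simplified sketch, not a description of any real processor: it assumes that the return instruction speculates from a return predictor that was filled by the preceding call, while the architectural result comes from the value in memory. The function and variable names are hypothetical.

```python
def run_retpoline(jump_target):
    """Model RP0..RP3 and return (architectural_target, speculated_target)."""
    stack = []             # architectural stack: return addresses held in memory
    return_predictor = []  # simplified return predictor used only for speculation

    # RP0: call RP2 -> the return address (RP1) is recorded in both places.
    stack.append("RP1")
    return_predictor.append("RP1")

    # RP2: mov [rsp], <Jump Target> -> only the in-memory copy is overwritten.
    stack[-1] = jump_target

    # RP3: ret -> speculation follows the predictor, retirement follows the stack.
    speculated = return_predictor.pop()
    architectural = stack.pop()
    return architectural, speculated


architectural, speculated = run_retpoline("driver_callback")
print(architectural)  # driver_callback
print(speculated)     # RP1
```

In this model speculation can only land on RP1, the int 3 that captures it, while real execution still reaches the intended target.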

Windows requirements for Retpoline

Traditionally the transformation of indirect calls and jumps into retpolines is performed when a binary is built by the compiler. However, there are several functional requirements in Windows that make a purely compile-time implementation insufficient.

These key requirements are:

1. Single binary: Windows releases are long-lived and must support a wide variety of hardware with a single set of binaries. On some hardware retpoline is not a complete mitigation because of alternate behavior of the ret instruction and retpoline must not be used. Further, future hardware may eliminate the need for retpoline entirely. Therefore, a Windows implementation of retpoline must allow the feature to be enabled and disabled at boot time using a single set of binaries, based on whether the underlying hardware is vulnerable, compatible and whether Spectre variant 2 mitigations are enabled on the system. Further, the runtime overhead of retpoline support should be minimal when the feature is disabled.
2. 3rd party device drivers: A lot of the code that runs in kernel mode is not part of Windows and consists of 3rd party device driver code. Traditional retpoline would only be secure if all these drivers were recompiled with a new version of the compiler. Given the breadth of the Windows 3rd party driver ecosystem, it is not realistic to expect all non-inbox 3rd party drivers to be recompiled and released to customers at the same time. Therefore, a Windows implementation of retpoline must be able to support a mixed environment, providing high performance when running drivers that have been updated, but allowing for graceful fallback to hardware-based mitigations upon entering a non-retpoline driver to preserve security.
3. Driver portability: Windows drivers are not bound to a specific release of Windows; many drivers that are built today for Windows 10 will also support older versions of the operating system. Therefore, a Windows implementation of retpoline must ensure that drivers compiled with retpoline support can run on a version of Windows that does not support retpoline.

General Architecture

To satisfy requirements 1 and 3, we decided that binaries would ship in a non-retpolined state and then be transformed into a retpolined state by rewriting the code sequences for all indirect calls. This ensures that systems that do not use retpoline can use the binaries as compiled without needing any support for retpoline and with minimal runtime cost.

However, performing the transformation at runtime does lead to one problem. How do we know what transformations need to be applied? Disassembling and analyzing driver machine code to locate all indirect calls is not practical.

Dynamic Value Relocation Table (DVRT)

To solve this problem, we collaborated with the compiler team in Visual Studio to develop a system whereby the compiler can emit a new type of metadata into driver binaries describing each indirect call or jump in the system. This metadata takes the form of new relocation entries in the Dynamic Value Relocation Table (DVRT).

The DVRT was originally introduced back in the Windows 10 Creators Update to improve kernel address space layout randomization (KASLR). It allowed the memory manager’s page frame number (PFN) database and page table self-map to be assigned dynamic addresses at runtime. The DVRT is stored directly in the binary and contains a series of relocation entries for each symbol (i.e. address) that is to be relocated. The relocation entries are themselves arranged in a hierarchical fashion grouped first by symbol and then by containing page to allow for a compact description of all locations in the binary that reference a relocatable symbol.

At build time, the compiler keeps track of all references to these special symbols and fills out the DVRT. Then at runtime the kernel will parse the DVRT and update each symbol reference with the correct dynamically assigned address. Importantly, the kernel will skip over any DVRT entries it does not recognize (i.e. those with an unknown symbol) so adding new symbols to the DVRT does not break older versions of Windows.
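
The "skip what you do not recognize" behavior is what makes the format forward compatible. The following Python sketch illustrates that idea only. It does not reflect the real DVRT layout: the entry shape, the symbol names and the handler table are all hypothetical.

```python
def apply_dvrt(entries, handlers):
    """Apply relocation groups for known symbols and skip unrecognized ones.

    entries  -- iterable of (symbol, [offsets]) pairs (hypothetical layout)
    handlers -- maps each symbol this loader understands to a fixup callable
    """
    applied, skipped = [], []
    for symbol, offsets in entries:
        handler = handlers.get(symbol)
        if handler is None:
            # Unknown symbol: an older loader simply ignores the whole group.
            skipped.append(symbol)
            continue
        for offset in offsets:
            handler(offset)
        applied.append(symbol)
    return applied, skipped


patched_offsets = []
old_loader_handlers = {"PFN_DATABASE": patched_offsets.append}
entries = [
    ("PFN_DATABASE", [0x10, 0x48]),
    ("RETPOLINE_IMPORT_CALL", [0x100]),  # newer symbol unknown to this loader
]
print(apply_dvrt(entries, old_loader_handlers))
# (['PFN_DATABASE'], ['RETPOLINE_IMPORT_CALL'])
```

A loader that predates retpoline behaves like old_loader_handlers here: the binary still loads and the retpoline entries are left untouched.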

These properties meant the DVRT was a perfect place to store our retpoline metadata; however, the existing DVRT format needed to be extended to support retpoline.

Based on Windows requirements, we classified indirect calls/jumps into three distinct forms and each of these forms has its own type of retpoline relocation and corresponding runtime fixup.

Import calls/jumps
Switchtable jumps
Generic indirect calls/jumps

Let’s talk a little about each of these types of calls.

Import Calls/Jumps

Import calls/jumps are, as the name implies, used for calls/jumps made by a binary to functions that have been imported from another binary. When compiling with retpoline, the compiler ensures that all such calls conform to the following form:

48 FF 15 XX XX XX XX     call qword ptr [_imp_<function>]
0F 1F 44 00 00           nop

The call or jmp instruction always directly references the import address table (IAT) and has 5 bytes of additional padding (to be used by the retpoline fixup).

Switchtable Jumps

Switchtable jumps are used for jumps made to other locations within the same function and are so-named because of their usage in implementing C/C++ switch statements. When compiling with retpoline support the compiler ensures that such jumps are always made through a register and take the following form:

FF E0                    jmp rax
CC CC CC                 int 3

Generic Indirect Calls/Jumps

All other indirect calls/jumps fall into the generic type. To simplify the retpoline relocation format and the corresponding fixup logic, the compiler ensures that all such indirect calls/jumps provide their target address in the RAX register. The exact format of the call/jump instruction however differs depending on whether it is protected by control flow guard (CFG).

Loading binaries at runtime

Now that we have a way to identify all the indirect calls/jumps in the binary, we need to apply the fixups.

The NT memory manager has long had infrastructure to apply fixups to binaries at runtime. This infrastructure was extended to understand retpoline relocations and their corresponding fixups.

But what exactly do these fixups look like? As mentioned earlier, the Windows implementation needs to support mixed environments in which some drivers are not compiled with retpoline support. This means that we cannot simply replace every indirect call with a retpoline sequence like the example shown in the introduction. We need to ensure that the kernel gets the opportunity to inspect the target of the call or jump so that it can apply appropriate mitigations if the target does not support retpoline.

For this reason, we transform every indirect call or jump into a direct call or jump to a kernel provided “retpoline stub function”. For example, an indirect call to an imported function that looks like this:

call qword ptr [_imp_ExAllocatePoolWithTag]     ; Target address located at a REL32 offset
nop                                             ; Padding

Will be replaced at runtime with a direct call to the retpoline import stub:

mov r10, qword ptr [_imp_ExAllocatePoolWithTag] ; R10 = target address
call _guard_retpoline_import_r10                ; Direct REL32 call to the stub function
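
The reason for the 5 bytes of padding becomes clear when the rewrite is done at the byte level: the 7-byte indirect call plus the 5-byte nop occupy exactly the space of a 7-byte mov and a 5-byte direct call. The Python sketch below illustrates this under stated assumptions. The opcode bytes for the replacement (4C 8B 15 for the RIP-relative mov into r10, E8 for the REL32 call) are standard x64 encodings supplied here for illustration, not taken from this post, and the function name is hypothetical.

```python
import struct

IMPORT_CALL_PREFIX = bytes([0x48, 0xFF, 0x15])  # call qword ptr [rip+disp32]
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])    # 5-byte nop padding
MOV_R10_PREFIX = bytes([0x4C, 0x8B, 0x15])      # mov r10, qword ptr [rip+disp32]
CALL_REL32 = bytes([0xE8])                      # call rel32


def fixup_import_call(code, site_addr, stub_addr):
    """Rewrite a 12-byte import call site into 'mov r10, [IAT]; call stub'."""
    if len(code) != 12 or code[:3] != IMPORT_CALL_PREFIX or code[7:] != NOP5:
        raise ValueError("not a retpoline-ready import call site")
    # Both the original call and the new mov are 7 bytes long and start at the
    # same address, so the RIP-relative displacement to the IAT slot is reused.
    disp32 = code[3:7]
    # A REL32 call is relative to the end of the call instruction (site + 12).
    rel = stub_addr - (site_addr + 12)
    if not -2**31 <= rel < 2**31:
        raise ValueError("stub is outside REL32 range of the call site")
    return MOV_R10_PREFIX + disp32 + CALL_REL32 + struct.pack("<i", rel)


original = bytes.fromhex("48ff1510000000" "0f1f440000")
patched = fixup_import_call(original, site_addr=0x1000, stub_addr=0x2000)
print(patched.hex(" "))  # 4c 8b 15 10 00 00 00 e8 f4 0f 00 00
```

The range check in the sketch is the same constraint that motivates placing the stub functions close to each driver.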

There are several retpoline stub functions each of which is specialized to the type of call/jump it handles. However, each function generally performs the following steps:

Check if the target binary supports retpoline

Prior to transferring control to the target address, the function must determine whether the target address belongs to a driver that supports retpoline. To determine this, the kernel maintains a sparse bitmap of the entire kernel-mode address space with each bit describing a 64 KB region of the address space. Bits in this bitmap are set to 1 if and only if their corresponding region of address space belongs to a kernel-mode binary that fully supports retpoline.
If the bitmap check determines that the target address does not belong to a retpolined binary, the stub function has to fall back to the hardware-based Spectre variant 2 mitigation (by setting IBRS to restrict branch speculation) and then perform a regular indirect call/jmp. Otherwise, the kernel does not need to set IBRS. On processors that do not support IBRS, retpoline will, instead, perform IBPB if user-to-kernel protection is enabled as described here.
Since the target of a switch table jump is always in the same binary as the source (and therefore the target is guaranteed to support retpoline), this bitmap check is omitted from the switchtable jump stub functions.
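
The bitmap check described above can be sketched in Python as follows. This is a simplified illustration, not the kernel's data structure: a set of region indices stands in for the sparse bitmap, image ranges are assumed to be 64 KB aligned, and all names and addresses are hypothetical.

```python
REGION_SHIFT = 16  # one bit per 64 KB region of address space


class RetpolineBitmap:
    """Sparse stand-in for the bitmap: only regions whose bit is 1 are stored."""

    def __init__(self):
        self._regions = set()

    def mark_image(self, base, size):
        """Set the bits covering a loaded image that fully supports retpoline."""
        first = base >> REGION_SHIFT
        last = (base + size - 1) >> REGION_SHIFT
        self._regions.update(range(first, last + 1))

    def supports_retpoline(self, address):
        return (address >> REGION_SHIFT) in self._regions


def choose_transfer(bitmap, target):
    """Decide how a stub would transfer control to the target address."""
    if bitmap.supports_retpoline(target):
        return "retpoline"
    return "set IBRS, then regular indirect branch"


bitmap = RetpolineBitmap()
bitmap.mark_image(base=0x10000, size=0x20000)  # hypothetical retpolined driver
print(choose_transfer(bitmap, 0x18000))        # retpoline
print(choose_transfer(bitmap, 0x30000))        # set IBRS, then regular indirect branch
```

The lookup is a shift and a membership test, which keeps the per-call cost of supporting mixed environments small.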

Check if the target address is a valid CFG target

For CFG instrumented indirect calls/jumps the retpoline stub function is responsible for checking the kernel-mode CFG bitmap to verify that the target address given is a valid CFG call target. If this check fails, then the stub function will bugcheck the system to prevent any exploit that attempts an indirect control transfer to an invalid address.

Transfer control to the target using a retpoline.

The usage of these stub functions ensures that we can satisfy the requirement to support mixed environments; however, they do introduce one additional problem. The x64 direct call/jump instruction can only encode a target address within 2 GB of the call-site (since the target is specified by a signed 16- or 32-bit offset). Since the retpoline stub functions are implemented in the NT kernel binary this would generally mean that drivers would have to be loaded within 2 GB of the kernel binary.

To work around this requirement, all retpoline stub functions are contained within a single section of the NT kernel binary and have been carefully written to take no dependencies on their position relative to the rest of the binary. This allows us to map the physical memory pages backing the retpoline stub functions immediately after every driver in the system, giving each driver its own “copy” of the retpoline stub functions that is guaranteed to be within 2 GB of every indirect call/jump.

Import optimization

Indirect calls due to imported functions are by far the most common form of indirect control transfers in kernel-mode. The import call targets are determined at driver load time by processing the import address table (IAT) and remain constant throughout the driver’s lifetime. This means that most of the work provided by the retpoline import stub is unnecessary because we know at driver load time exactly where each of these calls will end up going and we know whether the target binary supports retpoline or not. Hence, we can use a much faster calling sequence.

With import optimization, we use the retpoline fixup infrastructure to replace eligible import calls with direct calls to the imported function. This eliminates the overhead of the retpoline import call stub as well as the guaranteed branch prediction miss due to retpoline itself. To be eligible for import optimization, a call must meet the following requirements:

The call/jump must be from a retpolined binary to another retpolined binary.

This is necessary to maintain the security guarantees of retpoline because once we’ve rewritten the indirect call into a direct call the kernel no longer gets a chance to observe the target address and enable IBRS.

The target of the call must be within 2 GB of the call site.

This is because as mentioned above direct call/jump instructions on x64 can only encode a 32-bit offset.
In order to virtually guarantee that import optimization can be applied to all retpolined modules, the OS loader and kernel make sure that all kernel-mode modules are packed tightly in the address space while maintaining address space layout randomization (ASLR).
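
The two eligibility requirements can be combined into a single predicate. The Python sketch below is illustrative only, with hypothetical function and parameter names. It assumes the offset of a direct call is measured from the address of the instruction that follows the call.

```python
REL32_MIN = -2**31
REL32_MAX = 2**31 - 1


def fits_rel32(next_instruction, target):
    """True if target is reachable with a signed 32-bit offset."""
    return REL32_MIN <= target - next_instruction <= REL32_MAX


def import_optimization_eligible(caller_retpolined, target_retpolined,
                                 next_instruction, target):
    """Both binaries must be retpolined and the target must be in range."""
    return (caller_retpolined and target_retpolined
            and fits_rel32(next_instruction, target))


near = 0x1000 + 0x100000
far = 0x1000 + 2**31
print(import_optimization_eligible(True, True, 0x1000, near))   # True
print(import_optimization_eligible(True, True, 0x1000, far))    # False
print(import_optimization_eligible(True, False, 0x1000, near))  # False
```

Packing modules tightly in the address space is what keeps the range test true in practice.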

Here is an example of how the code generation for the call is modified.

Original code sequence

call [__imp_<Function>]                   ; Call to an imported function
nop                                       ; 5-byte nop

Import Optimized code sequence

mov r10, [__imp_<Function>]               ; R10 = target address (normal transformation)
call <Function>                           ; Direct REL32 call to target

Import optimization turned out to be a big performance win! Hence, even on processors where retpoline cannot be used due to alternate return instruction behavior, we still use import optimization.

Conclusion

Retpoline has significantly improved the performance of the Spectre variant 2 mitigations on Windows. When all relevant kernel-mode binaries are compiled with retpoline, we’ve measured ~25% speedup in Office app launch times and up to 1.5-2x improved throughput in the Diskspd (storage) and NTttcp (networking) benchmarks on Broadwell CPUs in our lab. It is enabled by default in the latest Windows Client Insider Fast builds (for builds 18272 and higher on machines exposing compatible speculation control capabilities) and is targeted to ship with 19H1.

To check if retpoline and import optimizations are enabled, you can use the PowerShell cmdlet Get-SpeculationControlSettings. You can also use NtQuerySystemInformation to programmatically query retpoline status.
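
For readers who want to interpret the result of a programmatic query, here is a hedged Python sketch of the decoding step only. It does not perform the NtQuerySystemInformation call itself. It assumes the query for the speculation control information class yields a 32-bit flags word, and the two bit values are assumptions based on the publicly distributed SpeculationControl PowerShell module. They are not stated in this post and should be verified before use.

```python
# ASSUMED bit values (from the public SpeculationControl PowerShell module,
# not from this post); verify against current Microsoft documentation.
RETPOLINE_ENABLED = 0x4000
IMPORT_OPTIMIZATION_ENABLED = 0x8000


def decode_retpoline_status(flags):
    """Map a speculation-control flags word to the two fields shown by the cmdlet."""
    return {
        "BTIKernelRetpolineEnabled": bool(flags & RETPOLINE_ENABLED),
        "BTIKernelImportOptimizationEnabled": bool(flags & IMPORT_OPTIMIZATION_ENABLED),
    }


# Example flags words (made up for illustration, not captured from a real machine).
print(decode_retpoline_status(0xC000))
# {'BTIKernelRetpolineEnabled': True, 'BTIKernelImportOptimizationEnabled': True}
print(decode_retpoline_status(0x8000))
# {'BTIKernelRetpolineEnabled': False, 'BTIKernelImportOptimizationEnabled': True}
```

The second example corresponds to the situation described earlier for Skylake and later Intel processors, where only import optimization is enabled.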

For a more in-depth look, here is a talk by Andrea Allievi at BlueHat 2018 talking about retpoline on Windows.

Give the latest builds a try and let us know your experience!

gamanakis · ‎Dec 06 2018

Any chance we see this in 1809? If not, when do you expect 19H1 to ship?

Tantawi · ‎Dec 09 2018

Thanks. Will this be back-ported to Windows Server 2019 at some point in 2019? And if not, should we just accept running a non-optimized OS for a few years if we want a server OS with desktop experience installed?

Jesse Cook · ‎Mar 01 2019

Could you please explain how to tell if retpoline is enabled? You mentioned the powershell command, but I don't see anything in the results related to retpoline or it is not obvious to me which of the results relate to it.

*EDIT* Nevermind, apparently the Skip Ahead 20H1 builds do not include it yet

boktai1000 · ‎Mar 01 2019

I am wondering about Retpoline as well. I'm currently running Windows 10 1809 with March 1, 2019—KB4482887 (OS Build 17763.348) installed.

I then went and installed the latest version of Get-SpeculationControlSettings at https://support.microsoft.com/en-us/help/4074629/understanding-the-output-of-get-speculationcontrols... as per the article here https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/Mitigating-Spectre-variant-2-with-Re... which states "To check if retpoline and import optimizations are enabled, you can use the PowerShell cmdlet Get-SpeculationControlSettings. You can also use NtQuerySystemInformation to programmatically query retpoline status."

Unfortunately I do not know how to programmatically query the retpoline status, but I would do so if I knew how. I have run the Get-SpeculationControlSettings in the past though, and it seems that expected behavior is supposed to include checking for Retpoline status. Maybe the PowerShell cmdlet has not been updated to reflect this, but that isn't entirely clear.

I am running on an i7-4790K so I believe to be impacted by Retpoline and am curious to understand more.

Brok3n Cogniti0n · ‎Mar 02 2019

@boktai1000 The PS script does include Retpoline, if updated to the last version. I don't think it's Enabled yet:

"Over the coming months, we will enable Retpoline as part of phased rollout via cloud configuration."

boktai1000 · ‎Mar 02 2019

@Brok3n Cogniti0n- I do see that you're absolutely right! I completely overlooked that, so my apologies for potentially confusing people by my comment.

I do see BTIKernelRetpolineEnabled in the output, which for me listed as False - which lines up with what you're saying.

Thank you for taking the time to reply to me on this!

Brok3n Cogniti0n · ‎Mar 02 2019

@boktai1000 Same output for me. I guess we're gonna have to wait and see how that Microsoft cloud config works.

rebootit · ‎Mar 03 2019

@Brok3n Cogniti0n @boktai1000 with 17763.348 you should have retpoline enabled, it specifically adds support to 1809.

I cannot verify as I am running 19H1 but perhaps Retpoline is only enabled with Spectre protections enabled, you should check out how to enable them on client systems (you may need KB4465065 and registry settings changes as outlined in MS ADV18002) if your machine does not have hardware microcode.

KB4465065 is a standalone update and not delivered by WU.

Brok3n Cogniti0n · ‎Mar 03 2019

@rebootit It comes disabled, even though support was added indeed. It specifically says that it will be enabled via "cloud configuration" over the next months. For the next feature update it should come as enabled by default.

groviglio · ‎Mar 05 2019

i have an intel broadwell (NUC). i am, till now, fully patched (firmware + OS (W10 1809)). powershell script outputs, smooth as silk, all "trues" (in the right places). i was waiting, with anxiety, for these new settings for retpoline: well, i applied those and... they destroy the protection for SSB (CVE-2018-3639), because retpoline and SSB keys in the registry are the same: FeatureSettingsOverride and FeatureSettingsOverrideMask. these two keys are set to "8" and "3" to have all "trues" at the SSB section of the powershell script output. if you set both keys to "400", as you need to cover retpoline, you disrupt protection for SSB (that is not "true" anymore in the powershell script). how to solve?

these are two screenshots with the output of the powershell script, the first with "8" and "3" inside the "FeatureSettingsOverride" and "FeatureSettingsOverrideMask" keys, the second with "400" and "400". note the variations of "true" and "false" across the sections of the powershell script outputs: https://imgur.com/a/fWB9Hzj

Brian . · ‎Mar 05 2019

@jessecook My 18845 has this enabled already, according to the PS module. I didn't touch the Registry. Unclear if it's all 20H1 or phased.

rebootit · ‎Mar 06 2019

@groviglio This has been answered here. If you require additional protections then bitwise add the previous values for enable SSBD, Spectre v2 and Meltdown (8), disable Spectre v2 and Meltdown (3), disable Spectre v2 only (2) and disable Meltdown only (1).

But I can confirm as I had registry settings of 8 (and a mask of 3) when upgrading to 18346 (via 18343) I have the retpoline enabled according to the PowerShell script without having to modify the registry. FWIW I have hardware uCode and a Broadwell i7-5500U.

The registry has remained unchanged but all protections and optimisations are present.

groviglio · ‎Mar 06 2019

thanks @rebootit, it all went well :)

sba94015 · ‎Mar 06 2019

Please note that the Intel whitepaper "Retpoline: A Branch Target Injection Mitigation" https://software.intel.com/security-software-guidance/api-app/sites/default/files/Retpoline-A-Branch... has been superseded by "Deep Dive: Retpoline: A Branch Target Injection Mitigation" https://software.intel.com/security-software-guidance/insights/deep-dive-retpoline-branch-target-inj...

jaffar83 · ‎Mar 06 2019

@rebootit

I'm confused, Do I need to enable SSBD? Or is it disabled by default?

I don't remember what was the original entries in the registry because I changed them to disable the original patch 8 months ago.

rebootit · ‎Mar 06 2019

Retpoline is a way of optimising variant 2 mitigations. SSBD is a different Spectre class vulnerability. Mitigation for SSBD is not enabled by default.

Inari Okami · ‎Mar 08 2019

It looks like Retpoline and Import Optimization are completely incompatible with Hyper-V, even when forced. Is this intentional? Will Windows 10 19H1 have the same massive limitation?

Steven H · ‎Mar 12 2019

@Inari Okami I also had this issue testing CPUs Sandybridge - Haswell so you're not alone, even March CU doesn't seem to address the issue.

However it does seem to work for the most part in 19H1, 19H2 or 20H1. It's likely they are still working on ironing out issues before releasing it to everyone.

9t9c0de · ‎Mar 14 2019

it seems retpoline and import optimizations are not compatible with virtualization based security(vbs)

Mehmet_Iyigun · ‎Mar 18 2019

In Windows 1809 (RS5/Server 2019), retpoline and import optimization are disabled if HVCI (DeviceGuard) is enabled. This restriction is eliminated in 19H1 (Spring 2019) release where retpoline/import optimization can coexist with DeviceGuard.

Inari Okami · ‎Apr 15 2019

@Mehmet_Iyigun 10.0.18362.53, Import Optimization is still off whenever HVCI is on. Retpoline says it's working, though.

JonathanW984 · ‎Aug 02 2019

So, I understand that for Windows 10 1809 and Windows Server 2019 that updates will automatically enable retpoline when Spectre Variant 2 is enabled. What I can't understand through the multiple updates is whether or not retpoline can be used on Server 2016 by using the reg keys or if it's not available at all for the OS.

Laverne1415 · ‎May 17 2021