
ManuelaErdmann
Copper Contributor
Jan 23, 2026

Proposal for Native Kernel Integration of GPU VRAM as Tiered System Memory (NUMA Implementation)


 

1. EXECUTIVE SUMMARY

This proposal outlines a strategic extension to the Windows Memory Manager (Mm) and the Windows Display Driver Model (WDDM).

The objective is to expose unallocated Video RAM (VRAM) as a secondary, operating-system-managed memory tier. By treating the GPU as a specialized NUMA (Non-Uniform Memory Access) node via PCIe Resizable BAR, Windows can mitigate current high DDR system memory costs (2026 market conditions) and unlock significant performance gains for I/O-heavy workstation tasks.

 

2. MOTIVATION & MARKET ANALYSIS

The "Dark Silicon" Problem: Modern consumer and workstation GPUs ship with large amounts of high-bandwidth graphics memory. In workloads that do not involve active 3D rendering—such as software compilation, office productivity, or general desktop usage—most of this memory remains unused. This leads to a situation where a substantial portion of a high-performance resource sits idle while system RAM continues to be the primary bottleneck.

Economic Efficiency: With prices for high-density DDR5 (and upcoming DDR6) modules remaining volatile, upgrading physical system memory is a significant cost barrier. Utilizing VRAM that is already installed immediately improves hardware ROI (Return on Investment).

Local AI Workflows: The rise of local LLMs (Large Language Models) necessitates fast memory paging. Offloading context caches to VRAM offers far higher bandwidth than NVMe SSD paging, even when constrained by PCIe bus limits.

 

3. TECHNICAL FEASIBILITY & ARCHITECTURE

To implement this securely and efficiently, I propose moving beyond simple "RAM disks" to a Tiered Memory approach:

 

A. VRAM as a NUMA Node (Software-Defined)

Windows already supports NUMA for multi-socket servers.

Concept: The kernel driver exposes the GPU VRAM aperture (via Resizable BAR) as a distinct NUMA node.

Behavior: The Windows Memory Manager prioritizes standard DDR RAM (Node 0). When Node 0 reaches capacity, instead of paging "cold" pages to the disk/SSD, the Kernel moves them to VRAM (Node 1).

Benefit: A PCIe 5.0 x16 link provides roughly 64 GB/s of bandwidth, and its access latency is orders of magnitude lower than an NVMe SSD page-in, keeping the system responsive even under heavy memory pressure.
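To illustrate what this would look like from user mode, here is a minimal sketch assuming the VRAM tier surfaces as NUMA node 1 (an assumption of this proposal, not current Windows behavior). It uses only existing Win32 NUMA APIs, which is part of the appeal: NUMA-aware applications would need no new interfaces to place a buffer on the VRAM node explicitly, and ordinary applications would need no changes at all, since the spill-over from Node 0 to Node 1 happens inside the Memory Manager.

/* Minimal user-mode sketch. Assumption: the software-defined VRAM tier
   is exposed as NUMA node 1; no such node exists on Windows today.     */
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    ULONG highestNode = 0;
    SIZE_T size = 64ull * 1024 * 1024;  /* 64 MiB */
    void *buf;

    GetNumaHighestNodeNumber(&highestNode);
    printf("Highest NUMA node: %lu\n", highestNode);

    /* Explicitly prefer the hypothetical VRAM node (node 1). */
    buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, size,
                             MEM_RESERVE | MEM_COMMIT,
                             PAGE_READWRITE,
                             1 /* preferred node */);
    if (buf != NULL) {
        memset(buf, 0, size);           /* touch pages so they are faulted in */
        VirtualFree(buf, 0, MEM_RELEASE);
    }
    return 0;
}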

 

B. The WDDM Challenge: Resource Contention

A major hurdle is the conflict between Graphics/Compute workloads and Storage workloads.

Solution: Implementation of a strict "Kernel-Level Eviction Policy."

Mechanism: The OS treats "VRAM-as-RAM" pages as discardable or low-priority. As soon as a DirectX/Vulkan context requests VRAM (e.g., launching a game), the Kernel immediately flushes the VRAM-resident system pages back to the SSD (pagefile) or compresses them into system RAM.

Feasibility: WDDM 3.x already handles complex residency management. This would be a logical extension of existing paging logic.
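As a user-mode analogy for the "discardable" semantics described above (not the proposed kernel mechanism itself), the existing OfferVirtualMemory / ReclaimVirtualMemory APIs already express exactly this contract: pages may be thrown away under memory pressure and must be regenerated if reclamation reports them lost. The eviction policy proposed here would apply the same contract to VRAM-resident system pages.

/* User-mode analogy only: these are existing Win32 APIs (Windows 8.1+),
   shown to illustrate discardable-page semantics.                       */
#include <windows.h>

void MarkScratchBufferDiscardable(void *pages, SIZE_T size)
{
    /* Tell the OS these committed pages may be discarded under pressure. */
    OfferVirtualMemory(pages, size, VmOfferPriorityLow);
}

BOOLEAN TryReuseScratchBuffer(void *pages, SIZE_T size)
{
    /* ERROR_BUSY means the contents were discarded and must be rebuilt
       (in the proposal: re-read from the pagefile).                     */
    return ReclaimVirtualMemory(pages, size) == ERROR_SUCCESS;
}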

 

4. SECURITY IMPLICATIONS (Why Open Source is Insufficient)

Community-driven solutions (user-mode RAM disks) fail to meet modern security standards required for enterprise adoption:

HVCI & VBS Compatibility: Accessing physical memory addresses via the PCIe BAR requires kernel-mode privileges. Unsigned community drivers cannot load under Hypervisor-Protected Code Integrity (HVCI). Only a Microsoft-signed driver ensures that "Secured-Core PC" status remains intact.

DMA Protection: Direct Memory Access must be strictly controlled by the IOMMU to prevent peripherals from reading system data. Only a native Windows driver can correctly configure the IOMMU for this specific "VRAM-Swap" usage.

 

5. STRATEGIC ADVANTAGE FOR WINDOWS

Implementing this feature positions Windows as the premier OS for high-performance workstations, bridging the gap between consumer hardware and enterprise needs by maximizing the utility of available hardware resources.

ADDENDUM: ARCHITECTURAL PATHWAYS FOR IMPLEMENTATION

To further clarify the technical execution, I propose two distinct architectural pathways for the engineering team to consider. These range from a pragmatic storage driver (Path A) to a fully unified memory architecture (Path B).

 

PATH A: THE "VRAM-DRIVE" (STORAGE STACK INTEGRATION)

Target: Implementation as a virtual block device (VBD).

Complexity: Moderate.

Risk: Low.

Concept: Instead of extending the system memory pool, the driver creates a high-speed RAM disk exposed to the OS via the Storport driver model.

Technical Implementation Details:

Driver Model: A kernel-mode driver acting as a Storport virtual miniport.

Address Mapping: The driver obtains the physical address of the VRAM aperture (via PCIe BAR resizing) and exposes it as a logical disk volume (e.g., Z:); a minimal mapping sketch follows after these implementation details.

Data Integrity:

To prevent data loss during GPU resets (TDR), the driver must register for WDDM power state notifications.

On "Power Down" or "Standby," the contents of the VRAM-Drive could be flushed to the NVMe system drive (Hibernation-like behavior).

Use Cases:

Developer Scratchpad: Target directory for "obj" and "bin" folders during compilation (massive I/O reduction on SSDs).

Pagefile Target: Windows allows placing pagefile.sys on secondary drives. Placing it on the VRAM-Drive creates a pseudo-RAM extension.

 

PATH B: THE "TIERED SYSTEM MEMORY" (MEMORY MANAGER INTEGRATION)

Target: Transparent extension of the physical RAM pool (NUMA/Tiering).

Complexity: High.

Risk: Moderate (Requires tight WDDM synchronization).

Concept: The VRAM is treated as a secondary memory tier managed directly by the Windows Memory Manager (Mm), invisible to user-mode applications.

Technical Implementation Details:

NUMA Node Emulation:

The driver utilizes the SRAT (System Resource Affinity Table) logic to present the GPU VRAM as a "Headless NUMA Node" (Node 1); the relevant ACPI structure is sketched below.

This allows the OS to apply standard NUMA policies (e.g., allocate on Node 0 first, spill over to Node 1).
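For reference, the ACPI structure this relies on is small. The field layout below follows the Memory Affinity Structure defined in the ACPI specification's SRAT; pointing it at a GPU BAR to create a headless node is this proposal's assumption, not current firmware practice.

/* ACPI SRAT Memory Affinity Structure (type 1) as defined by the ACPI
   specification. Filling it with the VRAM aperture's base and length is
   the proposal's assumption.                                            */
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  Type;             /* 1 = Memory Affinity Structure          */
    uint8_t  Length;           /* 40 bytes                               */
    uint32_t ProximityDomain;  /* e.g. 1 = the headless VRAM node        */
    uint16_t Reserved1;
    uint32_t BaseAddressLow;   /* physical base of the VRAM aperture     */
    uint32_t BaseAddressHigh;
    uint32_t LengthLow;        /* size of the aperture                   */
    uint32_t LengthHigh;
    uint32_t Reserved2;
    uint32_t Flags;            /* bit 0: Enabled, bit 1: Hot Pluggable   */
    uint64_t Reserved3;
} SRAT_MEMORY_AFFINITY;
#pragma pack(pop)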

Page Management & Migration:

Windows currently pages "cold" memory to the disk (backing store).

In this model, the Memory Manager would "migrate" pages from DDR (Node 0) to VRAM (Node 1) via DMA before considering the SSD.

Latency Hierarchy: L1/L2/L3 Cache <-> DDR RAM <-> VRAM (PCIe) <-> NVMe SSD.
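A conceptual sketch of that demotion decision is below. The helper names are hypothetical and this is not how the Memory Manager is structured internally, but it captures the ordering the hierarchy implies: try the VRAM node first, and fall back to the pagefile only when the tier is full.

/* Conceptual sketch only: hypothetical helper, not Mm internals.        */
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Returns true if the cold page could be parked in VRAM, false if the
   caller must fall back to the normal pagefile write path.              */
bool DemoteColdPage(void *ddrPage, void *freeVramPage)
{
    if (freeVramPage == NULL)
        return false;   /* VRAM tier full: use the pagefile instead */

    /* In the kernel this copy would be a GPU DMA transfer (or a CPU copy
       into the mapped BAR), followed by re-pointing the PTE at VRAM.    */
    memcpy(freeVramPage, ddrPage, SKETCH_PAGE_SIZE);
    return true;
}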

The "Game Mode" Eviction Logic (Crucial):

WDDM Priority: Graphics workloads must always take precedence.

Mechanism: When a DirectX/Vulkan context requires a VRAM allocation, the Kernel Driver receives a "Pre-Eviction Callback" (sketched after this list).

Action: The OS immediately marks the system pages residing in VRAM as "modified" and flushes them to the system pagefile, freeing the VRAM for the game within milliseconds.

Safety Mechanism: To prevent BSODs during GPU TDR (Timeout Detection and Recovery) events, the Memory Manager should prefer placing Read-Only or Clean Pages (pages already backed by disk) in VRAM whenever possible, treating VRAM as a tier whose contents are never guaranteed to survive.
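A sketch of that callback is below. The callback name, its registration, and the helper prototypes are assumptions for illustration (WDDM does not expose such an interface today); the point is the ordering: clean pages are simply dropped, dirty pages are flushed first, and only then is the range handed back to the GPU scheduler.

/* Hypothetical sketch: not an existing WDDM interface.                  */
#include <ntddk.h>

/* Assumed helpers (prototypes only). */
BOOLEAN  VramPageIsClean(PVOID vramPage);
VOID     VramInvalidateMapping(PVOID vramPage);
NTSTATUS VramFlushDirtyPageToPagefile(PVOID vramPage);

NTSTATUS VramTierPreEviction(PVOID rangeBase, SIZE_T rangeBytes)
{
    SIZE_T offset;

    for (offset = 0; offset < rangeBytes; offset += PAGE_SIZE) {
        PVOID page = (PUCHAR)rangeBase + offset;

        if (!VramPageIsClean(page)) {
            /* Dirty page: must reach the pagefile (or be compressed into
               DDR) before the VRAM backing it can be reused.             */
            NTSTATUS status = VramFlushDirtyPageToPagefile(page);
            if (!NT_SUCCESS(status))
                return status;
        }
        /* Clean (or now-flushed) page: its backing copy is valid, so the
           mapping can simply be torn down.                               */
        VramInvalidateMapping(page);
    }
    /* The whole range is now free for the DirectX/Vulkan allocation. */
    return STATUS_SUCCESS;
}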

COMPARATIVE ANALYSIS

The primary difference between these approaches lies in user interaction and kernel integration. Path A operates within the Storage Stack, requiring the user to manually manage a new drive volume. It is highly effective for specific I/O-heavy tasks like software compilation but limited by file system overhead.

In contrast, Path B operates within the Kernel Core (Memory Manager). It offers a fully automatic experience where the user simply perceives an increase in available system memory. However, Path B relies on PCIe bus latency management and complex WDDM integration to prevent conflicts with gaming workloads.
