Surface Laptop
17 Topics

Leveraging the power of NPU to run Gen AI tasks on Copilot+ PCs
Thanks to their massive scale and impressive technical evolution, large language models (LLMs) have become the public face of Generative AI innovation. However, bigger isn’t always better. While LLMs like the ones behind Microsoft Copilot are incredibly capable at a wide range of tasks, less-discussed small language models (SLMs) expand the utility of Gen AI for real-time and edge applications. SLMs can run efficiently on a local device with low power consumption and fast performance, enabling new scenarios and cost models.

SLMs can run on universally available chips like CPUs and GPUs, but their potential really comes alive on Neural Processing Units (NPUs), such as the ones found in Microsoft Surface Copilot+ PCs. NPUs are specifically designed for processing machine learning workloads, leading to higher performance per watt and better thermal efficiency than CPUs or GPUs [1]. Together, SLMs and NPUs support running quite powerful Gen AI workloads efficiently on a laptop, even when running on battery power or multitasking.

In this blog, we focus on running SLMs on Snapdragon® X Plus processors on the recently launched Surface Laptop 13-inch, using the Qualcomm® AI Hub, leading to efficient local inference, increased hardware utilization and minimal setup complexity. This is only one of many methods available. Before diving into this specific use case, let’s first provide an overview of the possibilities for deploying small language models on Copilot+ PC NPUs.

Qualcomm AI Engine Direct (QNN) SDK: This process requires converting SLMs into QNN binaries that can be executed through the NPU. The Qualcomm AI Hub provides a convenient way to compile any PyTorch, TensorFlow, or ONNX-converted models into QNN binaries executable by the Qualcomm AI Engine Direct SDK. Various precompiled models are directly available in the Qualcomm AI Hub, a collection of over 175 pre-optimized models, ready for download and integration into your application.
ONNX Runtime: ONNX Runtime is an open-source inference engine from Microsoft designed to run models in the ONNX format. The QNN Execution Provider (EP) by Qualcomm Technologies optimizes inference on Snapdragon processors using AI acceleration hardware, mainly for mobile and embedded use. ONNX Runtime Gen AI is a specialized version optimized for generative AI tasks, including transformer-based models, aiming for high-performance inference in applications like large language models. Although ONNX Runtime with QNN EP can run models on Copilot+ PCs, some operator support is missing for Gen AI workloads. ONNX Runtime Gen AI is not yet publicly available for NPU; a private beta is currently out, with no clear ETA for a public release at the time of publishing this blog. For more information on upcoming releases, see the GitHub repo microsoft/onnxruntime-genai (Generative AI extensions for onnxruntime).

Windows AI Foundry: Windows AI Foundry provides AI-supported features and APIs for Copilot+ PCs. It includes pre-built models such as Phi Silica that can be run for inference through Windows AI APIs. Additionally, it offers the capability to download models from the cloud for local inference on the device using Foundry Local. This feature is still in preview. You can learn more about Windows AI Foundry here: Windows AI Foundry | Microsoft Developer

AI Toolkit for VS Code: The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models from the Azure AI Foundry catalog and other catalogs like Hugging Face. This platform allows users to download multiple models either from the cloud or locally. It currently houses several models optimized to run on CPU, with support for NPU-based models forthcoming, starting with DeepSeek R1.
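As a rough illustration of the ONNX Runtime route described above, the Python sketch below prefers the QNN Execution Provider when the installed ONNX Runtime build exposes it and otherwise falls back to the CPU. The helper names `choose_providers` and `create_session` and the `model.onnx` path are hypothetical. Using `QnnHtp.dll` as the `backend_path` value is an assumption about a Windows on Arm install with a QNN-enabled ONNX Runtime package, and, as noted above, not every Gen AI model will run this way.

```python
from typing import Dict, List, Sequence, Tuple


def choose_providers(available: Sequence[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Prefer the QNN EP (NPU) when it is available; always keep CPU as a fallback."""
    providers: List[str] = []
    options: List[Dict[str, str]] = []
    if "QNNExecutionProvider" in available:
        providers.append("QNNExecutionProvider")
        # Assumption: the HTP (NPU) backend library shipped with the QNN-enabled package.
        options.append({"backend_path": "QnnHtp.dll"})
    providers.append("CPUExecutionProvider")
    options.append({})
    return providers, options


def create_session(model_path: str):
    """Create an inference session; onnxruntime is imported lazily so the helper
    above can be used (and tested) on machines without it installed."""
    import onnxruntime as ort

    providers, options = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers, provider_options=options)


# Hypothetical usage, with a quantized, static-shape model exported to ONNX:
# session = create_session("model.onnx")
```

Keeping the CPU provider last in the list means a model with unsupported operators can still load, at the cost of running partly or wholly off the NPU.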
Comparison between different approaches

| Feature | Qualcomm AI Hub | ONNX Runtime (ORT) | Windows AI Foundry | AI Toolkit for VS Code |
| --- | --- | --- | --- | --- |
| Availability of models | Wide set of AI models (vision, Gen AI, object detection, and audio). | Any models can be integrated. NPU support for Gen AI tasks and ONNX Gen AI Runtime are not yet generally available. | Phi Silica model is available through Windows AI APIs; additional AI models from the cloud can be downloaded for local inference using Foundry Local. | Access to models from sources such as Azure AI Foundry and Hugging Face. Currently only supports DeepSeek R1 and Phi 4 Mini models for NPU inference. |
| Ease of development | The API is user-friendly once the initial setup and end-to-end replication are complete. | Simple setup, developer-friendly; however, limited support for custom operators means not all models deploy through ORT. | Easiest framework to adopt: developers familiar with Windows App SDK face no learning curve. | Intuitive interface for testing models via prompt-response, enabling quick experimentation and performance validation. |
| Is processor or SoC independent | No. Supports Qualcomm Technologies processors only. Models must be compiled and optimized for the specific SoC on the device. A list of supported chipsets is provided, and the resulting .bin files are SoC-specific. | Limitations exist with QNN EP’s HTP backend: only quantized models and those with static shapes are currently supported. | Yes. The tool can operate independently of SoC. It is part of the broader Windows Copilot Runtime framework, now rebranded as the Windows AI Foundry. | Model-dependent. Easily deployable on-device; model download and inference are straightforward. |

As of writing this article and based on our team's research, we found Qualcomm AI Hub to be the most user-friendly and well-supported solution available at this time. In contrast, most other frameworks are still under development and not yet generally available.
Before we dive into how to use Qualcomm AI Hub to run Small Language Models (SLMs), let’s first understand what Qualcomm AI Hub is.

What is Qualcomm AI Hub?

Qualcomm AI Hub is a platform designed to simplify the deployment of AI models for vision, audio, speech, and text applications on edge devices. It allows users to upload, optimize, and validate their models for specific target hardware, such as CPU, GPU, or NPU, within minutes. Models developed in PyTorch or ONNX are automatically converted for efficient on-device execution using frameworks like TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct. The Qualcomm AI Hub offers access to a collection of over 100 pre-optimized models, with open-source deployment recipes available on GitHub and Hugging Face. Users can also test and profile these models on real devices with Snapdragon and Qualcomm platforms hosted in the cloud.

In this blog we show how you can use Qualcomm AI Hub to get a QNN context binary for a model and use Qualcomm AI Engine Direct to run that context binary. The context binary is a SoC-specific deployment mechanism: a model compiled for a given device is expected to be deployed to that same device. The format is operating system agnostic, so the same model can be deployed on Android, Linux, or Windows. The context binary is designed only for the NPU. For more details on how to compile models in other formats, please visit the documentation here: Overview of Qualcomm AI Hub (qai-hub documentation).

The following case study details the efficient execution of the Phi-3.5 model using optimized, hardware-specific binaries on a Surface Laptop 13-inch powered by the Qualcomm Snapdragon X Plus processor, Hexagon™ NPU, and Qualcomm AI Hub.
Microsoft Surface Engineering Case Study: Running Phi-3.5 Model Locally on Snapdragon X Plus on Surface Laptop 13-inch

This case study details how the Phi-3.5 model was deployed on a Surface Laptop 13-inch powered by the Snapdragon X Plus processor. The study was developed and documented by the Surface DASH team, which specializes in delivering AI/ML solutions to Surface devices and generating data-driven insights through advanced telemetry. Using Qualcomm AI Hub, we obtained precompiled QNN context binaries tailored to the target SoC, enabling efficient local inference. This method maximizes hardware utilization and minimizes setup complexity.

We used a Surface Laptop 13-inch with the Snapdragon X Plus processor as our test device. The steps below apply to the Snapdragon X Plus processor; the process is similar for other Snapdragon X Series processors and devices, although you may need to download different variants of the desired models from Qualcomm AI Hub. Before you begin to follow along, please check the make and model of your NPU by navigating to Device Manager --> Neural Processors.

We also used Visual Studio Code and Python (3.10, 3.11, or 3.12). We ran the steps below with Python 3.11 and recommend using the same, although a higher Python version should make no difference. Before starting, let's create a new virtual environment in Python as a best practice. Follow the steps to create a new virtual environment here: https://code.visualstudio.com/docs/python/environments?from=20423#_creating-environments

Create a folder named ‘genie_bundle’ to store config and bin files.
Download the QNN context binaries specific to your NPU and place the config files into the genie_bundle folder.
Copy the .dll files from the QNN SDK into the genie_bundle folder.
Finally, execute the test prompt through genie-sdk in the required format for Phi-3.5.
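The "required format for Phi-3.5" in the last step is the chat template that appears in the Step 6 command later in this post. A minimal Python sketch of assembling that prompt and the genie-t2t-run.exe command line follows. The helper names are hypothetical, and the literal backslash-n sequences mirror how the command is typed in this post, on the assumption that the tool itself interprets them as newlines.

```python
import subprocess
from pathlib import Path
from typing import List

# Two characters (backslash, n), exactly as typed in the Step 6 command.
NL = "\\n"


def build_phi35_prompt(system: str, user: str) -> str:
    """Wrap a system and user message in the template used by the Step 6 command."""
    return (
        f"<|system|>{NL}{system}{NL}"
        f"<|user|>{user}{NL}<|end|>{NL}"
        f"<|assistant|>{NL}"
    )


def build_genie_command(prompt: str,
                        config: str = "genie_config.json",
                        exe: str = "./genie-t2t-run.exe") -> List[str]:
    """Argument list matching the command in Step 6 (-c config, -p prompt)."""
    return [exe, "-c", config, "-p", prompt]


def run_prompt(bundle_dir: str, system: str, user: str) -> str:
    """Run the prompt from inside the bundle folder and return the tool's stdout."""
    # Use an absolute path so the executable is found regardless of the caller's directory.
    exe = str(Path(bundle_dir).resolve() / "genie-t2t-run.exe")
    cmd = build_genie_command(build_phi35_prompt(system, user), exe=exe)
    result = subprocess.run(cmd, cwd=bundle_dir, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage once the bundle is complete:
# print(run_prompt("genie_bundle",
#                  "You are an assistant. Provide helpful and brief responses.",
#                  "What is an NPU?"))
```

Passing the arguments as a list avoids shell quoting problems with the `<|` and `|>` markers in the prompt.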
Setup steps in detail

Step 1: Set up local development environment
Download QNN SDK: Go to the Qualcomm Software Center (Qualcomm Neural Processing SDK | Qualcomm Developer) and download the QNN SDK by clicking Get Software (by default, the latest version of the SDK is downloaded). For this demo, we used the latest version available at the time (2.34). You may need to create an account on the Qualcomm website to access it.

Step 2: Download QNN Context Binaries from Qualcomm AI Hub Models
Download Binaries: Download the context binaries (.bin files) for the Phi-3.5-mini-instruct model from (Link to Download Phi-3.5 context binaries).
Clone AI Hub Apps repo: Use the Genie SDK (a generative runtime built on top of Qualcomm AI Engine Direct) and leverage the sample provided in https://github.com/quic/ai-hub-apps
Set up the folder structure to follow along with the code: Create a folder named "genie_bundle" outside of the folder where the AI Hub Apps repo was cloned. Selectively copy configuration files from the AI Hub sample repo to 'genie_bundle'.

Step 3: Copy config files and edit files
Copy config files to the genie_bundle folder from ai-hub-apps. You will need two config files. You can use the PowerShell script below to copy the config files from the repo to the local genie_bundle folder created in the previous steps.
You also need to copy the HTP backend config file as well as the genie config file from the repo.

    # Define the source paths
    $sourceFile1 = "ai-hub-apps/tutorials/llm_on_genie/configs/htp/htp_backend_ext_config.json.template"
    $sourceFile2 = "ai-hub-apps/tutorials/llm_on_genie/configs/genie/phi_3_5_mini_instruct.json"

    # Define the local folder path
    $localFolder = "genie_bundle"

    # Define the destination file paths using the local folder
    $destinationFile1 = Join-Path -Path $localFolder -ChildPath "htp_backend_ext_config.json"
    $destinationFile2 = Join-Path -Path $localFolder -ChildPath "genie_config.json"

    # Create the local folder if it doesn't exist
    if (-not (Test-Path -Path $localFolder)) {
        New-Item -ItemType Directory -Path $localFolder
    }

    # Copy the files to the local folder
    Copy-Item -Path $sourceFile1 -Destination $destinationFile1 -Force
    Copy-Item -Path $sourceFile2 -Destination $destinationFile2 -Force

    Write-Host "Files have been successfully copied to the genie_bundle folder with updated names."

After copying the files, make sure to change the default values of the parameters in the copied template files.
Edit the HTP backend file in its new location: change dsp_arch and soc_model to match your configuration.
(Image caption: Update soc model and dsp arch in HTP backend config files)
Edit the genie_config file to include the binaries for the Phi-3.5 model downloaded in the previous steps.

Step 4: Download the tokenizer file from Hugging Face
Visit the Hugging Face website: Open your web browser and go to https://huggingface.co/microsoft/Phi-3.5-mini-instruct/tree/main
Locate the tokenizer file: On the Hugging Face page, find the tokenizer file for the Phi-3.5-mini-instruct model.
Download the file: Click the download button to save the tokenizer file to your computer.
Save the file: Navigate to your genie_bundle folder and save the downloaded tokenizer file there.
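By the end of Step 4, the genie_bundle folder should hold the two renamed config files, the tokenizer, and the context binaries. A small check like the following can catch a missing file before the later steps. It is only a sketch: the function name `missing_bundle_files` is hypothetical, and it assumes the tokenizer was saved as tokenizer.json and that the context binaries sit in the bundle folder with their .bin extension.

```python
from pathlib import Path
from typing import List

# File names taken from the PowerShell script and the tokenizer step above.
REQUIRED_FILES = (
    "genie_config.json",
    "htp_backend_ext_config.json",
    "tokenizer.json",
)


def missing_bundle_files(bundle_dir: str) -> List[str]:
    """Return the expected bundle entries that are not present in bundle_dir."""
    bundle = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (bundle / name).is_file()]
    # At least one context binary (.bin) is expected next to the config files.
    if not any(bundle.glob("*.bin")):
        missing.append("*.bin (QNN context binaries)")
    return missing


# Hypothetical usage:
# problems = missing_bundle_files("genie_bundle")
# print("Bundle looks complete." if not problems else f"Missing: {problems}")
```

The check only looks at file names; it does not validate that the binaries listed inside genie_config.json match the files on disk.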
Note: There is an issue with the tokenizer.json file for the Phi 3.5 mini instruct model, where the output does not break words using spaces. To resolve this, you need to delete lines #192-197 in the tokenizer.json file.
Download tokenizer files from the Hugging Face repo (Image Source - Hugging Face)

Step 5: Copy files from QNN SDK
Locate the QNN SDK folder: Open the folder where you installed the QNN SDK in Step 1 and identify the required files. You need to copy the files from the folders listed below. Exact folder naming may change based on SDK version.

    <QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/hexagon-v75/unsigned
    <QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/aarch64-windows-msvc
    <QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/bin/aarch64-windows-msvc

Navigate to your genie_bundle folder and paste the copied files there.

Step 6: Execute the Test Prompt
Open your terminal: Navigate to your genie_bundle folder using your terminal or command prompt.
Run the command: Copy and paste the following command into your terminal:

    ./genie-t2t-run.exe -c genie_config.json -p "<|system|>\nYou are an assistant. Provide helpful and brief responses.\n<|user|>What is an NPU? \n<|end|>\n<|assistant|>\n"

Check the output: After running the command, you should see the response from the assistant in your terminal.

This case study demonstrates the process of deploying a small language model (SLM) like Phi-3.5 on a Copilot+ PC using the Hexagon NPU and Qualcomm AI Hub. It outlines the setup steps, tooling, and configuration required for local inference using hardware-specific binaries. As deployment methods mature, this approach highlights a viable path toward efficient, scalable Gen AI execution directly on edge devices.

Snapdragon® and Qualcomm® branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
Qualcomm, Snapdragon and Hexagon™ are trademarks or registered trademarks of Qualcomm Incorporated.

2.6K Views, 6 likes, 2 Comments

New Surface Laptop 5G for Business, Copilot+ PC
Stay securely connected with a rearchitected 5G design, including six smart-switching antennas, eSIM and Wi-Fi 7, without relying on hotspots. As the first Surface Laptop to feature 5G, it enables enterprise-ready AI features for deeper insights, productivity boosts, and powerful local inferencing wherever work happens.

Stay connected anywhere. The first Surface laptop with built-in 5G, supporting NanoSIM, eSIM, smart signal switching, and international roaming. See it here.
High-performance AI experiences. Surface Laptop 5G is powered by Intel Core Ultra processors with AI Boost. Watch here.
No IT setup required. Surface Laptop 5G can arrive business-ready with zero-touch deployment and managed 5G policies. Check it out.

QUICK LINKS:
00:00 - Surface Laptop 5G for Business
00:28 - Built-in 5G
01:30 - Hardware
02:06 - Intel® Core™ Ultra
02:41 - Built-in open-source AI models
03:20 - Management controls for IT
03:52 - Enterprise-Grade Security
04:16 - Wrap up

Link References
Check out https://surface.com/business

Unfamiliar with Microsoft Mechanics?
As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-Surface Laptop 5G for Business with Intel Core Ultra Series 2 processors brings together intelligent connectivity, ultra-fast performance, and premium design. It’s built for AI as a Copilot+ PC to deliver new, connected, on-device, and hybrid experiences, all while keeping your business data protected with enterprise-grade security.

-Now, not all 5G laptops are created equal. Surface Laptop 5G supports both physical Nano and eSIM for flexibility to connect from anywhere. In fact, we’ve rearchitected Surface Laptop to optimize connectivity, while still maintaining the sleek and lightweight design. It’s been engineered from the ground up for optimal signal strength with a strategically-placed six-antenna array, along with a newly developed custom composite palm rest. This material and antenna placement helps ensure superior signal transparency so it’s not blocked by your desk or your legs. The signal dynamically switches between antennas based on how you interact with the device to ensure the strongest possible connection.

-For example, whether you’re typing or using the touch pad, the 5G signal is routed to the least obstructed antennas. And as you move between spaces or locations during your day, you don’t need to worry about staying connected.
There’s no need to connect to hotspots or untrusted networks because it’s designed to seamlessly transition between 5G and known Wi-Fi networks, and includes support for Wi-Fi 7.

-Now, continuing our hardware tour, on the right side, you’ll see a Surface Connect port. This is positioned next to the removable NanoSIM tray. Then on the left side of the device, there are two USB-C Thunderbolt ports, a USB-A and a 3.5mm headphone port. Moving up the device, Surface Laptop 5G comes with a signature 3:2 aspect ratio, 13.8" PixelSense touch display. The screen is anti-reflective, and not only does it come with Dolby Vision IQ support, but it has an impressive dynamic refresh rate of up to 120Hz.

-Next, let’s move on to what powers the Surface Laptop 5G for Business. This is the first Surface Copilot+ PC to support Intel with 5G and it comes with a choice of Intel Core Ultra 5 and 7 processors. It supports up to 32GB of memory on package and has integrated Intel Arc Graphics. The Intel AI Boost Neural Processing Unit, or NPU, is capable of running up to 48 TOPS without compromising battery life. And the device comes with up to one terabyte of M.2 Gen 4 SSD storage. All of this makes it optimized to run connected Copilot experiences, like powerful reasoning agents capable of generating deep insights with your work data, as well as the on-device foundational models from Windows AI Foundry. This includes 40-plus local and ready-to-use open source models like Phi Silica for text generation, built-in OCR for text recognition in images, super resolution to upscale images and video, image segmentation for background removal, and more. Your productivity experiences are further enhanced with AI, including improved Windows search, which combines keyword and vector-based search for more relevant results.

-Next, let’s look at the enterprise-grade management controls for IT.
Here, Microsoft Intune can be used to provision 5G connectivity with your network policies from the first time Surface Laptop connects to the internet, which helps ensure that only known and trusted networks can be connected to. Together with Windows Autopilot deployment, Surface Laptop 5G can be shipped directly to your workforce with your defined security policies and apps so that they’re business-ready before connecting to your managed resources. Surface Laptop 5G meets the Secured-core PC standard with a Microsoft Pluton security processor. Additionally, authentication with Windows Hello facial recognition benefits from enhanced sign-in security using virtualization. This is all part of Microsoft’s end-to-end, chip-to-cloud security that helps keep your information, devices, and users safe wherever they work from.

-So that was a quick tour of how the new Surface Laptop 5G for Business was thoughtfully engineered to bring together intelligent connectivity, ultra-fast performance, and premium design. Check out surface.com/business for availability and more information. Thanks for watching.

784 Views, 2 likes, 0 Comments

Overheating problems with Surfaces
Since the temperatures have risen, we already have four devices (Surface Laptop & Pro) that are constantly throttling to the point of crashing due to overheating. Surface Support only allows us to replace the entire device, so I'll try it here. Are you also experiencing these problems? Did Microsoft provide a bad firmware update in this case?

39 Views, 0 likes, 0 Comments

Trouble Activating Ultimate Performance Plan on My Surface Laptop
Hi everyone, I’m trying to enable the Ultimate Performance power plan on my Surface device. I’ve already used the command powercfg -duplicatescheme e9a42b02-d5df-448d-aa00-03f14749eb61, and it shows up in my system. However, when I try to activate it using powercfg /setactive [GUID], I get an “Invalid Parameters” error, even though the GUID matches exactly what was listed under powercfg /list. My current active plan is “High performance” (custom-made). Has anyone else encountered this issue on a Surface laptop? Is this plan blocked or incompatible with certain Surface models, or am I missing a step? Would appreciate any insights or tips! Thank you so much for your attention and participation.

43 Views, 0 likes, 0 Comments

Display problems with Surface 1872
Hi, I hope someone can help me here. I was given a Surface 1872 with a broken screen to fix. The picture was viewable but yes, the screen needed replacing. I cleaned out the broken screen and bought a brand new one (that was a mistake that I won't make again!!). I then attached the new screen in place of the broken one (not in the lid, just laying above it) and connected it using the 4 cables. The laptop booted up, but there was nothing showing on the screen. I attached a USB-C cable and plugged it into my monitor's HDMI port and that screen was fine. I then, by accident, touched the laptop screen and bizarrely, the touchscreen was working, just only visible on my monitor, not the new laptop screen. Googling this told me that it was probably a faulty screen.

I then bought a second-hand replacement screen, apparently tested and fully working, but as part of the whole top, i.e. in and attached to the lid. I replaced the laptop screen again and this time there was still nothing on it, but my monitor showed some flickering and some lines. I therefore disconnected the screen, re-attached all of the cables and rebooted. I hear the laptop boot up, but now I am getting nothing on either screen!!! What on earth could have caused this? Why, at least, do I not have video on my monitor when I did before?!? I have tried CTRL-Win Key-Shift-B and numerous other suggestions but nothing has so far made a difference 😢 ... Any help or advice would be a Godsend!!!

92 Views, 0 likes, 2 Comments

Surface Laptop 5 abysmally horrible battery life
I bought this laptop 2 years ago for $1.4k. I expect it to have a battery life that lasts more than 3.75 hours. The charger it came with is also incredibly finicky and this webcam makes everything look like it was recorded on a D-list smart fridge. The only thing this crapshoot is good for is the aesthetic. Functionally it's useless. I really expected better from a reputed company like Microsoft.

71 Views, 0 likes, 0 Comments