edgeAI
3 TopicsEdge AI for Student Developers: Learn to Run AI Locally
AI isn’t just for the cloud anymore. With the rise of Small Language Models (SLMs) and powerful local inference tools, developers can now run intelligent applications directly on laptops, phones, and edge devices—no internet required. If you're a student developer curious about building AI that works offline, privately, and fast, Microsoft’s Edge AI for Beginners course is your perfect starting point. What Is Edge AI? Edge AI refers to running AI models directly on local hardware—like your laptop, mobile device, or embedded system—without relying on cloud servers. This approach offers: ⚡ Real-time performance 🔒 Enhanced privacy (no data leaves your device) 🌐 Offline functionality 💸 Reduced cloud costs Whether you're building a chatbot that works without Wi-Fi or optimizing AI for low-power devices, Edge AI is the future of intelligent, responsive apps. About the Course Edge AI for Beginners is a free, open-source curriculum designed to help you: Understand the fundamentals of Edge AI and local inference Explore Small Language Models like Phi-2, Mistral-7B, and Gemma Deploy models using tools like Llama.cpp, Olive, MLX, and OpenVINO Build cross-platform apps that run AI locally on Windows, macOS, Linux, and mobile The course is hosted on GitHub and includes hands-on labs, quizzes, and real-world examples. You can fork it, remix it, and contribute to the community. What You’ll Learn Module Focus 01. Introduction What is Edge AI and why it matters 02. SLMs Overview of small language models 03. Deployment Running models locally with various tools 04. Optimization Speeding up inference and reducing memory 05. Applications Building real-world Edge AI apps Each module is beginner-friendly and includes practical exercises to help you build and deploy your own local AI solutions. Who Should Join? Student developers curious about AI beyond the cloud Hackathon participants looking to build offline-capable apps Makers and builders interested in privacy-first AI Anyone who wants to explore the future of on-device intelligence No prior AI experience required just a willingness to learn and experiment. Why It Matters Edge AI is a game-changer for developers. It enables smarter, faster, and more private applications that work anywhere. By learning how to deploy AI locally, you’ll gain skills that are increasingly in demand across industries—from healthcare to robotics to consumer tech. Plus, the course is: 💯 Free and open-source 🧠 Backed by Microsoft’s best practices 🧪 Hands-on and project-based 🌐 Continuously updated Ready to Start? Head to aka.ms/edgeai-for-beginners and dive into the modules. Whether you're coding in your dorm room or presenting at your next hackathon, this course will help you build smarter AI apps that run right where you need them on the edge.106Views0likes0CommentsBringing AI to the edge: Hackathon Windows ML
AI Developer Hackathon Windows ML Hosted by Qualcomm on SnapDragonX We’re excited to announce our support and participation for the upcoming global series of Edge AI hackathons, hosted by Qualcomm Technologies. The first is on June 14-15 in Bangalore. We see a world of hybrid AI, developing rapidly as new generation of intelligent applications get built for diverse scenarios. These range from mobile, desktop, spatial computing and extending all the way to industrial and automotive. Mission critical workloads oscillate between decision-making in the moment, on device, to fine tuning models on the cloud. We believe we are in the early stages of development of agentic applications that efficiently run on the edge for scenarios needing local deployment and on-device inferencing. Microsoft Windows ML Windows ML – a cutting-edge runtime optimized for performant on-device model inference and simplified deployment, and the foundation of Windows AI Foundry. Windows ML is designed to support developers creating AI-infused applications with ease, harnessing the incredible strength of Windows’ diverse hardware ecosystem whether it’s for entry-level laptops, Copilot+ PCs or top-of-the-line AI workstations. It’s built to help developers leverage the client silicon best suited for their specific workload on any given device whether it’s an NPU for low-power and sustained inference, a GPU for raw horsepower or CPU for the broadest footprint and flexibility. Introducing Windows ML: The future of machine learning development on Windows - Windows Developer Blog Getting Started To get started, install AI Toolkit, leverage one of our conversion and optimization templates, or start building your own. Explore documentation and code samples available on Microsoft Learn, check out AI Dev Gallery (install, documentation) for demos and more samples to help you get started with Windows ML. Microsoft and Qualcomm Technologies: A strong collaboration Microsoft and Qualcomm Technologies’ collaboration bring new advanced AI features into Copilot+ PCs, leveraging the Snapdragon X Elite. Microsoft Research has played a pivotal role by optimizing new lightweight LLMs, such as Phi Silica, specifically for on-device execution with the Hexagon NPU. These models are designed to run efficiently on Hexagon NPUs, enabling multimodal AI experiences like vision-language tasks directly on Copilot+ PCs without relying on the cloud. Additionally, Microsoft has made DeepSeek R1 7B and 14B distilled models available via Azure AI Foundry, further expanding the AI ecosystem on the edge. This collaboration marks a significant step in democratizing AI by making powerful, efficient models accessible on everyday devices Windows AI Foundry expands AI capabilities by providing high-performance built-in models and supports developers' custom models with silicon performance. This developer platform plays a key role in this collaboration. Windows ML enables Windows 11 and Copilot+ PCs to use the Hexagon NPU for power efficient inference. Scaling optimization through Olive toolchain The Windows ML foundation of the Windows AI Foundry provides a unified platform for AI development across various hardware architectures and brings silicon performance using QNN Execution provider. This stack includes Windows ML and toolchains like Olive, easily accessible in AI Toolkit for VS Code, which streamlines model optimization and deployment. Qualcomm Technologies has contributed to Microsoft’s Olive, an open-source model optimization tool that enhances AI performance by optimizing models for efficient inference on client systems. This tool is particularly beneficial for running LLMs and GenAI workloads on Qualcomm Technologies’ platforms. Real-World Applications Through Qualcomm Technologies and Microsoft’s collaboration we have partnered with top developers to adopt Windows ML and have demonstrated impressive performance for their AI features. Independent Solution Vendors (ISVs) such as Powder, Topaz Labs, Camo and McAfee, Join us at the Hackathon With the recent launch of Qualcomm Snapdragon® X Elite-powered Windows laptops, developers can now take advantage of powerful NPUs (Neural Processing Units) to deploy AI applications that are both responsive and energy-efficient. These new devices open up a world of opportunities for developers to rethink how applications are built from productivity tools to creative assistants and intelligent agents all running directly on the device. Our mission has always been to enable high-quality AI experiences using compact, optimized models. These models are tailor-made for edge computing, offering faster inference, lower memory usage, and enhanced privacy without compromising performance. We encourage all application developers whether you’re building with open-source SLMs (small language models), working on smart assistants, or exploring new on-device AI use cases to join us at the event. You can register here: https://www.qualcomm.com/support/contact/forms/edge-ai-developer-hackathon-bengaluru-proposal-submission Dive deeper into these innovative developer solutions: Windows AI Foundry & Windows ML on Qualcomm NPU Microsoft and Qualcomm Technologies collaborate on Windows 11, Copilot+ PCs and Windows AI Foundry | Qualcomm Unlocking the power of Qualcomm QNN Execution Provider GPU backen Introducing Windows ML: The future of machine learning development on Windows - Windows Developer Blog649Views4likes0CommentsUsing Advanced Reasoning Model on EdgeAI Part 1 - Quantization, Conversion, Performance
DeepSeek-R1 is very popular, and it can achieve the same capabilities as OpenAI o1 in advanced reasoning. Microsoft has also added DeepSeek-R1 models to Azure AI Foundry and GitHub Models. We can compare DeepSeek-R1 ith other available models through GitHub Models Playground Note This series revolves around deployment of SLMs to Edge Devices 'Edge AI' we will focus on the deployment advanced reasoning models, with different application scenarios. You can learn more in the following session AI Tour BRK453. In this experiement we want to deploy advanced reasoning models to the edge, so that they can run on edge devices with limited computing power and offline environments. At this time, the recommendation is to use the traditional ONNX model . We can use Microsoft Olive to convert the DeepSeek-R1 Distrill model. Getting started with Microsoft Olive is very straightforward. Install the Microsoft Olive library through the command line and Python 3.10+ (recommended) pip install olive-ai The DeepSeek-R1 Distrill model series has different parameters such as 1.5B, 7B, 8B, 14B, 32B, 70B, etc. This article is mainly based on the 1.5B, 7B, and 14B models (so a Small Language Model). CPU Inference Let's discuss 1.5B and 7B, which are models with lower parameter. We can directly use the CPU as computing for inference to test the effect (hardware environment Azure DevBox, AMD EPYC 7763 64-Core + 64GB Memory + 2T SSD) Quantization conversion olive auto-opt --model_name_or_path <Your DeepSeek-R1-Distill-Qwen-1.5B/7B local location> --output_path <Your Convert ONNX INT4 Model local location> --device cpu --provider CPUExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety or provided as an Offical Model) DeepSeek-R1-Distill-Qwen-1.5B-ONNX-INT4-CPU DeepSeek-R1-Distill-Qwen-7B-ONNX-INT4-CPU Running with ONNX Runtime GenAI Install ONNX Runtime GenAI and ONNX Runtime CPU support libraries pip install onnxruntime-genai pip install onnxruntime Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-1.5b.ipynb https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/demo-7b.ipynb Performance comparison 1.5B vs 7B We compare two different inference scenarios explain 1+1=2 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated 2. Find all pairwise different isomorphism groups with order 147 and no elements with order 49 1.5B quantized ONNX model memory occupied, time consumption and number of tokens generated: 7B quantized ONNX model memory occupied, time consumption and number of tokens generated Results of the numbers Through the test, we can see that the 1.5B model of DeepSeek is more suitable for use on CPU inference and can be deployed on traditional PCs or IoT devices. As for 7B, although it has better inference, it is not very effective on CPU operation. GPU Inference It is ideal if we have a GPU on the edge device. We can quantize and convert it to an ONNX model for CPU inference through Microsoft Olive. Of course, it can also be converted to a model for GPU inference. Here I take the 14B DeepSeek-R1-Distill-Qwen-14B as an example and make an inference comparison with Microsoft's Phi-4-14B Quantization conversion olive auto-opt --model_name_or_path <Your Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --output_path <Your converted Phi-4-14B or DeepSeek-R1-Distill-Qwen-14B local path > --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1 You can download it directly from my Hugging face Repo (Note: This model is for testing and has not been fully tested by AI Content Safety and not an Official Model) DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Phi-4-14B-ONNX-INT4-GPU Running with ONNX Runtime GenAI CUDA Install ONNX Runtime GenAI and ONNX Runtime GPU support libraries pip install onnxruntime-genai-cuda pip install onnxruntime-gpu Compare the results in the GPU environment with Gradio It is recommended to use a GPU with more than 8G memory To increase the comparison of the results, we compare it with Phi-4-14B-ONNX-INT4-GPU and DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU to see the different results. We also show we use OpenAI o1-mini (it is recommended to use o1-mini through GitHub Models), Sample Code https://github.com/kinfey/EdgeAIForAdvancedReasoning/blob/main/notebook/Performance_AdvancedReasoning_ONNX_CPU.ipynb You can test any prompt on Gradio to compare the results of Phi-4-14B-ONNX-INT4-GPU, DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU and OpenAI o1 mini. DeepSeek-R1 reduces the cost of inference models and produces more instructive results on professional problems, but Phi-4-14B also has advantages in reasoning and uses lower computing power to complete inference. As for OpenAI o1 mini, it is more comprehensive and can touch all problems. If you want to deploy to Edge Device, Phi-4-14B and quantized DeepSeek-R1 are good choices for you. This blog is just a simple test and the first in this series. Please share your feedback and continue the discussion in the Microsoft AI Discord Channel. Feel free to me a message or comment. We look forward to sharing more around the opportunity of EdgeAI and more content in this series. Resource DeepSeek-R1 in GitHub Models https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1 DeepSeek-R1 in Azure AI Foundry https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek Phi-4-14B in Hugging face https://huggingface.co/microsoft/phi-4 Learn about Microsoft Olive https://github.com/microsoft/olive Learn about ONNX Runtime GenAI https://github.com/microsoft/onnxruntime-genai Microsoft AI Discord Channel BRK453 Exploring cutting-edge models: LLMs, SLMs, local development and more https://aka.ms/aitour/brk453937Views0likes0Comments