GenAI
Kickstart Your AI Development with the Model Context Protocol (MCP) Course
Model Context Protocol is an open standard that acts as a universal connector between AI models and the outside world. Think of MCP as "the USB-C of the AI world," allowing AI systems to plug into APIs, databases, files, and other tools seamlessly. By adopting MCP, developers can create smarter, more useful AI applications that access up-to-date information and perform actions like a human developer would. To help developers learn this game-changing technology, Microsoft has created the "MCP for Beginners" course, a free, open-source curriculum that guides you from the basics of MCP to building real-world AI integrations. Below, we'll explore what MCP is, who this course is for, and how it empowers both beginners and intermediate developers to get started with MCP.

What is MCP and Why Should Developers Care?

Model Context Protocol (MCP) is an innovative framework designed to standardize interactions between AI models and client applications. In simpler terms, MCP is a communication bridge that lets your AI agent fetch live context from external sources (like APIs, documents, databases, or web services) and even take actions using tools. This means your AI apps are no longer limited to pre-trained knowledge: they can dynamically retrieve data or execute commands, enabling far more powerful and context-aware behavior. Some key reasons MCP matters for developers:

- Seamless Integration of Tools & Data: MCP provides a unified way to connect AI to various data sources and tools, eliminating the need for ad-hoc, fragile integrations. Your AI agent can, for example, query a database or call a web API during a conversation, all through a standardized protocol.
- Stay Up-to-Date: Because AI models can use MCP to access external information, they overcome the training data cutoff problem. They can fetch the latest facts, figures, or documents on demand, ensuring more accurate and timely responses.
- Industry Momentum: MCP is quickly gaining traction. Originally introduced by Anthropic in late 2024, it has since been adopted by major AI platforms (Replit, Sourcegraph, Hugging Face, and more) and spawned thousands of open-source connectors by early 2025. It's an emerging standard – learning it now puts developers at the forefront of AI innovation.

In short, MCP is transformative for AI development, and being proficient in it will help you build smarter AI solutions that can interact with the real world. The MCP for Beginners course is designed to make mastering this protocol accessible, with a structured learning path and hands-on examples.

Introducing the MCP for Beginners Course

"Model Context Protocol for Beginners" is an open-source, self-paced curriculum created by Microsoft to teach the concepts and fundamentals of MCP. Whether you're completely new to MCP or have some experience, this course offers a comprehensive guide from the ground up.

Key Features and Highlights:

- Structured Learning Path: The curriculum is organized as a multi-part guide (9 modules in total) that gradually builds your knowledge. It starts with the basics of MCP – What is MCP? Why does standardization matter? What are the use cases? – and then moves through core concepts, security considerations, getting started with coding, all the way to advanced topics and real-world case studies. This progression ensures you understand the "why" and "how" of MCP before tackling complex scenarios.
- Hands-On Coding Examples: This isn't just theory – practical coding examples are a cornerstone of the course.
You'll find live code samples and mini-projects in multiple languages (C#, Java, JavaScript/TypeScript, and Python) for each concept. For instance, you'll build a simple MCP-powered Calculator application as a project, exploring how to implement MCP clients and servers in your preferred language. By coding along, you cement your understanding and see MCP in action.
- Real-World Use Cases: The curriculum illustrates how MCP applies to real scenarios. It discusses practical use cases of MCP in AI pipelines (e.g. an AI agent pulling in documentation or database info on the fly) and includes case studies of early adopters. These examples help you connect what you learn to actual applications and solutions you might develop in your job.
- Broad Language Support: A unique aspect of this course is its multi-language approach – both in terms of programming and human languages. The content provides code implementations in several popular programming languages (so you can learn MCP in the context of C#, Java, Python, JavaScript, or TypeScript, as you prefer). In addition, the learning materials themselves are available in multiple human languages (English, plus translations like French, Spanish, German, Chinese, Japanese, Korean, Polish, etc.) to support learners worldwide. This inclusivity ensures that more developers can comfortably engage with the material.
- Up-to-Date and Open-Source: Being hosted on GitHub under the MIT License, the curriculum is completely free to use and open for contributions. It's maintained with the latest updates; for example, automated workflows keep translations in sync so all language versions stay current. As MCP evolves, the course content can evolve with it. You can even join the community to suggest improvements or add content, making this a living learning resource.
- Official Resources & Community Support: The course links to official MCP documentation and specs for deeper reference, and it encourages learners to join the https://aka.ms/ai/discord community to discuss and get help. You won't be learning alone; you can network with experts and peers, ask questions, and share progress. Microsoft's open-source approach means you're part of a community of practitioners from day one.

Course Outline: (Modules at a Glance)

- Introduction to MCP: Overview of MCP, why standardization matters in AI, and the key benefits and use cases of using MCP. (Start here to understand the big picture.)
- Core Concepts: Deep dive into MCP's architecture – understanding the client-server model, how requests and responses work, and the message schema. Learn the fundamental components that make up the protocol.
- Security in MCP: Identify potential security threats when building MCP-based systems and learn best practices to secure your AI integrations. Important for anyone planning to deploy MCP in production environments.
- Getting Started (Hands-On): Set up your environment and create your first MCP server and client. This module walks through basic implementation steps and shows how to integrate MCP with existing applications, so you get a service up and running that an AI agent can communicate with.
- MCP Calculator Project: A guided project where you build a simple MCP-powered application (a calculator) in the language of your choice. This hands-on exercise reinforces the concepts by implementing a real tool – you'll see how an AI agent can use MCP to perform calculations via an external tool. (A minimal sketch of such a server appears at the end of this article.)
- Practical Implementation: Tips and techniques for using MCP SDKs across different languages.
Covers debugging, testing, and validation of MCP integrations, and how to design effective prompt workflows that leverage MCP's capabilities.
- Advanced Topics: Going beyond the basics – explore multi-modal AI workflows (using MCP to handle not just text but other data types), scalability and performance tuning for MCP servers, and how MCP fits into larger enterprise architectures. This is where intermediate users can really deepen their expertise.
- Community Contributions: Learn how to contribute to the MCP ecosystem and the curriculum itself. This section shows you how to collaborate via GitHub, follow the project's guidelines, and even extend the protocol with your own ideas. It underlines that MCP is a growing, community-driven standard.
- Insights from Early Adoption: Hear lessons learned from real-world MCP implementations. What challenges did early adopters face? What patterns and solutions worked best? Understanding these will prepare you to avoid pitfalls in your own projects.
- Best Practices and Case Studies: A roundup of do's and don'ts when using MCP. This includes performance optimization techniques, designing fault-tolerant systems, and testing strategies. Plus, detailed case studies that walk through actual MCP solution architectures with diagrams and integration tips, bringing everything you learned together in concrete examples.

Who Should Take This Course?

The MCP for Beginners course is geared towards developers: if you build or work on AI-driven applications, this course is for you. The content specifically welcomes:

- Beginners in AI Integration: You might be a developer who's comfortable with languages like Python, C#, or Java but new to AI/LLMs or to MCP itself. This course will take you from zero knowledge of MCP to a level where you can build and deploy your own MCP-enabled services. You do not need prior experience with MCP or machine learning pipelines; the introduction module will bring you up to speed on key concepts. (Basic programming skills and an understanding of client-server or API concepts are the only prerequisites.)
- Intermediate Developers & AI Practitioners: If you have some experience building bots or AI features and want to enhance them with real-time data access, you'll benefit greatly. The course's later modules on advanced topics, security, and best practices are especially valuable for those looking to integrate MCP into existing projects or optimize their approach. Even if you've dabbled in MCP or a similar concept before, this curriculum will fill gaps in knowledge and provide structured insights that are hard to get from scattered documentation.
- AI Enthusiasts & Architects: Perhaps you're an AI architect or tech lead exploring new frameworks for intelligent agents. This course serves as a comprehensive resource to evaluate MCP for your architecture. By walking through it, you'll understand how MCP can fit into enterprise systems, what benefits it brings, and how to implement it in a maintainable way. It's perfect for getting a broad yet detailed view of MCP's capabilities before adopting it within a team.

In essence, anyone interested in making AI applications more connected and powerful will find value here. From a solo hackathon coder to a professional solution architect, the material scales to your needs. The course starts with fundamentals in an easy-to-grasp manner and then deepens into complex topics, appealing to a wide range of skill levels.
Prerequisites: The official prerequisites for the course are minimal: you should have basic knowledge of at least one programming language (C#, Java, or Python is recommended) and a general understanding of how client-server applications or APIs work. Familiarity with machine learning concepts is optional but can help. In short, if you can write simple programs and understand making API calls, you have everything you need to start learning MCP.

Conclusion: Empower Your AI Projects with MCP

The Model Context Protocol for Beginners course is more than just a tutorial – it's a comprehensive journey that empowers you to build the next generation of AI applications. By demystifying MCP and equipping you with hands-on experience, this curriculum turns a seemingly complex concept into practical skills you can apply immediately. With MCP, you unlock capabilities like giving your AI agents real-time information access and the ability to use tools autonomously. That means as a developer, you can create solutions that are significantly more intelligent and useful. A chatbot that can search documents, a coding assistant that can consult APIs or run code, an AI service that seamlessly integrates with your database – all these become achievable when you know MCP. And thanks to this beginner-friendly course, you'll be able to implement such features with confidence. Whether you are starting out in the AI development world or looking to sharpen your cutting-edge skills, the MCP for Beginners course has something for you. It condenses best practices, real-world lessons, and robust techniques into an accessible format. Learning MCP now will put you ahead of the curve, as this protocol rapidly becomes a cornerstone of AI integrations across the industry. So, are you ready to level up your AI development skills? Dive into the https://aka.ms/mcp-for-beginners course and start building AI agents that can truly interact with the world around them. With the knowledge and experience gained, you'll be prepared to create smarter, context-aware applications and be a part of the community driving AI innovation forward.
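To give a concrete flavor of the calculator project referenced earlier, here is a minimal, illustrative sketch of an MCP server exposing calculator tools, written against the open-source MCP Python SDK (its FastMCP helper). Treat the exact module paths and decorators as assumptions to verify against the current SDK documentation rather than as the course's official solution.

```python
# calculator_server.py - minimal MCP server sketch (assumes the `mcp` Python SDK is installed: pip install mcp)
from mcp.server.fastmcp import FastMCP

# Create a named MCP server; an MCP-aware client (for example, an AI agent) can discover and call its tools.
mcp = FastMCP("calculator")

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

@mcp.tool()
def divide(a: float, b: float) -> float:
    """Divide a by b, guarding against division by zero."""
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

if __name__ == "__main__":
    # Serve over stdio so a local client can launch this script as a subprocess.
    mcp.run(transport="stdio")
```

An MCP-capable client or agent could then launch this script and call the add or divide tool as one step in answering a user's question.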
Build AI Agents with MCP Tool Use in Minutes with AI Toolkit for VSCode

We're excited to announce Agent Builder, the newest evolution of what was formerly known as Prompt Builder, now reimagined and supercharged for intelligent app development. This powerful tool in AI Toolkit enables you to create, iterate, and optimize agents—from prompt engineering to tool integration—all in one seamless workflow. Whether you're designing simple chat interactions or complex task-performing agents with tool access, Agent Builder simplifies the journey from idea to integration.

Why Agent Builder?

Agent Builder is designed to empower developers and prompt engineers to:
- 🚀 Generate starter prompts with natural language
- 🔁 Iterate and refine prompts based on model responses
- 🧩 Break down tasks with prompt chaining and structured outputs
- 🧪 Test integrations with real-time runs and tool use such as MCP servers
- 💻 Generate production-ready code for rapid app development

And a lot of features are coming soon, so stay tuned for:
- 📝 Use variables in prompts
- Run agents with test cases to test your agent easily
- 📊 Evaluate the accuracy and performance of your agent with built-in or your custom metrics
- ☁️ Deploy your agent to cloud

Build Smart Agents with Tool Use (MCP Servers)

Agents can now connect to external tools through MCP (Model Context Protocol) servers, enabling them to perform real-world actions like querying a database, accessing APIs, or executing custom logic.

Connect to an Existing MCP Server

To use an existing MCP server in Agent Builder:
- In the Tools section, select + MCP Server.
- Choose a connection type: Command (stdio) – run a local command that implements the MCP protocol; or HTTP (server-sent events) – connect to a remote server implementing the MCP protocol.
- If the MCP server supports multiple tools, select the specific tool you want to use.
- Enter your prompts and click Run to test the agent's interaction with the tool.

This integration allows your agents to fetch live data or trigger custom backend services as part of the conversation flow. (A sketch of what this connection looks like in code follows at the end of this article.)

Build and Scaffold a New MCP Server

Want to create your own tool? Agent Builder helps you scaffold a new MCP server project:
- In the Tools section, select + MCP Server.
- Choose MCP server project.
- Select your preferred programming language: Python or TypeScript.
- Pick a folder to create your server project.
- Name your project and click Create.

Agent Builder generates a scaffolded implementation of the MCP protocol that you can extend. Use the built-in VS Code debugger: press F5 or click Debug in Agent Builder, then test with prompts like:

System: You are a weather forecast professional that can tell weather information based on given location.
User: What is the weather in Shanghai?

Agent Builder will automatically connect to your running server and show the response, making it easy to test and refine the tool-agent interaction.

AI Sparks – from Prototype to Production with AI Toolkit

Building AI-powered applications from scratch or infusing intelligence into existing systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK) from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we'll cover:
- 🚀 SLMs & Local Models – Test and deploy AI models and applications efficiently on your own terms locally, to edge devices or to the cloud
- 🔍 Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data.
- 🎨 Multi-Modal AI – Work with images, text, and beyond.
- 🤖 Agentic Frameworks – Build autonomous, decision-making AI systems.
Watch on Demand

Share your feedback

Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we're here to listen, collaborate, and grow alongside our amazing user community. Thank you for being a part of this journey—let's build the future of AI together! Join our Microsoft Azure AI Foundry Discord channel to continue the discussion 🚀
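To make the "Connect to an Existing MCP Server" flow above a little more tangible outside the Agent Builder UI, here is a rough sketch of how a client can launch a local MCP server over stdio and call one of its tools using the open-source MCP Python SDK. The server script name and tool name are placeholders, and the exact SDK surface should be checked against the current documentation.

```python
# mcp_client_sketch.py - illustrative only; assumes `pip install mcp` and a local weather_server.py exposing a get_weather tool
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the MCP server as a subprocess and talk to it over stdin/stdout.
    server = StdioServerParameters(command="python", args=["weather_server.py"])
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()

            # Discover which tools the server exposes, then call one of them.
            tools = await session.list_tools()
            print("Available tools:", [tool.name for tool in tools.tools])

            result = await session.call_tool("get_weather", {"location": "Shanghai"})
            print("Tool result:", result)

if __name__ == "__main__":
    asyncio.run(main())
```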
Leveraging the power of NPU to run Gen AI tasks on Copilot+ PCs

Thanks to their massive scale and impressive technical evolution, large language models (LLMs) have become the public face of Generative AI innovation. However, bigger isn't always better. While LLMs like the ones behind Microsoft Copilot are incredibly capable at a wide range of tasks, less-discussed small language models (SLMs) expand the utility of Gen AI for real-time and edge applications. SLMs can run efficiently on a local device with low power consumption and fast performance, enabling new scenarios and cost models. SLMs can run on universally available chips like CPUs and GPUs, but their potential really comes alive running on Neural Processing Units (NPUs), such as the ones found in Microsoft Surface Copilot+ PCs. NPUs are specifically designed for processing machine learning workloads, leading to high performance per watt and thermal efficiency compared to CPUs or GPUs [1]. SLMs and NPUs together support running quite powerful Gen AI workloads efficiently on a laptop, even when running on battery power or multitasking.

In this blog, we focus on running SLMs on Snapdragon® X Plus processors on the recently launched Surface Laptop 13-inch, using the Qualcomm® AI Hub, leading to efficient local inference, increased hardware utilization and minimal setup complexity. This is only one of many methods available - before diving into this specific use case, let's first provide an overview of the possibilities for deploying small language models on Copilot+ PC NPUs.

Qualcomm AI Engine Direct (QNN) SDK: This process requires converting SLMs into QNN binaries that can be executed through the NPU. The Qualcomm AI Hub provides a convenient way to compile any PyTorch, TensorFlow, or ONNX-converted models into QNN binaries executable by the Qualcomm AI Engine Direct SDK. Various precompiled models are directly available in the Qualcomm AI Hub, their collection of over 175 pre-optimized models, ready for download and integration into your application.

ONNX Runtime: ONNX Runtime is an open-source inference engine from Microsoft designed to run models in the ONNX format. The QNN Execution Provider (EP) by Qualcomm Technologies optimizes inference on Snapdragon processors using AI acceleration hardware, mainly for mobile and embedded use. ONNX Runtime Gen AI is a specialized version optimized for generative AI tasks, including transformer-based models, aiming for high-performance inference in applications like large language models. Although ONNX Runtime with QNN EP can run models on Copilot+ PCs, some operator support is missing for Gen AI workloads. ONNX Runtime Gen AI is not yet publicly available for NPU; a private beta is currently out with an unclear ETA on public release at the time of releasing this blog. For more info on upcoming releases, see the Git repo: microsoft/onnxruntime-genai (Generative AI extensions for onnxruntime).

Windows AI Foundry: Windows AI Foundry provides AI-supported features and APIs for Copilot+ PCs. It includes pre-built models such as Phi-Silica that can be inferred using Windows AI APIs. Additionally, it offers the capability to download models from the cloud for local inference on the device using Foundry Local. This feature is still in preview.
You can learn more about Windows AI Foundry here: Windows AI Foundry | Microsoft Developer

AI Toolkit for VS Code: The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models from the Azure AI Foundry catalog and other catalogs like Hugging Face. This platform allows users to download multiple models either from the cloud or locally. It currently houses several models optimized to run on CPU, with support for NPU-based models forthcoming, starting with Deepseek R1.

Comparison between different approaches

Availability of Models
- Qualcomm AI Hub: Wide set of AI models (vision, Gen AI, object detection, and audio). Any models can be integrated.
- ONNX Runtime (ORT): NPU support for Gen AI tasks and ONNX Gen AI Runtime are not yet generally available.
- Windows AI Foundry: Phi Silica model is available through Windows AI APIs; additional AI models from the cloud can be downloaded for local inference using Foundry Local.
- AI Toolkit for VS Code: Access to models from sources such as Azure AI Foundry and Hugging Face. Currently only supports Deepseek R1 and Phi 4 Mini models for NPU inference.

Ease of development
- Qualcomm AI Hub: The API is user-friendly once the initial setup and end-to-end replication are complete.
- ONNX Runtime (ORT): Simple setup, developer-friendly; however, limited support for custom operators means not all models deploy through ORT.
- Windows AI Foundry: Easiest framework to adopt—developers familiar with Windows App SDK face no learning curve.
- AI Toolkit for VS Code: Intuitive interface for testing models via prompt-response, enabling quick experimentation and performance validation.

Is processor or SoC independent
- Qualcomm AI Hub: No. Supports Qualcomm Technologies processors only. Models must be compiled and optimized for the specific SoC on the device. A list of supported chipsets is provided, and the resulting .bin files are SoC-specific.
- ONNX Runtime (ORT): Limitations exist with QNN EP's HTP backend: only quantized models and those with static shapes are currently supported.
- Windows AI Foundry: Yes. The tool can operate independently of the SoC. It is part of the broader Windows Copilot Runtime framework, now rebranded as the Windows AI Foundry.
- AI Toolkit for VS Code: Model-dependent. Easily deployable on-device; model download and inference are straightforward.

As of writing this article and based on our team's research, we found Qualcomm AI Hub to be the most user-friendly and well-supported solution available at this time. In contrast, most other frameworks are still under development and not yet generally available. Before we dive into how to use Qualcomm AI Hub to run Small Language Models (SLMs), let's first understand what Qualcomm AI Hub is.

What is Qualcomm AI Hub?

Qualcomm AI Hub is a platform designed to simplify the deployment of AI models for vision, audio, speech, and text applications on edge devices. It allows users to upload, optimize, and validate their models for specific target hardware—such as CPU, GPU, or NPU—within minutes. Models developed in PyTorch or ONNX are automatically converted for efficient on-device execution using frameworks like TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct. The Qualcomm AI Hub offers access to a collection of over 100 pre-optimized models, with open-source deployment recipes available on GitHub and Hugging Face. Users can also test and profile these models on real devices with Snapdragon and Qualcomm platforms hosted in the cloud.
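As a rough illustration of what compiling a model through Qualcomm AI Hub can look like from Python, here is a hedged sketch using the qai-hub client library. The device name, options string, and model path are assumptions for illustration only; check the AI Hub documentation for the exact API, option flags, and supported targets.

```python
# compile_sketch.py - illustrative only; assumes `pip install qai-hub` and a configured AI Hub API token
import qai_hub as hub

# Submit a compile job that turns an ONNX model into a QNN context binary for a specific Snapdragon device.
# The model path, device name, and options below are placeholders - substitute your own values.
compile_job = hub.submit_compile_job(
    model="my_model.onnx",                                # a local ONNX (or traced PyTorch) model
    device=hub.Device("Snapdragon X Plus 8-Core CRD"),    # assumed device name; list devices with hub.get_devices()
    options="--target_runtime qnn_context_binary",        # assumed option for producing a .bin context binary
)

# Wait for the job to finish, then download the compiled artifact for on-device deployment.
target_model = compile_job.get_target_model()
target_model.download("my_model.bin")
```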
In this blog we will be showing how you can use Qualcomm AI Hub to get a QNN context binary for models and use the Qualcomm AI Engine to run those context binaries. The context binary is an SoC-specific deployment mechanism. When compiled for a device, it is expected that the model will be deployed to the same device. The format is operating system agnostic, so the same model can be deployed on Android, Linux, or Windows. The context binary is designed only for the NPU. For more details on how to compile models in other formats, please visit the documentation here: Overview of Qualcomm AI Hub — qai-hub documentation.

The following case study details the efficient execution of the Phi-3.5 model using optimized, hardware-specific binaries on a Surface Laptop 13-inch powered by the Qualcomm Snapdragon X Plus processor, Hexagon™ NPU, and Qualcomm AI Hub.

Microsoft Surface Engineering Case Study: Running Phi-3.5 Model Locally on Snapdragon X Plus on Surface Laptop 13-inch

This case study details how the Phi-3.5 model was deployed on a Surface Laptop 13-inch powered by the Snapdragon X Plus processor. The study was developed and documented by the Surface DASH team, which specializes in delivering AI/ML solutions to Surface devices and generating data-driven insights through advanced telemetry. Using Qualcomm AI Hub, we obtained precompiled QNN context binaries tailored to the target SoC, enabling efficient local inference. This method maximizes hardware utilization and minimizes setup complexity.

We used a Surface Laptop 13-inch with the Snapdragon X Plus processor as our test device. The steps below apply to the Snapdragon X Plus processor; however, the process remains similar for other Snapdragon X Series processors and devices as well. For the other processors, you may need to download different variants of the desired models from Qualcomm AI Hub. Before you begin to follow along, please check the make and model of your NPU by navigating to Device Manager --> Neural Processors. We also used Visual Studio Code and Python (3.10, 3.11, 3.12). We used the 3.11 version to run these steps and recommend using the same, although there should be no difference in using a higher Python version.

Before starting, let's create a new virtual environment in Python as a best practice. Follow the steps to create a new virtual environment here: https://code.visualstudio.com/docs/python/environments?from=20423#_creating-environments

At a high level, you will: create a folder named 'genie_bundle' to store config and bin files; download the QNN context binaries specific to your NPU and place the config files into the genie_bundle folder; copy the .dll files from the QNN SDK into the genie_bundle folder; and finally, execute the test prompt through the Genie SDK in the required format for Phi-3.5.

Setup steps in detail

Step 1: Set up the local development environment

Download QNN SDK: Go to the Qualcomm Software Center (Qualcomm Neural Processing SDK | Qualcomm Developer) and download the QNN SDK by clicking on Get Software (by default, the latest version of the SDK gets downloaded). For the purpose of this demo, we used the latest version available (2.34). You may need to make an account on the Qualcomm website to access it.

Step 2: Download QNN Context Binaries from Qualcomm AI Hub Models

Download Binaries: Download the context binaries (.bin files) for the Phi-3.5-mini-instruct model from (Link to Download Phi-3.5 context binaries).
Clone the AI Hub Apps repo: Use the Genie SDK (a generative runtime built on top of Qualcomm AI Engine Direct), and leverage the sample provided in https://github.com/quic/ai-hub-apps

Set up the folder structure to follow along with the code: Create a folder named "genie_bundle" outside of the folder where the AI Hub Apps repo was cloned, then selectively copy configuration files from the AI Hub sample repo to 'genie_bundle'.

Step 3: Copy config files and edit files

Copy config files to the genie_bundle folder from ai-hub-apps. You will need two config files: the HTP backend config file and the genie config file from the repo. You can use the PowerShell script below to copy them from the repo to the local genie_bundle folder created in the previous steps.

# Define the source paths
$sourceFile1 = "ai-hub-apps/tutorials/llm_on_genie/configs/htp/htp_backend_ext_config.json.template"
$sourceFile2 = "ai-hub-apps/tutorials/llm_on_genie/configs/genie/phi_3_5_mini_instruct.json"

# Define the local folder path
$localFolder = "genie_bundle"

# Define the destination file paths using the local folder
$destinationFile1 = Join-Path -Path $localFolder -ChildPath "htp_backend_ext_config.json"
$destinationFile2 = Join-Path -Path $localFolder -ChildPath "genie_config.json"

# Create the local folder if it doesn't exist
if (-not (Test-Path -Path $localFolder)) {
    New-Item -ItemType Directory -Path $localFolder
}

# Copy the files to the local folder
Copy-Item -Path $sourceFile1 -Destination $destinationFile1 -Force
Copy-Item -Path $sourceFile2 -Destination $destinationFile2 -Force

Write-Host "Files have been successfully copied to the genie_bundle folder with updated names."

After copying the files, make sure to change the default values of the parameters provided with the copied template files:
- Edit the HTP backend config file (htp_backend_ext_config.json) in its new location: change dsp_arch and soc_model to match your configuration.
- Edit genie_config.json to point to the binaries for the Phi-3.5 model downloaded in the previous steps.

Step 4: Download the tokenizer file from Hugging Face

- Visit the Hugging Face website: Open your web browser and go to https://huggingface.co/microsoft/Phi-3.5-mini-instruct/tree/main
- Locate the tokenizer file: On the Hugging Face page, find the tokenizer file for the Phi-3.5-mini-instruct model.
- Download the file: Click on the download button to save the tokenizer file to your computer.
- Save the file: Navigate to your genie_bundle folder and save the downloaded tokenizer file there.

Note: There is an issue with the tokenizer.json file for the Phi-3.5-mini-instruct model, where the output does not break words using spaces. To resolve this, you need to delete lines #192-197 in the tokenizer.json file.

(Image: Download tokenizer files from the Hugging Face repo. Image Source - Hugging Face)

Step 5: Copy files from the QNN SDK

Locate the QNN SDK folder: Open the folder where you installed the QNN SDK in Step 1 and identify the required files. You need to copy the files from the folders listed below (exact folder naming may change based on the SDK version):

<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/hexagon-v75/unsigned
<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/aarch64-windows-msvc
<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/bin/aarch64-windows-msvc

Navigate to your genie_bundle folder and paste the copied files there.

Step 6: Execute the Test Prompt

Open your terminal: Navigate to your genie_bundle folder using your terminal or command prompt.
Run the Command: Copy and paste the following command into your terminal:

./genie-t2t-run.exe -c genie_config.json -p "<|system|>\nYou are an assistant. Provide helpful and brief responses.\n<|user|>What is an NPU? \n<|end|>\n<|assistant|>\n"

Check the Output: After running the command, you should see the response from the assistant in your terminal.

This case study demonstrates the process of deploying a small language model (SLM) like Phi-3.5 on a Copilot+ PC using the Hexagon NPU and Qualcomm AI Hub. It outlines the setup steps, tooling, and configuration required for local inference using hardware-specific binaries. As deployment methods mature, this approach highlights a viable path toward efficient, scalable Gen AI execution directly on edge devices.

Snapdragon® and Qualcomm® branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm, Snapdragon and Hexagon™ are trademarks or registered trademarks of Qualcomm Incorporated.

New Generative AI Features in Azure Database for PostgreSQL
by: Maxim Lukiyanov, PhD, Principal PM Manager

This week at the Microsoft Build conference, we're excited to unveil a suite of new Generative AI capabilities in Azure Database for PostgreSQL flexible server. These features unlock a new class of applications powered by an intelligent database layer, expanding the horizons of what application developers can achieve. In this post, we'll give you a brief overview of these announcements.

Data is the fuel of AI. Looking back, the intelligence of Large Language Models (LLMs) can be reframed as intelligence that emerged from the vast data they were trained on. The LLMs just happened to be the technological leap necessary to extract that knowledge, but the knowledge itself was hidden in the data all along. In modern AI applications, the Retrieval-Augmented Generation (RAG) pattern applies this same principle to real-time data. RAG extracts relevant facts from data on the fly to augment an LLM's knowledge. At Microsoft, we believe this principle will continue to transform technology. Every bit of data will be squeezed dry of every bit of knowledge it holds. And there's no better place to find the most critical and up-to-date data than in databases. Today, we're excited to announce the next steps on our journey to make databases smarter – so they can help you capture the full potential of your data.

Fast and accurate vector search with DiskANN

First, we're announcing the General Availability of DiskANN vector indexing in Azure Database for PostgreSQL. Vector search is at the heart of the RAG pattern, and it continues to be a cornerstone technology for the new generation of AI Agents - giving them contextual awareness and access to fresh knowledge hidden in data. DiskANN brings years of state-of-the-art innovation in vector indexing from Microsoft Research directly to our customers. This release introduces support for vectors up to 16,000 dimensions — far surpassing the 2,000-dimension limit of the standard pgvector extension in PostgreSQL. This enables the development of highly accurate applications using high-dimensional embeddings. We've also accelerated index creation with enhanced memory management, parallel index building, and other optimizations – delivering up to 3x faster index builds while reducing disk I/O. (A brief sketch of what creating a DiskANN index looks like in practice appears at the end of this article.)

Additionally, we're excited to announce the Public Preview of Product Quantization – a cutting-edge vector compression technique that delivers exceptional compression while maintaining high accuracy. DiskANN Product Quantization enables efficient storage of large vector volumes, making it ideal for production workloads where both performance and cost matter. With Product Quantization enabled, DiskANN offers up to 10x faster performance and 4x cost savings compared to pgvector HNSW. You can learn more about DiskANN in a dedicated blog post.

Semantic operators in the database

Next, we're announcing the Public Preview of Semantic Operators in Azure Database for PostgreSQL – bringing a new intelligence layer to relational algebra, integrated directly into the SQL query engine. While vector search is foundational to Generative AI (GenAI) apps and agents, it only scratches the surface of what's possible. Semantic relationships between elements of enterprise data are not visible to vector search. This knowledge exists within the data but is lost at the lowest level of the stack – vector search – and this loss propagates upward, limiting the agent's ability to reason about the data. This is where the new Semantic Operators come in.
Semantic Operators leverage LLMs to add semantic understanding of operational data. Today, we're introducing four operators:
- generate() – a versatile generation operator capable of ChatGPT-style responses.
- is_true() – a semantic filtering operator that evaluates filter conditions and joins expressed in natural language.
- extract() – a knowledge extraction operator that extracts hidden semantic relationships and other knowledge from your data, bringing a new level of intelligence to your GenAI apps and agents.
- rank() – a highly accurate semantic ranking operator, offering two types of state-of-the-art re-ranking models: Cohere Rank-v3.5 or OpenAI gpt-4.1 models from the Azure AI Foundry Model Catalog.

You can learn more about Semantic Operators in a dedicated blog post.

Graph database and GraphRAG knowledge graph support

Finally, we're announcing the General Availability of GraphRAG support and the General Availability of the Apache AGE extension in Azure Database for PostgreSQL. The Apache AGE extension on Azure Database for PostgreSQL offers a cost-effective, managed graph database service powered by the PostgreSQL engine – and serves as the foundation for building GraphRAG applications. Semantic relationships in the data, once extracted, can be stored in various ways within the database. While relational tables with referential integrity can represent some relationships, this approach is suboptimal for knowledge graphs. Semantic relationships are dynamic; many aren't known ahead of time and can't be effectively modeled by a fixed schema. Graph databases provide a much more flexible structure, enabling knowledge graphs to be expressed naturally. Apache AGE supports openCypher, the emerging standard for querying graph data. OpenCypher offers an expressive, intuitive language well-suited for knowledge graph queries. We believe that combining semantic operators with graph support in Azure Database for PostgreSQL creates a compelling data platform for the next generation of AI agents — capable of effectively extracting, storing, and retrieving semantic relationships in your data. You can learn more about graph support in a separate blog post.

Resources to help you get started

We're also happy to announce the availability of new resources and tools for application developers:
- Model Context Protocol (MCP) is an emerging open protocol designed to integrate AI models with external data sources and services. We have integrated the MCP server for Azure Database for PostgreSQL into the Azure MCP Server, making it easy to connect your agentic apps not only to Azure Database for PostgreSQL, but to other Azure services as well through one unified interface. To learn more, refer to this blog post.
- A new Solution Accelerator which showcases all of the capabilities we have announced today working together in one solution, solving real-world problems of ecommerce retail reimagined for the agentic era.
- A new PostgreSQL extension for VS Code for application developers and database administrators alike, bringing a new generation of query editing and Copilot experiences to the world of PostgreSQL.
- And read about new enterprise features making Azure Database for PostgreSQL faster and more secure in the accompanying post.

Begin your journey

Generative AI innovation continues its advancement, bringing new opportunities every month. We're excited for what is to come and look forward to sharing this journey of discovery with our customers.
With today's announcements - DiskANN vector indexing, Semantic Operators, and GraphRAG - Azure Database for PostgreSQL is ready to help you explore new boundaries of what's possible. We invite you to begin your Generative AI journey today by exploring our new Solution Accelerator.
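As promised above, here is a brief, illustrative sketch of what working with pgvector and a DiskANN index can look like from application code, using the psycopg driver from Python. The extension name pg_diskann, the diskann index method, and the operator class shown are assumptions to verify against the Azure Database for PostgreSQL documentation, and the connection string and embedding values are placeholders.

```python
# diskann_sketch.py - illustrative only; assumes `pip install psycopg` and an Azure Database for PostgreSQL flexible server
import psycopg

conn = psycopg.connect("host=<your-server>.postgres.database.azure.com dbname=postgres user=<user> password=<password>")
with conn, conn.cursor() as cur:
    # Enable the extensions (names assumed; they must also be allow-listed on the server).
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_diskann;")

    # A table of documents with an embedding column (3 dimensions here only to keep the example short).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id bigserial PRIMARY KEY,
            content text,
            embedding vector(3)
        );
    """)

    # DiskANN index for approximate nearest-neighbor search over the embeddings (index method name assumed).
    cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_diskann ON docs USING diskann (embedding vector_cosine_ops);")

    cur.execute("INSERT INTO docs (content, embedding) VALUES (%s, %s::vector)", ("hello world", "[0.1, 0.2, 0.3]"))

    # Retrieve the documents closest to a query embedding (cosine-distance operator from pgvector).
    cur.execute("SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT 5;", ("[0.1, 0.2, 0.25]",))
    print(cur.fetchall())
```

The same pattern applies with realistic embedding sizes (for example, 1,536 or 3,072 dimensions from an embedding model); only the declared vector dimension and the values change.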
Quest 1 – I Want to Build a Local Gen AI Prototype

In this quest, you'll build a local Gen AI app prototype using JavaScript or TypeScript. You'll explore open-source models via GitHub, test them in a visual playground, and use them in real code — all from the comfort of VS Code with the AI Toolkit. It's fast, hands-on, and sets you up to build real AI apps, starting with a sketch.

Managing Token Consumption with GitHub Copilot for Azure
Introduction

AI Engineers often face challenges that require creative solutions. One such challenge is managing the consumption of tokens when using large language models. For example, you may observe heavy token consumption from a single client app or user, and determine that with that kind of usage pattern, the shared quota for other client applications relying on the same OpenAI backend will be depleted quickly. To prevent this, we need a solution that doesn't involve spending hours reading documentation or watching tutorials. Enter GitHub Copilot for Azure.

GitHub Copilot for Azure

Instead of diving into extensive documentation, we can leverage GitHub Copilot for Azure directly within VS Code. By invoking Copilot with @azure, we can describe our issue in natural language. For our example, we might say: "Some users of my app are consuming too many tokens, which will affect tokens left for my other services. I need to limit the number of tokens a user can consume." Refer to the video above for more context.

GitHub Copilot in Action

GitHub Copilot pools relevant ideas from https://learn.microsoft.com/ and suggests Azure services that can help. We can engage in a chat conversation, with follow-up questions like, "What happens if a user exceeds their token limit?" and so on. This response from GitHub Copilot accurately describes the specific feature we need, along with the expected behavior: user requests exceeding the limit are blocked from reaching the backend, and users receive a "too many requests" warning—exactly what we need. At this point, it felt like I was having a 1:1 chat with the docs 🙃

Implementation

To implement this, we ask GitHub Copilot for an example of enforcing the Azure token limit policy. It references the docs on Learn and provides a policy statement. Since we're not fully conversant with the product, we continue using Copilot to help with the implementation. Although GitHub Copilot chat cannot directly update our code, we can switch to GitHub Copilot Edits, provide some custom instructions in natural language, and watch as GitHub Copilot makes the necessary changes, which we review and accept or decline.

Testing and Deployment

After implementing the policy, we redeploy our application using the Azure Developer CLI (azd) and restart our application and API to test. We now see that if a user sends another prompt after hitting the applied token limit, their request is terminated with a warning that the allocated limit is exceeded, along with instructions on what to do next.

Conclusion

Managing token consumption effectively is just one of the many ways GitHub Copilot for Azure can assist developers. Download and install the extension today to try it out yourself. If you have any scenarios you'd like to see us cover, drop them in the comments, and we'll feature them. See you in the next blog!
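The fix in this walkthrough is an API Management token-limit policy applied in front of the Azure OpenAI backend, not application code. Purely as an illustration of the same idea at the application layer, here is a minimal, hypothetical sketch of per-user token budgeting; every name in it is made up for the example, and it is not the policy Copilot generates.

```python
# token_budget_sketch.py - hypothetical application-side illustration of per-user token limits
# (the blog's actual solution is an Azure API Management token-limit policy, not this code)
import time
from collections import defaultdict

TOKENS_PER_MINUTE = 1000  # example budget per user

_usage = defaultdict(lambda: {"window_start": 0.0, "tokens": 0})

def try_consume(user_id: str, requested_tokens: int) -> bool:
    """Return True if the user still has budget in the current one-minute window."""
    now = time.time()
    usage = _usage[user_id]
    if now - usage["window_start"] >= 60:
        usage["window_start"] = now
        usage["tokens"] = 0
    if usage["tokens"] + requested_tokens > TOKENS_PER_MINUTE:
        return False  # caller should respond with HTTP 429 "too many requests"
    usage["tokens"] += requested_tokens
    return True

if __name__ == "__main__":
    print(try_consume("alice", 900))   # True - within budget
    print(try_consume("alice", 200))   # False - would exceed the per-minute budget
```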
JS AI Build-a-thon Setup in 5 Easy Steps

🔥 TL;DR — You're 5 Steps Away from an AI Adventure

Set up your project repo, follow the quests, build cool stuff, and level up. Everything's automated, community-backed, and designed to help you actually learn AI — using the skills you already have. Let's build the future. One quest at a time. 👉 Join the Build-a-thon | Chat on Discord

UBS unlocks advanced AI techniques with PostgreSQL on Azure
This blog was authored by Jay Yang, Executive Director, and Orhun Oezbek, GenAI Architect, UBS RiskLab

UBS Group AG is a multinational investment bank and world-leading asset manager that manages $5.7 trillion in assets across 15 different markets. We continue to evolve our tools to suit the needs of data scientists and to integrate the use of AI. Our UBS RiskLab data science platform helps over 1,200 UBS data scientists expedite development and deployment of their analytics and AI solutions, which support functions such as risk, compliance, and finance, as well as front-office divisions such as investment banking and wealth management.

RiskLab and UBS GOTO (Group Operations and Technology Office) have a long-term AI strategy to provide a scalable and easy-to-use AI platform. This strategy aims to remove friction and pain points for users, such as developers and data scientists, by introducing DevOps automation, centralized governance and AI service simplification. These efforts have significantly democratized AI development for our business users. This blog walks through how we created two RiskLab products using Azure services. We also explain how we're using Azure Database for PostgreSQL to power advanced Retrieval-Augmented Generation (RAG) techniques—such as new vector search algorithms, parameter tuning, hybrid search, semantic ranking, and a graphRAG approach—to further the work of our financial generative AI use cases.

The RiskLab AI Common Ecosystem (AICE) provides fully governed and simplified generative AI platform services, including:
- Governed production data access for AI development
- Managed large language model (LLM) endpoint access control
- Tenanted RAG environments
- Enhanced document insight AI processing
- Streamlined AI agent standardization, development, registration, and deployment solutions
- End-to-end machine learning (ML) model continuous integration, training, deployment, and monitoring processes

The AICE Vector Embedding Governance Application (VEGA) is a fully governed and multi-tenant vector store built on top of Azure Database for PostgreSQL that provides self-service vector store lifecycle management and advanced indexing and retrieval techniques for financial RAG use cases.

A focus on best practices like AIOps and MLOps

As generative AI gained traction in 2023, we noticed the need for a platform that simplified the process for our data scientists to build, test, and deploy generative AI applications. In this age of AI, the focus should be on data science best practices—GenAIOps and MLOps. Most of our data scientists aren't fully trained on MLOps, GenAIOps, and setting up complex pipelines, so AICE was designed to provide automated, self-serve DevOps provisioning of the Azure resources they need, as well as simplified MLOps and AIOps pipeline libraries. This removes operational complexities from their workflows.

The second reason for AICE was to make sure our data scientists were working in fully governed environments that comply with data privacy regulations from the multiple countries in which UBS operates. To meet that need, AICE provides a set of generative AI libraries that fully manage data governance and reduce complexity. Overall, AICE greatly simplifies the work for our data scientists. For instance, the platform provides managed Azure LLM endpoints, MLflow for generative AI evaluation, and AI agent deployment pipelines along with their corresponding Python libraries.
Without going into the nitty gritty of setting up a new Azure subscription, managing MLflow instances, and navigating Azure Kubernetes Service (AKS) deployments, data scientists can just write three lines of code to obtain a fully governed and secure generative AI ecosystem to manage their entire application lifecycle. And, as a governed, secure lab environment, they can also develop and prototype ML models and generative AI applications in the production tier. We found that providing production read-only datasets to build these models significantly expedites our AI development. In fact, the process for developing an ML model, building a pipeline for model training, and putting it into production has dropped from six months to just one month.

Azure Database for PostgreSQL and pgvector: The best of both worlds for relational and vector databases

Once AICE adoption ramped up, our next step was to develop a comprehensive, flexible vector store that would simplify vector store resource provisioning while supporting hundreds of RAG use cases and tenants across both lab and production environments. Essentially, we needed to create RAG as a Service (RaaS) so our data scientists could build custom AI solutions in a self-service manner. When we started building VEGA and this vector store, we anticipated that effective RAG would require a diverse range of search capabilities covering not only vector searches but also more traditional document searches or even relational queries. Therefore, we needed a database that could pivot easily. We were looking for a really flexible relational database and decided on Azure Database for PostgreSQL.

For a while, Azure Database for PostgreSQL has been our go-to database at RiskLab for our structured data use cases because it's like the Swiss Army Knife of databases. It's very compact and flexible, and we have all the tools we need in a single package. Azure Database for PostgreSQL offers excellent relational queries and JSONB document search. When used in conjunction with the pgvector extension for vector search, we created some very powerful hybrid search and hierarchical search RAG functionalities for our end users.

The relational nature of Azure Database for PostgreSQL also allowed us to build a highly regulated authorization and authentication mechanism that makes it easy and secure for data scientists to share their embeddings. This involved meeting very stringent access control policies so that users' access to vector stores is on a need-to-know basis. Integrations with the Azure Graph API help us manage those identities and ensure that the environment is fully secure. Using VEGA, data scientists can just click a button to add a user or group and provide access to all their embeddings/documents. It's very easy, but it's also governed and highly regulated.

Speeding vector store initialization from days to seconds

With VEGA, the time it takes to provision a vector store has dropped from days to less than 30 seconds. Instead of waiting days on a request for new instances of Azure Database for PostgreSQL, pgvector, and Azure AI Search, data scientists can now simply write five lines of code to stand up virtual, fully governed, and secure collections. And the same is true for agentic deployment frameworks. This speed is critical for lab work that involves fast iterations and experiments. And because we built on Azure Database for PostgreSQL, a single instance of VEGA can support thousands of vector stores. It's cost-effective and seamlessly scales.
Creating a hybrid search to analyze thousands of documents

Since launching VEGA, one of the top hybrid search use cases has been Augmented Indexing Search (AIR Search), allowing data scientists to comb through financial documents and pinpoint the correct sections and text. This search uses LLMs as agents that first filter based on metadata stored in JSONB columns of Azure Database for PostgreSQL, then apply vector similarity retrieval. Our thousands of well-structured financial documents are built with hierarchical headers that act as metadata, providing a filtering mechanism for agents and allowing them to retrieve sections in our documents to find precisely what they're looking for. Because these agents are autonomous, they can decide on the best tools to use for the situation—either metadata filtering or vector similarity search. As a hybrid search, this approach also minimizes AI hallucinations because it gives the agents more context to work with. (A rough sketch of this metadata-plus-vector query pattern follows below.)

To enable this search, we used ChatGPT and Azure OpenAI. But because most of our financial documents are saved as PDFs, the challenge was retaining hierarchical information from headers that were lost when simply dumping in text from PDFs. We also had to determine how to make sure ChatGPT understood the meaning behind aspects like tables and figures. As a solution, we created PNG images of PDF pages and told ChatGPT to semantically chunk documents by titles and headers. And if it came across a table, we asked it to provide a YAML or JSON representation of it. We also asked ChatGPT to interpret figures to extract information, which is an important step because many of our documents contain financial graphs and charts. We're now using Azure AI Document Intelligence for layout detection and section detection as the first step, which simplified our document ingestion pipelines significantly.

Forecasting economic implications with PostgreSQL Graph Extension

Since creating AICE and VEGA using Azure services, we've significantly enhanced our data science workflows. We've made it faster and easier to develop generative AI applications thanks to the speed and flexibility of Azure Database for PostgreSQL. Making advanced AI features accessible to our data scientists has accelerated innovation in RiskLab and ultimately allowed UBS to deliver exceptional value to our customers. Looking ahead, we plan to use the Apache AGE graph extension in Azure Database for PostgreSQL for macroeconomics knowledge retention capabilities. Specifically, we're considering Azure tooling such as GraphRAG to equip UBS economists and portfolio managers with advanced RAG capabilities. This will allow them to retrieve more coherent RAG search results for use cases such as economics scenario generation and impact analysis, as well as investment forecasting and decision-making. For instance, a UBS business user will be able to ask an AI agent: if a country's interest rate increases by a certain percentage, what are the implications for my client's investment portfolio? The agent can perform a graph search to obtain all other connected economic entity nodes that might be affected by the interest rate entity node in the graph. We anticipate that AI-assisted graph knowledge will gain significant traction in the financial industry.

Learn more

For a deeper dive on how we created AICE and VEGA, check out this on-demand session from Ignite. We talk through our use of Azure Database for PostgreSQL and pgvector, plus we show a demo of our GraphRAG capabilities.
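As a rough, illustrative sketch of the hybrid retrieval pattern described above (JSONB metadata filtering combined with pgvector similarity), the query below shows how such a search might look in application code. The table, columns, and metadata keys are hypothetical placeholders, not VEGA's actual schema.

```python
# hybrid_search_sketch.py - illustrative only; assumes `pip install psycopg`, the pgvector extension,
# and a hypothetical table: sections(content text, metadata jsonb, embedding vector(1536))
import json

import psycopg

def hybrid_search(conn, query_embedding, doc_type, section, top_k=5):
    """Filter candidate chunks by JSONB metadata first, then rank the survivors by vector similarity."""
    metadata_filter = json.dumps({"doc_type": doc_type, "section": section})
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content
            FROM sections
            WHERE metadata @> %s::jsonb            -- metadata filter, e.g. document type and header path
            ORDER BY embedding <=> %s::vector      -- cosine-distance ranking from pgvector
            LIMIT %s
            """,
            (metadata_filter, embedding_literal, top_k),
        )
        return [row[0] for row in cur.fetchall()]
```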
About Azure Database for PostgreSQL

Azure Database for PostgreSQL is a fully managed, scalable, and secure relational database service that supports open-source PostgreSQL. It enables organizations to build and manage mission-critical applications with high availability, built-in security, and automated maintenance.

🚨Introducing the JS AI Build-a-thon 🚨
We're entering a future where AI-first and agentic developer experiences will shape how we build — and you don't want to be left behind. This isn't your average hackathon. It's a hands-on, quest-driven learning experience designed for developers, packed with:
- Interactive quests that guide you step by step — from your first prototype to production-ready apps
- Community-powered support via our dedicated Discord and local, community-led study jams
- Showcase moments to share your journey, get inspired, and celebrate what you build

Whether you're just starting your AI journey or looking to sharpen your skills with frameworks like LangChain.js, tools like the Azure AI Foundry and AI Toolkit Extensions, or diving deeper into agentic app design — this is your moment to start building.

GenAI for Scam & Fraud Detection
Chief Technology Officer and Microsoft Regional Director, Dr David Goad, recently highlighted the transformative potential of generative AI in combating financial scams in a live stream on the Microsoft Reactor YouTube channel. With a wealth of experience in artificial intelligence and machine learning, David shared his expertise and insights, providing practical examples of the applications of generative AI and Azure AI Foundry for scam and fraud detection in the banking sector.

The rise of digital banking and fraud

David Goad began by setting the scene, discussing the significant shift towards digital banking over the past decade. This transition has brought numerous benefits, including convenience and access to information, but it has also led to a surge in financial scams and fraud. The banking industry, particularly in Australia, has seen billions of dollars lost annually due to these fraudulent activities.

Key industry trends

David highlighted several key trends and challenges in managing fraud and scams within the banking sector. He pointed out that phishing remains one of the most prevalent methods used by fraudsters, with targeted spear phishing and whale phishing aimed at senior individuals. This demographic is often targeted, leading to significant financial losses and stress. Additionally, phone calls, text messages, and emails are common delivery methods for scams.

Opportunities for improvement with generative AI

David Goad emphasized the opportunities for improvement in the scam detection process through the use of generative AI. He explained that generative AI can enhance various aspects of fraud detection, including identifying fraudulent emails and texts, summarizing customer complaints, categorizing complaints for efficient routing, and evaluating customer sentiment. By leveraging generative AI, banks can improve the accuracy and efficiency of their fraud detection processes, ultimately reducing customer frustration and enhancing service levels.

Demonstrating generative AI for phishing detection

To illustrate the practical application of generative AI, David demonstrated Azure AI Foundry and Azure OpenAI Studio. He showcased how generative AI can be trained to identify phishing emails by fine-tuning a model with a dataset of classified emails. The model that David presented was able to classify emails as phishing or non-phishing and provide explanations for its decisions, demonstrating the potential for generative AI to streamline the fraud detection process.

Learn more

David Goad's presentation emphasized the transformative potential of generative AI in the banking sector and the opportunities that adopting generative AI can bring to help customer experiences. For those interested in learning more about this topic, David has written a LinkedIn article that delves deeper into the use of generative AI in fraud detection. To watch the full recording of David Goad's insightful presentation and technical demonstrations, visit the Microsoft Reactor YouTube channel.
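To make the demonstration above a little more concrete, here is a minimal, hedged sketch of how a phishing-classification call might look using the Azure OpenAI Python SDK against a chat model deployment (fine-tuned or simply well-prompted). The deployment name, endpoint, and prompt wording are placeholders for illustration, not the exact setup shown in the session.

```python
# phishing_classifier_sketch.py - illustrative only; assumes `pip install openai` and an Azure OpenAI resource
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],   # e.g. https://<your-resource>.openai.azure.com
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def classify_email(email_text: str) -> str:
    """Ask the model to label an email as PHISHING or LEGITIMATE and explain why."""
    response = client.chat.completions.create(
        model=os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-mini"),  # placeholder deployment name
        messages=[
            {"role": "system", "content": "You are a fraud analyst. Classify the email as PHISHING or LEGITIMATE, then give a one-sentence reason."},
            {"role": "user", "content": email_text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(classify_email("Your account is locked. Click http://example.com/verify to restore access within 24 hours."))
```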