ai
JS AI Build-a-thon Setup in 5 Easy Steps
TL;DR: You're 5 steps away from an AI adventure. Set up your project repo, follow the quests, build cool stuff, and level up. Everything's automated, community-backed, and designed to help you actually learn AI, using the skills you already have. Let's build the future, one quest at a time. Join the Build-a-thon | Chat on Discord
Getting Started with Azure MCP Server: A Guide for Developers

The world of cloud computing is growing rapidly, and Azure is at the forefront of this innovation. If you're a student developer eager to dive into Azure and learn about the Model Context Protocol (MCP), the Azure MCP Server is your perfect starting point. This tool, currently in Public Preview, empowers AI agents to seamlessly interact with Azure services like Azure Storage, Cosmos DB, and more. Let's explore how you can get started!

Why Use the Azure MCP Server?
The Azure MCP Server revolutionizes how AI agents and developers interact with Azure services. Here's a glimpse of what it offers:
Exploration made easy: List storage accounts, databases, resource groups, tables, and more with natural language commands.
Advanced operations: Manage configurations, query analytics, and execute complex tasks like building Azure applications.
Streamlined integration: From JSON communication to intelligent auto-completion, the Azure MCP Server ensures smooth operations.
Whether you're querying log analytics or setting up app configurations, this server simplifies everything.

Installation Guide: One-Click and Manual Methods
Prerequisites: Before you begin, ensure the following:
Install either the Stable or Insiders release of VS Code.
Add the GitHub Copilot and GitHub Copilot Chat extensions.

Option 1: One-Click Install
You can install the Azure MCP Server in VS Code or VS Code Insiders using NPX. Simply run:

npx -y @azure/mcp@latest server start

Option 2: Manual Install
If you'd prefer manual setup, follow these steps:
Create a .vscode/mcp.json file in your VS Code project directory.
Add the following configuration:

{
  "servers": {
    "Azure MCP Server": {
      "command": "npx",
      "args": ["-y", "@azure/mcp@latest", "server", "start"]
    }
  }
}

Here's an example of the settings.json file. Now, launch GitHub Copilot in Agent Mode to activate the Azure MCP Server.

Supercharging Azure Development
Once installed, the Azure MCP Server unlocks an array of capabilities:
Azure Cosmos DB: List, query, and manage databases and containers.
Azure Storage: Query blob containers, metadata, and tables.
Azure Monitor: Use KQL to analyze logs and monitor your resources.
App Configuration: Handle key-value pairs and labeled configurations.
Test prompts like:
"List my Azure Storage containers"
"Query my Log Analytics workspace"
"Show my key-value pairs in App Config"
These commands let your agents harness the power of Azure services effortlessly.

Security & Authentication
The Azure MCP Server simplifies authentication using Azure Identity. Your login credentials are handled securely, with support for mechanisms like:
Visual Studio credentials
Azure CLI login
Interactive browser login
For advanced scenarios, enable production credentials with:

export AZURE_MCP_INCLUDE_PRODUCTION_CREDENTIALS=true

Always perform a security review when integrating MCP servers to ensure compliance with regulations and standards.

Why Join the Azure MCP Community?
As a developer, you're invited to contribute to the Azure MCP Server project. Whether it's fixing bugs, adding features, or enhancing documentation, your contributions are valued. Explore the Contributing Guide for details on getting involved. The Azure MCP Server is your gateway to leveraging Azure services with cutting-edge technology. Dive in, experiment, and bring your projects to life! What Azure project are you excited to build with the MCP Server? Let's brainstorm ideas together!
Building an MCP Server for Microsoft Learn

So why Microsoft Learn? Well, it's a treasure trove of knowledge for developers and IT pros. Secondly, because it has a search page with many filters, it lends itself well to being wrapped as an MCP server. I'm talking about this page, of course: Microsoft Learn.

DISCLAIMER: the article below is just to show how easily you can wrap an API and turn it into an MCP server. There's now an official MCP server for Docs and Learn, which you're encouraged to use instead of building one for Docs and Learn yourself; see this link to the official server: https://github.com/MicrosoftDocs/mcp

MCP? Let's back up the tape a little. MCP, what's that? MCP stands for Model Context Protocol and is a new standard for building AI apps. The idea is that you have features like prompts, tools and resources that you can use to build AI applications. Because it's a standard, you can easily share these features with others. In fact, you can use MCP servers running locally as well as ones that run on an IP address. Pair that with an agentic client like Visual Studio Code or Claude Desktop and you have built an agent.

How do we build this? Well, the idea is to "wrap" Microsoft Learn's API for searching training content. This means that we will create an MCP server with tools that we can call. So first off, what parts of the API do we need?
Free text search. We definitely want this one, as it allows us to search for training content with keywords.
Filters. We want to know what filters exist so we can search for content based on them.
Topic. There are many different topics like products, roles and more; let's support this too.

Building the MCP Server
For this, we will use a transport method called SSE, Server-Sent Events. This is a simple way to create a server that people can interact with using HTTP. Here's how the server code will look:

from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Mount, Host

mcp = FastMCP("Microsoft Learn Search", "0.1.0")

@mcp.tool()
def learn_filter() -> list[Filter]:
    pass

@mcp.tool()
def free_text(query: str) -> list[Result]:
    pass

@mcp.tool()
def topic_search(category: str, topic: str) -> list[Result]:
    pass

port = 8000

app = Starlette(
    routes=[
        Mount('/', app=mcp.sse_app()),
    ]
)

This code sets up a basic MCP server using FastMCP and Starlette. The learn_filter, free_text, and topic_search functions are placeholders for the actual implementation of the tools that will interact with the Microsoft Learn API. Note the use of the decorator "@mcp.tool", which registers these functions as tools in the MCP server. Look also at the result types Filter and Result, which are used to define the structure of the data returned by these tools.

Implementing the Tools
We will implement the tools one by one.

Implementing the learn_filter Tool
So let's start with the learn_filter function. This function will retrieve the available filters from the Microsoft Learn API. Here's how we can implement it:

@mcp.tool()
def learn_filter() -> list[Filter]:
    data = search_learn()
    return data

Next, let's implement search_learn like so:

# search/search.py
import requests
from config import BASE_URL
from utils.models import Filter

def search_learn() -> list[Filter]:
    """
    Top level search function to fetch learn data from the Microsoft Learn API.
""" url = to_url() response = requests.get(url) if response.status_code == 200: data = response.json() facets = data["facets"] filters = [] # print(facets.keys()) for key in facets.keys(): # print(f"{key}") for item in facets[key]: filter = Filter(**item) filters.append(filter) # print(f" - {filter.value} ({filter.count})") return filters else: return None def to_url(): return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false" # config.py BASE_URL = "https://learn.microsoft.com" We also need to define the Filter model, we can use Pydantic for this: # utils/models.py class Filter(BaseModel): type: str value: str count: int How do we know what this looks like, well, we've tried to make a request to the API and looked at the response, here's an excerpt of the response: { "facets": { "roles": [ { "type": "role", "value": "Administrator", "count": 10 }, { "type": "role", "value": "Developer", "count": 20 } ], ... } } Technically, each item has more properties, but we only need the type, value, and count for our purposes. Implementing the free_text Tool For this, we will implement the free_text function, like so: # search/free_text.py import requests from utils.models import Result from config import BASE_URL def search_free_text(text: str) -> list[Result]: url = to_url(text) response = requests.get(url) results = [] if response.status_code == 200: data = response.json() if "results" in data: records = len(data["results"]) print(f"Search results: {records} records found") for item in data["results"]: result = Result(**item) result.url = f"{BASE_URL}{result.url}" results.append(result) return results def to_url(text: str): return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&terms={text}&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false" Here we use a new type Result, which we also need to define. This will represent the search results returned by the Microsoft Learn API. 
Let's define it:

# utils/models.py
from pydantic import BaseModel

class Result(BaseModel):
    title: str
    url: str
    popularity: float
    summary: str | None = None

Finally, let's wire this up in our MCP server:

@mcp.tool()
def learn_filter() -> list[Filter]:
    data = search_learn()
    return data

@mcp.tool()
def free_text(query: str) -> list[Result]:
    print("LOG: free_text called with query:", query)
    data = search_free_text(query)
    return data

Implementing Search by Topic
Just one more method to implement, the search_topic method. Let's implement it like so:

import requests
from utils.models import Result
from config import BASE_URL

def search_topic(topic: str, category: str = "products") -> list[Result]:
    results = []
    url = to_url(category, topic)
    response = requests.get(url)

    if response.status_code == 200:
        data = response.json()
        for item in data["results"]:
            result = Result(**item)
            result.url = f"{BASE_URL}{result.url}"
            results.append(result)
        return results
    else:
        return []

def to_url(category: str, topic: str):
    return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24filter=({category}%2Fany(p%3A%20p%20eq%20%27{topic}%27))&%24top=30&showHidden=false&fuzzySearch=false"

Great, this one also uses Result as the return type, so all that's left is to wire it up in our MCP server:

@mcp.tool()
def topic_search(category: str, topic: str) -> list[Result]:
    print("LOG: topic_search called with category:", category, "and topic:", topic)
    data = search_topic(topic, category)
    return data

That's it, let's move on to testing it.

Testing the MCP Server with Visual Studio Code
You can test this with Visual Studio Code and its Agent mode. What you need to do is create a file mcp.json in the .vscode directory and make sure it looks like this:

{
  "inputs": [],
  "servers": {
    "learn": {
      "type": "sse",
      "url": "http://localhost:8000/sse"
    }
  }
}

Note how we point to a running server on port 8000; this is where our MCP server will run.

Installing Dependencies
We will need the following dependencies to run this code:

pip install requests "mcp[cli]"

You're also recommended to create a virtual environment for this project, which you can do with:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Then install the dependencies as shown above.

Running the MCP Server
To run the MCP server, you can use the following command:

uvicorn server:app

How this works is that Uvicorn looks for a file server.py and an app variable in that file, which is the Starlette app we created earlier.

Try it out
We've got the server running and an entry in Visual Studio Code. Click the play icon just above your entry in Visual Studio Code; this should connect to the MCP server. Try it out by typing in the search box in the bottom right of your GitHub Copilot extension. For example, type "Do a free text search on JavaScript, use a tool". You should see the response coming back from Microsoft Learn and your MCP server.

Summary
Congrats! If you've done all the steps so far, you've managed to build an MCP server and solve a very practical problem: how to wrap an API for search AND make it "agentic".

Learn more
Check out the repo: MCP Curriculum
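One last tip: you can sanity-check the wrapper before connecting an MCP client by calling the functions directly from a small script. This is my own addition (not part of the original walkthrough) and assumes the modules are saved under the paths shown above (search/search.py, search/free_text.py, utils/models.py, config.py):

# try_search.py - manually exercise the wrapped Learn API (illustrative only)
from search.search import search_learn
from search.free_text import search_free_text

if __name__ == "__main__":
    # Print a few of the available filters (roles, levels, products, ...)
    for f in (search_learn() or [])[:10]:
        print(f"{f.type}: {f.value} ({f.count})")

    # Run a free text search and print the top results
    for r in search_free_text("JavaScript")[:5]:
        print(f"{r.title} -> {r.url}")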
How to Use Postgres MCP Server with GitHub Copilot in VS Code

GitHub Copilot has changed how developers write code, but when combined with an MCP (Model Context Protocol) server, it also connects to your services. With it, Copilot can understand your database schema and generate relevant code for your API, data models, or business logic. In this guide, you'll learn how to use the Neon Serverless Postgres MCP server with GitHub Copilot in VS Code to build a sample REST API quickly. We'll walk through how to create an Azure Function that fetches data from a Neon database, all with minimal setup and no manual query writing.

From Code Generation to Database Management with GitHub Copilot
AI agents are no longer just helping write code - they're creating and managing databases. When a chatbot logs a customer conversation, or a new region spins up in the Azure cloud, or a new user signs up, an AI agent can automatically create a database in seconds. No need to open a dashboard or call an API. This is the next evolution of software development: infrastructure as intent. With tools like database MCP servers, agents can now generate real backend services as easily as they generate code.

GitHub Copilot becomes your full-stack teammate. It can answer database-related questions, fetch your database connection string, update environment variables in your Azure Function, generate SQL queries to populate tables with mock data, and even help you create new databases or tables on the fly. All directly from your editor, with natural language prompts. Neon has a dedicated MCP server that makes it possible for Copilot to directly understand the structure of your Postgres database. Let's get started with using the Neon MCP server and GitHub Copilot.

What You'll Need
Node.js (>= v18.0.0) and npm: download from nodejs.org.
An Azure subscription (create one for free)
Install either the Stable or Insiders release of VS Code
Install the GitHub Copilot, GitHub Copilot for Azure, and GitHub Copilot Chat extensions for VS Code
Azure Functions Core Tools (for local testing)

Connect GitHub Copilot to Neon MCP Server

Create Neon Database
Visit the Neon on Azure Marketplace page and follow the Create a Neon resource guide to deploy Neon on Azure for your subscription. Neon has a free plan that is more than enough to build a proof of concept or kick off a new startup project.

Install the Neon MCP Server for VS Code
Neon MCP Server offers two options for connecting your VS Code MCP client to Neon. We will use the Remote Hosted MCP Server option. This is the simplest setup - no need to install anything locally or configure a Neon API key in your client. Add the following Neon MCP server configuration to your user settings in VS Code:

{
  "mcp": {
    "servers": {
      "Neon": {
        "command": "npx",
        "args": ["-y", "mcp-remote", "https://mcp.neon.tech/sse"]
      }
    }
  }
}

Click on Start on the MCP server. A browser window will open with an OAuth prompt. Just follow the steps to give your VS Code MCP client access to your Neon account.

Generate an Azure Function REST API using GitHub Copilot
Step 1: Create an empty folder (e.g., mcp-server-vs-code) and open it in VS Code.
Step 2: Open GitHub Copilot Chat in VS Code and switch to Agent mode. You should see the available tools.
Step 3: Ask Copilot something like "Create an Azure function with an HTTP trigger":
Copilot will generate a REST API using Azure Functions in JavaScript with a basic setup to run the functions locally.
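The exact code Copilot generates will differ from run to run, and in this walkthrough it produced JavaScript. Purely as an illustration of the shape of the result, here is a rough Python sketch of an HTTP-triggered function reading from a hypothetical customers table, assuming a NEON_CONNECTION_STRING app setting and the psycopg2 package (this is not the code Copilot produced):

import json
import os

import azure.functions as func
import psycopg2

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.route(route="customers")
def get_customers(req: func.HttpRequest) -> func.HttpResponse:
    # Connect to Neon using the connection string stored in the Function App settings
    conn = psycopg2.connect(os.environ["NEON_CONNECTION_STRING"])
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM customers LIMIT 10;")
            rows = [{"id": r[0], "name": r[1]} for r in cur.fetchall()]
    finally:
        conn.close()
    return func.HttpResponse(json.dumps(rows), mimetype="application/json")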
Step 4: Next, you can ask to list existing Neon projects.
Step 5: Try out different prompts to fetch the connection string for the chosen database and set it in the Azure Functions settings. Then ask to create a sample Customer table, and so on. You can even prompt to create a new database branch on Neon.
Step 6: Finally, you can prompt Copilot to update the Azure Functions code to fetch data from the table.

Combine the Azure MCP Server
Neon MCP gives GitHub Copilot full access to your database schema, so it can help you write SQL queries, connect to the database, and build API endpoints. But when you add the Azure MCP Server into the mix, Copilot can also understand your Azure services - like Blob Storage, Queues, and Azure AI. You can run both Neon MCP and Azure MCP at the same time to create a full cloud-native developer experience. For example:
Use Neon MCP for serverless Postgres with branching and instant scale.
Use Azure MCP to connect to other Azure services from the same Copilot chat.
Even better: Azure MCP is evolving. In newer versions, you can spin up Azure Functions and other services directly from Copilot chat, without ever leaving your editor. Copilot pulls context from both MCP servers, which means smarter code suggestions and faster development. You can mix and match based on your stack and let Copilot help you build real, working apps in minutes.

Final Thoughts
With GitHub Copilot, the Neon MCP server, and Azure Functions, you're no longer writing backend code line by line - you're orchestrating services with intent, and building APIs becomes remarkably fast. This is not the future; it's something you can use today.
Enhancing AI Integrations with MCP and Azure API Management

As AI agents and assistants become increasingly central to modern applications and experiences, the need for seamless, secure integration with external tools and data sources is more critical than ever. The Model Context Protocol (MCP) is emerging as a key open standard enabling these integrations - allowing AI models to interact with APIs, databases, and other services in a consistent, scalable way.

Understanding MCP
MCP uses a client-host-server architecture built upon JSON-RPC 2.0 for messaging. Communication between clients and servers occurs over defined transport layers, primarily:
stdio: Standard input/output, suitable for efficient communication when the client and server run on the same machine.
HTTP with Server-Sent Events (SSE): Uses HTTP POST for client-to-server messages and SSE for server-to-client messages, enabling communication over networks, including remote servers.

Why MCP Matters
While Large Language Models (LLMs) are powerful, their utility is often limited by their inability to access real-time or proprietary data. Traditionally, integrating new data sources or tools required custom connectors/implementations and significant engineering effort. MCP addresses this by providing a unified protocol for connecting agents to both local and remote data sources, unifying and streamlining integrations.

Leveraging Azure API Management for remote MCP servers
Azure API Management is a fully managed platform for publishing, securing, and monitoring APIs. By treating MCP server endpoints like any other backend API, organizations can apply familiar governance, security, and operational controls. With MCP adoption, the need for robust management of these backend services will intensify. API Management retains a vital role in governing these underlying assets by:
Applying security controls to protect the backend resources.
Ensuring reliability.
Effective monitoring and troubleshooting with tracing requests and context flow.
In this blog post, I will walk you through a practical example: hosting an MCP server behind Azure API Management, configuring credential management, and connecting with GitHub Copilot.

A Practical Example: Automating Issue Triage
To follow along with this scenario, please check out our Model Context Protocol (MCP) lab available at AI-Gateway/labs/model-context-protocol. Let's move from theory to practice by exploring how MCP, Azure API Management (APIM) and GitHub Copilot can transform a common engineering workflow. Imagine you're an engineering manager aiming to streamline your team's issue triage process - reducing manual steps and improving efficiency.

Example workflow:
Engineers log bugs/feature requests as GitHub issues.
Following a manual review, a corresponding incident ticket is generated in ServiceNow.
This manual handoff is inefficient and error-prone. Let's see how we can automate this process - securely connecting GitHub and ServiceNow, and enabling an AI agent (GitHub Copilot in VS Code) to handle triage tasks on your behalf. A significant challenge in this integration involves securely managing delegated access to backend APIs, like GitHub and ServiceNow, from your MCP server. Azure API Management's credential manager solves this by centralizing secure credential storage and facilitating the secure creation of connections to your third-party backend APIs.
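Before walking through the lab, here is a minimal sketch of what one of these tools could look like on the GitHub MCP server side, using the Python MCP SDK. This is not the lab's actual code: get_token_from_apim is a hypothetical helper standing in for however your server resolves the session's authorization_id into the GitHub token stored by API Management's credential manager.

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("GitHub Issues MCP Server")

@mcp.tool()
def list_issues(repo: str, authorization_id: str) -> list[dict]:
    # Hypothetical helper: exchange the chat session's authorization_id for the
    # GitHub access token that APIM's credential manager stored for this user.
    token = get_token_from_apim(authorization_id)
    response = requests.get(
        f"https://api.github.com/repos/{repo}/issues",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    response.raise_for_status()
    return [{"number": i["number"], "title": i["title"]} for i in response.json()]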
Build and deploy your MCP server(s)
We'll start by building two MCP servers:
GitHub Issues MCP Server: provides tools to authenticate on GitHub (authorize_github), retrieve user information (get_user) and list issues for a specified repository (list_issues).
ServiceNow Incidents MCP Server: provides tools to authenticate with ServiceNow (authorize_servicenow), list existing incidents (list_incidents) and create new incidents (create_incident).
We are using Azure API Management to secure and protect both MCP servers, which are built using Azure Container Apps. Azure API Management's credential manager centralizes secure credential storage and facilitates the secure creation of connections to your backend third-party APIs.
Client auth: You can leverage API Management subscriptions to generate subscription keys, enabling client access to these APIs. Optionally, to further secure the /sse and /messages endpoints, we apply the validate-jwt policy to ensure that only clients presenting a valid JWT can access these endpoints, preventing unauthorized access. (See: AI-Gateway/labs/model-context-protocol/src/github/apim-api/auth-client-policy.xml)
After registering OAuth applications in GitHub and ServiceNow, we update APIM's credential manager with the respective Client IDs and Client Secrets. This enables APIM to perform OAuth flows on behalf of users, securely storing and managing tokens for backend calls to GitHub and ServiceNow.

Connecting your MCP Server in VS Code
With your MCP servers deployed and secured behind Azure API Management, the next step is to connect them to your development workflow. Visual Studio Code now supports MCP, enabling GitHub Copilot's agent mode to connect to any MCP-compatible server and extend its capabilities.
Open the Command Palette and type in MCP: Add Server ...
Select server type as HTTP (HTTP or Server-Sent Events)
Paste in the Server URL
Provide a Server ID
This process automatically updates your settings.json with the MCP server configuration. Once added, GitHub Copilot can connect to your MCP servers and access the defined tools, enabling agentic workflows such as issue triage and automation. You can repeat these steps to add the ServiceNow MCP Server.

Understanding Authentication and Authorization with Credential Manager
When a user initiates an authentication workflow (e.g., via the authorize_github tool), GitHub Copilot triggers the MCP server to generate an authorization request and a unique login URL. The user is redirected to a consent page, where their registered OAuth application requests permissions to access their GitHub account. Azure API Management acts as a secure intermediary, managing the OAuth flow and token storage.

Flow of authorize_github:
Step 1 - Connection initiation: GitHub Copilot Agent invokes an SSE connection to API Management via the MCP client (VS Code).
Step 2 - Tool discovery: APIM forwards the request to the GitHub MCP Server, which responds with the available tools.
Step 3 - Authorization request: GitHub Copilot selects and executes the authorize_github tool. The MCP server generates an authorization_id for the chat session.
Step 4 - User consent:
If it's the first login, APIM requests a login redirect URL from the MCP server.
The MCP server sends the login URL to the client, prompting the user to authenticate with GitHub.
Upon successful login, GitHub redirects the client with an authorization code.
Step 5 - Token exchange and storage:
The MCP client sends the authorization code to API Management.
APIM exchanges the code for access and refresh tokens from GitHub.
APIM securely stores the token and creates an Access Control List (ACL) for the service principal.
Step 6 - Confirmation: APIM confirms successful authentication to the MCP client, and the user can now perform authenticated actions, such as accessing private repositories.
Check out the Python logic for how to implement it: AI-Gateway/labs/model-context-protocol/src/github/mcp-server/mcp-server.py

Understanding Tool Calling with Underlying APIs in API Management
Using the list_issues tool:
Connection confirmed: APIM confirms the connection to the MCP client.
Issue retrieval: The MCP client requests issues from the MCP server. The MCP server attaches the authorization_id as a header and forwards the request to APIM. The list of issues is returned to the agent.
You can use the same process to add the ServiceNow MCP Server. With both servers connected, GitHub Copilot Agent can extract issues from a private repo in GitHub and create new incidents in ServiceNow, automating your triage workflow. You can define additional tools such as suggest_assignee, assign_engineer, update_incident_status, notify_engineer, request_feedback and others to demonstrate a truly closed-loop, automated engineering workflow - from issue creation to resolution and feedback. Take a look at this brief demo showcasing the entire end-to-end process.

Summary
Azure API Management (APIM) is an essential tool for enterprise customers looking to integrate AI models with external tools using the Model Context Protocol (MCP). In this blog, we demonstrated how Azure API Management's credential manager solves the secure creation of connections to your backend APIs. By integrating MCP servers with VS Code and leveraging APIM for OAuth flows and token management, you can enable secure, agentic automation across your engineering tools. This approach not only streamlines workflows like issue triage and incident creation but also ensures enterprise-grade security and governance for all APIs.

Additional Resources
Using Credential Manager will help with managing OAuth 2.0 tokens to backend services.
Client auth for remote MCP servers:
AZD up: https://aka.ms/mcp-remote-apim-auth
AI lab Client Auth: AI-Gateway/labs/mcp-client-authorization/mcp-client-authorization.ipynb
Blog post: https://aka.ms/remote-mcp-apim-auth-blog
If you have any questions or would like to learn more about how MCP and Azure API Management can benefit your organization, feel free to reach out to us. We are always here to help and provide further insights. Connect with us on LinkedIn (Julia Kasper & Julia Muiruri) and follow for more updates, insights, and discussions on AI integrations and API management.
GPT-5: Will it RAG?

OpenAI released the GPT-5 model family last week, with an emphasis on accurate tool calling and reduced hallucinations. For those of us working on RAG (Retrieval-Augmented Generation), it's particularly exciting to see a model specifically trained to reduce hallucination. There are five variants in the family:
gpt-5
gpt-5-mini
gpt-5-nano
gpt-5-chat: Not a reasoning model, optimized for chat applications
gpt-5-pro: Only available in ChatGPT, not via the API
As soon as GPT-5 models were available in Azure AI Foundry, I deployed them and evaluated them inside our most popular open source RAG template. I was immediately impressed - not by the model's ability to answer a question, but by its ability to admit it could not answer a question!

You see, we have one test question for our sample data (HR documents for a fictional company) that sounds like it should be an easy question: "What does a Product Manager do?" But, if you actually look at the company documents, there's no job description for "Product Manager", only related jobs like "Senior Manager of Product Management". Every other model, including the reasoning models, still pretended that it could answer that question, as seen in a sample response from o4-mini. The gpt-5 model, however, realizes that it doesn't have the information necessary, and responds that it cannot answer the question. As I always say: I would much rather have an LLM admit that it doesn't have enough information instead of making up an answer.

Bulk evaluation
But that's just a single question! What we really need to know is whether the GPT-5 models will generally do a better job across the board, on a wide range of questions. So I ran bulk evaluations using the azure-ai-evaluations SDK, checking my favorite metrics: groundedness (LLM-judged), relevance (LLM-judged), and citations_matched (regex based off ground truth citations). I didn't bother evaluating gpt-5-nano, as I did some quick manual tests and wasn't impressed enough - plus, we've never used a nano-sized model for our RAG scenarios. Here are the results for 50 Q/A pairs:

| metric | stat | gpt-4.1-mini | gpt-4o-mini | gpt-5-chat | gpt-5 | gpt-5-mini | o3-mini |
| groundedness | pass % | 94% | 86% | 96% | 100% (best) | 94% | 96% |
| groundedness | mean score | 4.76 | 4.50 | 4.86 | 5.00 (best) | 4.82 | 4.80 |
| relevance | pass % | 94% (best) | 84% | 90% | 90% | 74% | 90% |
| relevance | mean score | 4.42 (best) | 4.22 | 4.06 | 4.20 | 4.02 | 4.00 |
| answer_length | mean | 829 | 919 | 549 | 844 | 940 | 499 |
| latency | mean | 2.9 | 4.5 | 2.9 | 9.6 | 7.5 | 19.4 |
| citations_matched | % | 52% | 49% | 52% | 47% | 49% | 51% |

For the LLM-judged metrics of groundedness and relevance, the LLM awards a score of 1-5, and both 4 and 5 are considered passing scores. That's why you see both a "pass %" (percentage with a 4 or 5 score) and an average score in the table above. For the groundedness metric, which measures whether an answer is grounded in the retrieved search results, the gpt-5 model does the best (100%), while the other gpt-5 models do quite well too, on par with our current default model of gpt-4.1-mini. For the relevance metric, which measures whether an answer fully answers a question, the gpt-5 models don't score as highly as gpt-4.1-mini. I looked into the discrepancies there, and I think that's actually due to gpt-5 being less willing to give an answer when it's not fully confident in it - it would rather give a partial answer instead. That's a good thing for RAG apps, so I am comfortable with that metric being less than 100%.
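As an aside, citations_matched is the simplest of the three metrics: it just checks whether the citations from the ground truth answer show up in the new answer. A rough sketch of how such a regex-based check could be computed (my own illustration here, not the template's exact implementation):

import re

def citations_matched(response: str, ground_truth: str) -> float:
    # Citations look like [Some_Document.pdf#page=12] in both the ground truth
    # and the app's answer; measure how many ground-truth citations reappear.
    expected = set(re.findall(r"\[([^\]]+\.pdf#page=\d+)\]", ground_truth))
    if not expected:
        return 1.0
    found = {c for c in expected if c in response}
    return len(found) / len(expected)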
The latency metric is generally higher for the gpt-5 reasoning models, as would be expected, but it is also variable based on deployment region, region capacity, etc., assuming you're not using a "provisioned throughput" deployment. Also note that the latency here records the total time taken, from first token to last token, whereas the most important metric for a user-facing streaming chat is the time to first token. For the gpt-5 reasoning models in this evaluation, I set the reasoning_effort to "minimal", which means the model chooses whether to use reasoning tokens. I have never seen it actually use any reasoning tokens when I set the effort to minimal, so maybe that means that a higher reasoning effort is really only needed for longer or more complex tasks, and RAG answering is a non-reasoning task. A higher reasoning effort would definitely affect the latency and likely also affect the answer quality. I did not test that out, since the "minimal" effort setting already results in high quality answers.

Answer differences
Now that we've seen the overall metrics, let's dig into some actual answers, and see some of the key ways that GPT-5 model answers differ.

Quicker to say "I don't know"
Let's start with the first thing I noticed: gpt-5 was happy to admit when it couldn't fully answer a question. In the 50 questions evaluated, the gpt-5 model started off its answer with "I don't know" for 3 of the answers, while gpt-5-mini started that way for 10 of the answers. In table form:

| gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-chat |
| 1 (1.6%) | 3 (6%) | 10 (20%) | 1 (1.6%) |

Here's an example of a gpt-4.1-mini answer versus a gpt-5 answer:

Q: Explain the eligibility criteria for the Northwind Standard plan offered to Contoso employees?

gpt-4.1-mini: The eligibility criteria for the Northwind Standard plan for Contoso employees are not explicitly detailed in the provided sources. However, it is mentioned that prior-authorization may be required for certain services or treatments, which involves approval from Northwind Health based on medical necessity and coverage under the plan. Employees must have their physician submit a request for prior-authorization, and in some cases, meet specific criteria such as medical tests, treatments, or lifestyle changes to be approved. Preventive care, emergency services, mental health, and substance abuse coverage do not require prior-authorization. Out-of-network services do require prior-authorization and may incur additional costs [Northwind_Standard_Benefits_Details.pdf#page=63][Northwind_Standard_Benefits_Details.pdf#page=102]. If you need information on basic eligibility (such as employment status or tenure), that is not provided in these sources.

gpt-5: I don’t know. The provided sources describe compliance (ERISA, COBRA, HIPAA, ACA) and note that Northwind Standard is a group plan covering “all eligible employees of Contoso and their spouses or dependents,” but they do not define the specific eligibility criteria (e.g., hours worked, waiting period, employment class). [Northwind_Standard_Benefits_Details.pdf#page=88][Northwind_Standard_Benefits_Details.pdf#page=85][Northwind_Standard_Benefits_Details.pdf#page=86]

Both of the answers admit some degree of not-knowing, but gpt-5 comes right out with it at the start of the answer. Notably, our system prompt includes this instruction:

Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know.
So what we may be seeing is a combination of two factors: 1) the gpt-5 reasoning models are generally better at following instructions, and 2) the gpt-5 reasoning models went through a specific alignment process to reduce hallucination. Notably, however, the gpt-5-chat model rarely responded with "I don't know", so it is not as affected by whatever factors are increasing the likelihood of the reasoning models to answer that way. You may be wondering why I have questions in the evaluation set that aren't fully answerable - it's hard to generate a good set of ground truth for evaluation, and my ground truth data is not the optimal set of questions. Ideally, I would do a separate evaluation for all the fully knowable questions versus all the unknowable questions - and I even have a specific custom metric for "don't-know-ness" that I run on those - but a few partially knowable questions snuck into this ground truth data set.

Requests for clarification
Our system prompt also encourages the model to ask clarifying questions when necessary:

If asking a clarifying question to the user would help, ask the question.

But I had never actually seen a clarifying question from a model until gpt-5-chat, which replied with a question to 2 of the 50 questions. Example:

Q: What should I do if I need healthcare services outside Washington with Northwind Health?

gpt-4.1-mini: If you have Northwind Standard and need healthcare services outside Washington, you must meet these criteria for coverage: be outside Washington for no more than 12 consecutive months, be more than 100 miles away from your primary care provider, and have had your plan in effect for at least 6 months before receiving out-of-area care. You must contact Northwind Health in advance and obtain prior authorization for the care; otherwise, you may be responsible for the full cost. Emergency services outside Washington are covered, but non-emergency services are not unless these conditions are met [Northwind_Standard_Benefits_Details.pdf#page=59][Northwind_Standard_Benefits_Details.pdf#page=60]. If you have Northwind Health Plus, first ensure the provider is in the nationwide network by checking the online directory or calling customer service. Keep receipts and documentation for any out-of-state services, as you may need to submit claims for reimbursement. Contact Northwind Health Plus before receiving services to confirm coverage, as not all services outside Washington are covered [Northwind_Health_Plus_Benefits_Details.pdf#page=62].

gpt-5-chat: Which plan do you have — Northwind Standard or Northwind Health Plus?

Like in the example above, other models typically avoid asking a clarifying question by instead addressing every possible scenario (covering all their bases!), resulting in a fairly long answer. Arguably, it's a better user experience to ask the clarifying question, so that the user can get an answer tailored to their specific situation and not get confused by other possibilities. The primary drawback is the additional back-and-forth for the final answer. I did not personally see clarifying questions from the GPT-5 reasoning models, but my colleagues noticed an increase in such questions when using gpt-5 with GitHub Copilot Agent mode, so it's possible that every model in the family has an increased likelihood to ask for clarification when faced with ambiguity.
Reasoning models ❤️ lists
I saw a definite tendency of the GPT-5 reasoning models to output the answer as a list, even though there's nothing in the RAG system prompt that tells them to use lists. Out of the 50 questions evaluated, here are the number with answers formatted as a top-level bulleted list:

| gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-chat |
| 0 | 36 (72%) | 26 (52%) | 0 |

Here's an example of a gpt-4.1-mini answer (paragraph) versus a gpt-5 answer (list):

Q: What are the coverage limitations of the Northwind Standard plan regarding clinical trials?

gpt-4.1-mini: The Northwind Standard plan covers certain clinical trial services such as diagnostic testing, treatment of the condition being studied, medications, lab services, and imaging services. However, it does not cover travel expenses associated with attending clinical trials. Additionally, any experimental treatments or services not part of the clinical trial are not covered. Coverage for other clinical trial services not explicitly listed may be considered on a case-by-case basis. Members should contact Northwind Health customer service for more details [Northwind_Standard_Benefits_Details.pdf#page=23][Northwind_Standard_Benefits_Details.pdf#page=24].

gpt-5:
- Travel expenses for attending clinical trials are not covered [Northwind_Standard_Benefits_Details.pdf#page=23][Northwind_Standard_Benefits_Details.pdf#page=24].
- Experimental treatments or services that are not part of the clinical trial are not covered [Northwind_Standard_Benefits_Details.pdf#page=24].
- Services or treatments deemed experimental or investigational are not covered by the plan [Northwind_Standard_Benefits_Details.pdf#page=53].

Now, is it a bad thing that the gpt-5 reasoning models use lists? Not necessarily! But if that's not the style you're looking for, then you either want to consider the gpt-5-chat model or add specific messaging in the system prompt to veer the model away from top-level lists.

Longer answers
As we saw in the overall metrics above, there was an impact on answer length (measured in the number of characters, not tokens). Let's isolate those stats:

| gpt-4.1-mini | gpt-5 | gpt-5-mini | gpt-5-chat |
| 829 | 844 | 990 | 549 |

The gpt-5 reasoning models are generating answers of similar length to the current baseline of gpt-4.1-mini, though the gpt-5-mini model seems to be a bit more verbose. The API now has a new parameter to control verbosity for those models, which defaults to "medium". I did not try an evaluation with that set to "low" or "high", which would be an interesting evaluation to run. The gpt-5-chat model outputs relatively short answers, which are actually closer in length to the answer length that I used to see from gpt-3.5-turbo. What answer length is best? A longer answer will take longer to finish rendering to the user (even when streaming), and will cost the developer more tokens. However, sometimes answers are longer due to better formatting that is easier to skim, so longer does not always mean less readable. For the user-facing RAG chat scenario, I generally think that shorter answers are better. If I was putting these gpt-5 reasoning models in production, I'd probably try out the "low" verbosity value, or put instructions in the system prompt, so that users get their answers more quickly. They can always ask follow-up questions as needed.

Fancy punctuation
This is a weird difference that I discovered while researching the other differences: the GPT-5 models are more likely to use “smart” quotes instead of standard ASCII quotes.
Specifically:
Left single: ‘ (U+2018)
Right single / apostrophe: ’ (U+2019)
Left double: “ (U+201C)
Right double: ” (U+201D)
For example, the gpt-5 model actually responded with "I don’t know", not with "I don't know". It's a subtle difference, but if you are doing any sort of post-processing or analysis, it's good to know. I've also seen the models using the smart quotes incorrectly in coding contexts (like GitHub Copilot Agent mode), so that's another potential issue to look out for. I assume that the models were trained on data that tended to use smart quotes more often, perhaps synthetic data or book text. I know that as a normal human, I rarely use them, given the extra effort required to type them.

Query rewriting to the extreme
Our RAG flow makes two LLM calls: the second answers the question, as you'd expect, but the first rewrites the user's query into a strong search query. This step can fix spelling mistakes, but it's even more important for filling in missing context in multi-turn conversations - like when the user simply asks, "what else?" A well-crafted rewritten query leads to better search results, and ultimately, a more complete and accurate answer. During my manual tests, I noticed that the rewritten queries from the GPT-5 models are much longer, filled to the brim with synonyms. For example:

Q: What does a Product Manager do?

gpt-4.1-mini: product manager responsibilities
gpt-5-mini: Product Manager role responsibilities duties skills day-to-day tasks product management overview

Are these new rewritten queries better or worse than the previous short ones? It's hard to tell, since they're just one factor in the overall answer output, and I haven't set up a retrieval-specific metric. The closest metric is citations_matched, since the new answer from the app can only match the citations in the ground truth if the app managed to retrieve all the same citations. That metric was generally high for these models, and when I looked into the cases where the citations didn't match, I typically thought the gpt-5 family of responses were still good answers. I suspect that the rewritten query does not have a huge effect either way, since our retrieval step uses hybrid search from Azure AI Search, and the combined power of both hybrid and vector search generally compensates for differences in search query wording. It's worth evaluating this further, however, and considering using a different model for the query rewriting step. Developers often choose to use a smaller, faster model for that stage, since query rewriting is an easier task than answering a question.

So, are the answers accurate?
Even with a 100% groundedness score from an LLM judge, it's possible that a RAG app is producing inaccurate answers - for example, if the LLM judge is biased or the retrieved context is incomplete. The only way to really know if RAG answers are accurate is to send them to a human expert. For the sample data in this blog, there is no human expert available, since the answers are based off synthetically generated documents. Despite two years of staring at those documents and running dozens of evaluations, I still am not an expert in the HR benefits of the fictional Contoso company. That's why I also ran the same evaluations on the same RAG codebase, but with data that I know intimately: my own personal blog. I looked through 200 answers from gpt-5, and did not notice any inaccuracies in the answers.
Yes, there are times when it says "I don't know" or asks a clarifying question, but I consider those to be accurate answers, since they do not spread misinformation. I imagine that I could find some way to trip up the gpt-5 model, but on the whole, it looks like a model with a high likelihood of generating accurate answers when given relevant context.

Evaluate for yourself!
I share my evaluations on our sample RAG app as a way to share general learnings on model differences, but I encourage every developer to evaluate these models for your specific domain, alongside domain experts who can reason about the correctness of the answers. How can you evaluate?
If you are using the same open source RAG project for Azure, deploy the GPT-5 models and follow the steps in the evaluation guide.
If you have your own solution, you can use an open-source SDK for evaluating, like azure-ai-evaluation (the one that I use), DeepEval, promptfoo, etc.
If you are using an observability platform like Langfuse, Arize, or Langsmith, they have evaluation strategies baked in.
Or if you're using an agents framework like Pydantic AI, those also often have built-in eval mechanisms.
If you can share what you learn from evaluations, please do! We are all learning about the strange new world of LLMs together.
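If you want a concrete starting point with the azure-ai-evaluation SDK mentioned above, a bulk run looks roughly like the sketch below. Treat the parameter names as assumptions and check the SDK documentation for the current interface; it assumes a JSONL file where each line has query, response, and context fields, and a separate judge model deployment.

import os
from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator, evaluate

# Configuration for the LLM judge (not the model under test)
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o-mini",
}

result = evaluate(
    data="rag_answers.jsonl",  # one {"query", "response", "context"} object per line
    evaluators={
        "groundedness": GroundednessEvaluator(model_config),
        "relevance": RelevanceEvaluator(model_config),
    },
)
print(result["metrics"])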
Let's Get to Know MCP: An Introduction to the Model Context Protocol (MCP)

Don't miss the next "Let's Learn - MCP" event on Microsoft Reactor on July 24, designed for anyone who wants to get to know the new standard for intelligent agents (the Model Context Protocol) and learn how to put it into practice. The session is in Italian and the demos are in Python, but it is part of a series of live streams available in many languages.
AI Repo of the Week: Generative AI for Beginners with JavaScript

Introduction
Ready to explore the fascinating world of Generative AI using your JavaScript skills? This week's featured repository, Generative AI for Beginners with JavaScript, is your launchpad into the future of application development. Whether you're just starting out or looking to expand your AI toolbox, this open-source GitHub resource offers a rich, hands-on journey. It includes interactive lessons, quizzes, and even time-travel storytelling featuring historical legends like Leonardo da Vinci and Ada Lovelace. Each chapter combines narrative-driven learning with practical exercises, helping you understand foundational AI concepts and apply them directly in code. It's immersive, educational, and genuinely fun.

What You'll Learn
1. Foundations of Generative AI and LLMs: Start with the basics: what is generative AI? How do large language models (LLMs) work? This chapter lays the groundwork for how these technologies are transforming JavaScript development.
2. Build Your First AI-Powered App: Walk through setting up your environment and creating your first AI app. Learn how to configure prompts and unlock the potential of AI in your own projects.
3. Prompt Engineering Essentials: Get hands-on with prompt engineering techniques that shape how AI models respond. Explore strategies for crafting prompts that are clear, targeted, and effective.
4. Structured Output with JSON: Learn how to guide the model to return structured data formats like JSON - critical for integrating AI into real-world applications.
5. Retrieval-Augmented Generation (RAG): Go beyond static prompts by combining LLMs with external data sources. Discover how RAG lets your app pull in live, contextual information for more intelligent results.
6. Function Calling and Tool Use: Give your LLM new powers! Learn how to connect your own functions and tools to your app, enabling more dynamic and actionable AI interactions.
7. Model Context Protocol (MCP): Dive into MCP, a new standard for organizing prompts, tools, and resources. Learn how it simplifies AI app development and fosters consistency across projects.
8. Enhancing MCP Clients with LLMs: Build on what you've learned by integrating LLMs directly into your MCP clients. See how to make them smarter, faster, and more helpful.
More chapters coming soon - watch the repo for updates!

Companion App: Interact with History
Experience the power of generative AI in action through the companion web app, where you can chat with historical figures and witness how JavaScript brings AI to life in real time.

Conclusion
Generative AI for Beginners with JavaScript is more than a course - it's an adventure into how storytelling, coding, and AI can come together to create something fun and educational. Whether you're here to upskill, experiment, or build the next big thing, this repository is your all-in-one resource to get started with confidence. Jump into the future of development - check out the repo and start building with AI today!
Is it a bug or a feature? Using Prompty to automatically track and tag issues.

Introduction
You've probably noticed a theme in my recent posts: tackling challenges with AI-powered solutions. In my latest project, I needed a fast way to classify and categorize GitHub issues using a predefined set of tags. The tag data was there, but the connections between issues and tags weren't. To bridge that gap, I combined Azure OpenAI Service, Prompty, and GitHub to automatically extract and assign the right labels. By automating issue tagging, I was able to:
Streamline contributor workflows with consistent, on-time labels that simplify triage
Improve repository hygiene by keeping issues well-organized, searchable, and easy to navigate
Eliminate repetitive maintenance so the team can focus on community growth and developer empowerment
Scale effortlessly as the project expands, turning manual chores into intelligent automation

Challenge: 46 issues, no tags
The Prompty repository currently hosts 46 relevant, but untagged, issues. To automate labeling, I first defined a complete tag taxonomy. Then I built a solution using:
Prompty for prompt templating and function calling
Azure OpenAI (gpt-4o-mini) to classify each issue
Azure AI Search for retrieval-augmented context (RAG)
Python to orchestrate the workflow and integrate with GitHub
By the end, you'll have an autonomous agent that fetches open issues, matches them against your custom taxonomy, and applies labels back on GitHub.

Prerequisites:
An Azure account with Azure AI Search and Azure OpenAI enabled
Python and Prompty installed
Clone the repo and install dependencies:

pip install -r requirements.txt

Step 1: Define the prompt template
We'll use Prompty to structure our LLM instructions. If you haven't yet, install the Prompty VS Code extension and refer to the Prompty docs to get started. Prompty combines:
Tooling to configure and deploy models
Runtime for executing prompts and function calls
Specification (YAML) for defining prompts, inputs, and outputs
Our Prompty is set to use gpt-4o-mini, and below is our sample input:

sample:
  title: Including Image in System Message
  tags: ${file:tags.json}
  description: An error arises in the flow, coming up starting from the "complete" block. It seems like it is caused by placing a static image in the system prompt, since removing it causes the issue to go away. Please let me know if I can provide additional context.

The inputs will be the tags file implemented using RAG, then we will fetch the issue title and description from GitHub once a new issue is posted. Next, in our Prompty file, we gave instructions for how the LLM should work as follows:

system:
You are an intelligent GitHub issue tagging assistant.

Available tags: ${inputs}

{% if tags.tags %}
## Available Tags
{% for tag in tags.tags %}
name: {{tag.name}}
description: {{tag.description}}
{% endfor %}
{% endif %}

Guidelines:
1. Only select tags that exactly match the provided list above
2. If no tags apply, return an empty array []
3. Return ONLY a valid JSON array of strings, nothing else
4. Do not explain your choices or add any other text

Use your understanding of the issue and refer to documentation at https://prompty.ai to match appropriate tags. Tags may refer to:
- Issue type (e.g., bug, enhancement, documentation)
- Tool or component (e.g., tool:cli, tracer:json-tracer)
- Technology or integration (e.g., integration:azure, runtime:python)
- Conceptual elements (e.g., asset:template-loading)

Return only a valid JSON array of the issue title, description and tags.
If the issue does not fit in any of the categories, return an empty array with:
["No tags apply to this issue. Please review the issue and try again."]

Example:
Issue Title: "App crashes when running in Azure CLI"
Issue Body: "Running the generated code in Azure CLI throws a Python runtime error."
Tag List: ["bug", "tool:cli", "runtime:python", "integration:azure"]
Output: ["bug", "tool:cli", "runtime:python", "integration:azure"]

user:
Issue Title: {{title}}
Issue Description: {{description}}

Once the Prompty file was ready, I right-clicked on the file and converted it to Prompty code, which provided Python base code to get started from, instead of building from scratch.

Step 2: Enrich with context using Azure AI Search
To be able to generate labels for our issues, I created a sample of around 20 tags, each with a title and a description of what it does. As a starting point, I used Azure AI Foundry, where I uploaded the data and created an index. This typically takes about an hour to complete successfully. Next, I implemented a retrieval function:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import QueryType

def query_azure_search(query_text):
    """Query Azure AI Search for relevant documents and tags."""
    search_client = SearchClient(
        endpoint=SEARCH_SERVICE_ENDPOINT,
        index_name=SEARCH_INDEX_NAME,
        credential=AzureKeyCredential(SEARCH_API_KEY)
    )

    # Perform the search
    results = search_client.search(
        search_text=query_text,
        query_type=QueryType.SIMPLE,
        top=5  # Retrieve top 5 results
    )

    # Extract content and tags from results
    documents = [doc["content"] for doc in results]
    tags = [doc.get("tags", []) for doc in results]  # Assuming "tags" is a field in the index

    # Flatten and deduplicate tags
    unique_tags = list(set(tag for tag_list in tags for tag in tag_list))

    return documents, unique_tags

Step 3: Orchestrate the Workflow
In addition to adding RAG, I added functions in the basic.py file to:
fetch_github_issues: calls the GitHub REST API to list open issues and filters out any that already have labels.
run_with_rag: on the selected issues, calls query_azure_search to append any retrieved docs, tags the issues, and parses the JSON output from the prompt into a list of labels.
label_issue: patches the issue to apply a list of labels.
process_issues: fetches all unlabelled issues, runs the RAG pipeline to generate the tags, and calls label_issue to apply them.
scheduler loop: runs every so often to check if there's a new issue and apply a label.

Step 4: Validate and Run
Ensure all .env variables are set (API keys, endpoints, token). Install dependencies and execute using:

python basic.py

Create a new GitHub issue and watch as your agent assigns tags in real time. A short demo video illustrates the workflow.

Next Steps
Migrate from PATs to a GitHub App for tighter security
Create a multi-agent application and add an evaluator agent to review tags before publishing
Integrate with GitHub Actions or Azure Pipelines for CI/CD

Conclusion and Resources
By combining Prompty, Azure AI Search, and Azure OpenAI, you can fully automate GitHub issue triage - improving consistency, saving time, and scaling effortlessly. Adapt this pattern to any classification task in your own workflows! You can learn more using the following resources:
Prompty documentation to learn more on Prompty
Agents for Beginners course to learn how you can build your own agent
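For anyone implementing the GitHub-facing pieces described in Step 3, they are thin wrappers over the GitHub REST API. Here is a simplified sketch of the idea (assuming a GITHUB_TOKEN environment variable; this is not the exact code from basic.py):

import os
import requests

GITHUB_API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def fetch_github_issues(owner: str, repo: str) -> list[dict]:
    # List open issues and keep only the ones that have no labels yet
    resp = requests.get(f"{GITHUB_API}/repos/{owner}/{repo}/issues",
                        headers=HEADERS, params={"state": "open"})
    resp.raise_for_status()
    return [issue for issue in resp.json() if not issue.get("labels")]

def label_issue(owner: str, repo: str, number: int, labels: list[str]) -> None:
    # Apply the generated labels to the issue
    resp = requests.post(f"{GITHUB_API}/repos/{owner}/{repo}/issues/{number}/labels",
                         headers=HEADERS, json={"labels": labels})
    resp.raise_for_status()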
Pose Estimation with the AI Dev Gallery

What's Going On Here?
This blog post is the first in an upcoming series that will spotlight the local AI samples contained in the new AI Dev Gallery. The Gallery is a preview project that aims to showcase local AI scenarios on Windows and to give developers the guidance they need to enable those scenarios themselves. The Gallery is open-source and contains a wide selection of different models and samples, including text, image, audio, and video use cases. In addition to being able to see a given model in action, each sample contains a source code view and a button to export the sample directly to a new Visual Studio project. The Gallery is available on the Microsoft Store and is entirely open sourced on GitHub. For this first sample spotlight, we will be taking a look at one of my favorite scenarios: Human Pose Estimation with HRNet. This sample is enabled by ONNX Runtime, and depending on the processor in your Windows device, this sample supports running on the CPU and NPU. I'll cover how to check which hardware is supported and how to switch between them later in the post.

Pose Estimation Demo
This sample takes in an uploaded photo and renders pose estimations onto the main human figure in the photo. It will render connections between the torso and limbs, along with five points corresponding to key facial features (eyes, nose, and ears). Before diving into the code for this sample, here's a quick video example. Let's get right to the code to see how this is implemented.

Code Walkthrough
This walkthrough will focus on essential code and may gloss over some UI logic and helper functions. The full code for this sample can be browsed in depth in the AI Dev Gallery itself or in the GitHub repository. When this sample is first opened, it will make an initial call to LoadModelAsync, which looks like this:

protected override async Task LoadModelAsync(SampleNavigationParameters sampleParams)
{
    // Tell our inference session where our model lives and which hardware to run it on
    await InitModel(sampleParams.ModelPath, sampleParams.HardwareAccelerator);
    sampleParams.NotifyCompletion();

    // Make first call to inference once model is loaded
    await DetectPose(Path.Join(Windows.ApplicationModel.Package.Current.InstalledLocation.Path, "Assets", "pose_default.png"));
}

In this function, a ModelPath and HardwareAccelerator are passed into our InitModel function, which handles instantiating an ONNX Runtime InferenceSession with our model location and the hardware that inference will be performed on. You can jump to Switching to NPU Execution later in this post for more in-depth information on how the InferenceSession is instantiated. Once the model has finished initializing, this function calls for an initial round of inference via DetectPose on a default image.

Preprocessing, Calling For Inference, and Postprocessing Output
The inference logic, along with the required preprocessing and postprocessing, takes place in the DetectPose function. This is a pretty long function, so let's go through it piece by piece.
First, this function checks that it was passed a valid file path and performs some updates to our XAML:

private async Task DetectPose(string filePath)
{
    // Check if the passed in file path exists, and return if not
    if (!Path.Exists(filePath))
    {
        return;
    }

    // Update XAML to put the view into the "Loading" state
    Loader.IsActive = true;
    Loader.Visibility = Visibility.Visible;
    UploadButton.Visibility = Visibility.Collapsed;

    DefaultImage.Source = new BitmapImage(new Uri(filePath));

Next, the input image is loaded into a Bitmap and then resized to the expected input size of the HRNet model (256x192) with the helper function ResizeBitmap:

    // Load bitmap from image filepath
    using Bitmap originalImage = new(filePath);

    // Store expected input dimensions in variables, as these will be used later
    int modelInputWidth = 256;
    int modelInputHeight = 192;

    // Resize Bitmap to expected dimensions with ResizeBitmap helper
    using Bitmap resizedImage = BitmapFunctions.ResizeBitmap(originalImage, modelInputWidth, modelInputHeight);

Once the image is stored in a bitmap of the proper size, we create a Tensor of dimensionality 1x3x256x192 that will represent the image. Each dimension, in order, corresponds to these values:

Batch Size: the first value of 1 is the number of inputs being processed. This implementation processes a single image at a time, so the batch size is just one.
Color Channels: the next dimension has a value of 3 and corresponds to each of the typical color channels: red, green, and blue. This defines the color of each pixel in the image.
Width: the next value of 256 (passed as modelInputWidth) is the pixel width of our image.
Height: the last value of 192 (passed as modelInputHeight) is the pixel height of our image.

Taken as a whole, this tensor represents a single image where each pixel is defined by an X (width) and Y (height) position and three color values (red, green, blue). It is also worth noting that the processing and inference section of this function is run in a Task to prevent the UI from becoming blocked:

    // Run our processing and inference logic as a Task to prevent the UI from being blocked
    var predictions = await Task.Run(() =>
    {
        // Define a tensor that represents every pixel of a single image
        Tensor<float> input = new DenseTensor<float>([1, 3, modelInputWidth, modelInputHeight]);

To improve the quality of the input, instead of just passing the original pixel values into the tensor, the pixel values are normalized with the PreprocessBitmapWithStdDev helper function. This function uses the mean of each RGB value and the standard deviation (how far a value typically varies from its mean) to "level out" outlier color values. You can think of it as a way of preventing images with really dramatic color differences from confusing the model. This step does not affect the dimensionality of the input; it only adjusts the values that will be stored in the tensor:

        // Normalize our input and store it in the "input" tensor. Dimension is still 1x3x256x192
        input = BitmapFunctions.PreprocessBitmapWithStdDev(resizedImage, input);
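The BitmapFunctions helpers are not shown in this walkthrough. As a rough idea of what they do, here is a minimal sketch of a resize helper and a mean/standard-deviation normalization. The class name BitmapFunctionsSketch, the use of System.Drawing pixel access, and the ImageNet-style mean and standard deviation values are all assumptions for illustration; the Gallery's actual implementation may differ (for example, by reading pixel data in bulk for speed).

using System.Drawing;
using Microsoft.ML.OnnxRuntime.Tensors;

internal static class BitmapFunctionsSketch
{
    // Resize a bitmap to the model's expected width and height
    public static Bitmap ResizeBitmap(Bitmap image, int width, int height)
    {
        Bitmap resized = new(width, height);
        using Graphics g = Graphics.FromImage(resized);
        g.DrawImage(image, 0, 0, width, height);
        return resized;
    }

    // Normalize each pixel with per-channel mean and standard deviation,
    // writing the result into the provided 1x3xWidthxHeight tensor
    public static Tensor<float> PreprocessBitmapWithStdDev(Bitmap image, Tensor<float> input)
    {
        // Assumed ImageNet-style statistics; the real helper may use different values
        float[] mean = { 0.485f, 0.456f, 0.406f };
        float[] std = { 0.229f, 0.224f, 0.225f };

        for (int x = 0; x < image.Width; x++)
        {
            for (int y = 0; y < image.Height; y++)
            {
                Color pixel = image.GetPixel(x, y);

                // Scale to [0,1], subtract the mean, and divide by the standard deviation
                input[0, 0, x, y] = (pixel.R / 255f - mean[0]) / std[0];
                input[0, 1, x, y] = (pixel.G / 255f - mean[1]) / std[1];
                input[0, 2, x, y] = (pixel.B / 255f - mean[2]) / std[2];
            }
        }

        return input;
    }
}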
There is one last small step of setup before the input is passed to the InferenceSession, as ONNX Runtime expects a certain input format for inference. A List of type NamedOnnxValue is created with only one entry, representing the input tensor that was just processed. Each NamedOnnxValue expects a metadata name (which is grabbed from the model itself using the InferenceSession) and a value (the tensor that was just processed):

        // Snag the input metadata name from the inference session
        var inputMetadataName = _inferenceSession!.InputNames[0];

        // Create a list of NamedOnnxValues, with one entry
        var onnxInputs = new List<NamedOnnxValue>
        {
            // Call NamedOnnxValue.CreateFromTensor and pass in input metadata name and input tensor
            NamedOnnxValue.CreateFromTensor(inputMetadataName, input)
        };

The onnxInputs list that was just created is passed to InferenceSession.Run, which returns a collection of DisposableNamedOnnxValues to be processed:

        // Call Run to perform inference
        using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = _inferenceSession!.Run(onnxInputs);

The output of the HRNet model is a bit more verbose than a list of coordinates that correspond to human pose key points (like left knee or right shoulder). Instead of exact predictions, it returns a heatmap for every pose key point that scores each location on the image with the probability that a certain joint exists there. So there's a bit more work to do to get points that can be placed on an image. First, the function sets up the necessary values for postprocessing:

        // Fetch the heatmaps list from the inference results
        var heatmaps = results[0].AsTensor<float>();

        // Get the output name from the inference session
        var outputName = _inferenceSession!.OutputNames[0];

        // Use the output name to get the dimensions of the output from the inference session
        var outputDimensions = _inferenceSession!.OutputMetadata[outputName].Dimensions;

        // Finally, get the output width and height from those dimensions
        float outputWidth = outputDimensions[2];
        float outputHeight = outputDimensions[3];

The output width and height are passed, along with the heatmaps list and the original image dimensions, to the PostProcessResults helper function. This function does two things with each heatmap:

It iterates over every value in the heatmap to find the coordinates where the probability is highest for each pose key point.
It scales those coordinates back to the size of the original image, since the image was resized before inference. This is why the original image dimensions were passed in.

From this function, a list of tuples containing the X and Y location of each key point is returned, so that they can be properly rendered onto the image:

        // Post process heatmap results to get key point coordinates
        List<(float X, float Y)> keypointCoordinates = PoseHelper.PostProcessResults(heatmaps, originalImage.Width, originalImage.Height, outputWidth, outputHeight);

        // Return those coordinates from the task
        return keypointCoordinates;
    });
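PostProcessResults itself is one of the helpers the walkthrough glosses over. A minimal sketch of the idea might look like the following; the class name PoseHelperSketch, the assumed heatmap layout of [batch, keypoint, width, height] (chosen to match how outputWidth and outputHeight are read above), and the simple argmax-per-heatmap approach are all assumptions, so the Gallery's actual helper may differ.

using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime.Tensors;

internal static class PoseHelperSketch
{
    // Find the highest-probability location in each keypoint heatmap and
    // scale it back to the original image's coordinate space
    public static List<(float X, float Y)> PostProcessResults(
        Tensor<float> heatmaps, float originalWidth, float originalHeight, float outputWidth, float outputHeight)
    {
        List<(float X, float Y)> keypoints = new();
        int keypointCount = heatmaps.Dimensions[1]; // HRNet predicts 17 key points

        for (int k = 0; k < keypointCount; k++)
        {
            float maxValue = float.MinValue;
            int maxX = 0, maxY = 0;

            // Assumed layout: [batch, keypoint, width, height]
            for (int x = 0; x < (int)outputWidth; x++)
            {
                for (int y = 0; y < (int)outputHeight; y++)
                {
                    float value = heatmaps[0, k, x, y];
                    if (value > maxValue)
                    {
                        maxValue = value;
                        maxX = x;
                        maxY = y;
                    }
                }
            }

            // Scale the heatmap coordinates up to the original image size
            keypoints.Add((maxX * originalWidth / outputWidth, maxY * originalHeight / outputHeight));
        }

        return keypoints;
    }
}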
Next up is rendering.

Rendering Pose Predictions

Rendering is handled by the RenderPredictions helper function, which takes in the original image, the predictions that were generated, and a marker ratio that defines how large to draw the predictions on the image. Note that this code is still being called from the DetectPose function:

    using Bitmap output = PoseHelper.RenderPredictions(originalImage, predictions, .02f);

Rendering predictions is pretty key to the pose estimation flow, so let's dive into this function. This function will draw two things:

Red ellipses at each pose key point (right knee, left eye, etc.)
Blue lines connecting joint key points (right knee to right ankle, left shoulder to left elbow, etc.)

Face key points (eyes, nose, ears) do not have any connections and will just have ellipses rendered for them. The first thing the function does is set up the Graphics, Pen, and Brush objects necessary for drawing:

public static Bitmap RenderPredictions(Bitmap image, List<(float X, float Y)> keypoints, float markerRatio, Bitmap? baseImage = null)
{
    // Create a graphics object from the image
    using (Graphics g = Graphics.FromImage(image))
    {
        // Average out width and height of image.
        // Ignore baseImage portion, it is used by another sample.
        var averageOfWidthAndHeight = baseImage != null ? baseImage.Width + baseImage.Height : image.Width + image.Height;

        // Get the marker size from the average dimension value and the marker ratio
        int markerSize = (int)(averageOfWidthAndHeight * markerRatio / 2);

        // Create a Red brush for the keypoints and a Blue pen for the connections
        Brush brush = Brushes.Red;
        using Pen linePen = new(Color.Blue, markerSize / 2);

Next, a list of (int, int) tuples is instantiated to represent each connection. Each tuple has a StartIdx (where the connection starts, like left shoulder) and an EndIdx (where the connection ends, like left elbow). These indexes are always the same based on the output of the pose model and move from top to bottom on the human figure. As a result, you'll notice that indexes 0-4 are skipped, as those indexes represent the face key points, which don't have any connections:

        // Create a list of index tuples that represents each pose connection, face key points are excluded.
        List<(int StartIdx, int EndIdx)> connections =
        [
            (5, 6),   // Left shoulder to right shoulder
            (5, 7),   // Left shoulder to left elbow
            (7, 9),   // Left elbow to left wrist
            (6, 8),   // Right shoulder to right elbow
            (8, 10),  // Right elbow to right wrist
            (11, 12), // Left hip to right hip
            (5, 11),  // Left shoulder to left hip
            (6, 12),  // Right shoulder to right hip
            (11, 13), // Left hip to left knee
            (13, 15), // Left knee to left ankle
            (12, 14), // Right hip to right knee
            (14, 16)  // Right knee to right ankle
        ];

Next, for each tuple in that list, a blue line representing a connection is drawn on the image with DrawLine. It takes in the Pen that was created, along with start and end coordinates from the keypoints list that was passed into the function:

        // Iterate over connections with a foreach loop
        foreach (var (startIdx, endIdx) in connections)
        {
            // Store keypoint start and end values in tuples
            var (startPointX, startPointY) = keypoints[startIdx];
            var (endPointX, endPointY) = keypoints[endIdx];

            // Pass those start and end coordinates, along with the Pen, to DrawLine
            g.DrawLine(linePen, startPointX, startPointY, endPointX, endPointY);
        }

Next, the exact same thing is done for the red ellipses representing the keypoints. The entire keypoints list is iterated over because every key point gets an indicator, regardless of whether or not it was included in a connection.
The red ellipses are drawn second so that they render on top of the blue lines representing connections:

        // Iterate over keypoints with a foreach loop
        foreach (var (x, y) in keypoints)
        {
            // Draw an ellipse using the red brush, the x and y coordinates, and the marker size
            g.FillEllipse(brush, x - markerSize / 2, y - markerSize / 2, markerSize, markerSize);
        }

Now just return the image:

        return image;

Jumping back over to DetectPose, the last thing left to do is update the UI with the rendered predictions on the image:

    // Convert the output to a BitmapImage
    BitmapImage outputImage = BitmapFunctions.ConvertBitmapToBitmapImage(output);

    // Enqueue all our UI updates to ensure they don't happen off the UI thread.
    DispatcherQueue.TryEnqueue(() =>
    {
        DefaultImage.Source = outputImage;
        Loader.IsActive = false;
        Loader.Visibility = Visibility.Collapsed;
        UploadButton.Visibility = Visibility.Visible;
    });

That's it! The final output looks like this:

Switching to NPU Execution

This sample also supports running on the NPU, in addition to the CPU, if you have met the correct device requirements. You will need a Windows device with a Qualcomm NPU to run NPU samples in the Gallery. The easiest way to check if your device is NPU capable is within the Gallery itself. Using the Select Model dropdown, you can see which execution providers are supported on your device:

I'm on a device with a Qualcomm NPU, so the Gallery gives me the option to run the sample on both CPU and NPU.

How Gallery Samples Handle Switching Between Execution Providers

When the pose sample is selected with a specific hardware accelerator, that information is passed to the InitModel function, which handles how the inference session is instantiated. It will specify the Qualcomm QNN execution provider that enables NPU execution. It looks like this:

private Task InitModel(string modelPath, HardwareAccelerator hardwareAccelerator)
{
    return Task.Run(() =>
    {
        // Check if we already have an inference session
        if (_inferenceSession != null)
        {
            return;
        }

        // Set up ONNX Runtime (ORT) session options object
        SessionOptions sessionOptions = new();
        sessionOptions.RegisterOrtExtensions();

        if (hardwareAccelerator == HardwareAccelerator.QNN) // Check if QNN was passed
        {
            // Add the QNN execution provider if so
            Dictionary<string, string> options = new()
            {
                { "backend_path", "QnnHtp.dll" },
                { "htp_performance_mode", "high_performance" },
                { "htp_graph_finalization_optimization_mode", "3" }
            };
            sessionOptions.AppendExecutionProvider("QNN", options);
        }

        // Create a new inference session with these sessionOptions, if CPU is selected, they will be default
        _inferenceSession = new InferenceSession(modelPath, sessionOptions);
    });
}

With this function, an InferenceSession can be instantiated to fit whatever execution provider is passed in that particular situation, and that InferenceSession can then be used throughout the sample. (A short sketch at the end of this post shows one way to fall back to the CPU when the QNN provider is not available.)

What's Next

More in-depth coverage of the other samples in the gallery will be released periodically, covering a range of what is possible with local AI on Windows. Stay tuned for more sample breakdowns coming soon. In the meantime, go check out the AI Dev Gallery to explore more samples and models on Windows. If you run into any problems, feel free to open an issue on the GitHub repository. This project is open sourced and any feedback to help us improve the Gallery is highly appreciated.
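As a closing aside for anyone adapting this pattern in their own app: the Gallery only offers the QNN option on devices where it is supported, but a standalone app may want to guard against the provider being unavailable. The sketch below is an illustration of one way to do that, reusing only the APIs shown above and falling back to the default CPU provider if QNN session creation fails; the class name SessionFactorySketch, the preferNpu flag, and the fallback behavior are assumptions and not part of the Gallery sample.

using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;

internal static class SessionFactorySketch
{
    // Try to create a session on the QNN execution provider; if that fails
    // (for example, on a device without a Qualcomm NPU), fall back to CPU.
    public static InferenceSession CreateSession(string modelPath, bool preferNpu)
    {
        if (preferNpu)
        {
            try
            {
                SessionOptions qnnOptions = new();
                Dictionary<string, string> providerOptions = new()
                {
                    { "backend_path", "QnnHtp.dll" },
                    { "htp_performance_mode", "high_performance" }
                };
                qnnOptions.AppendExecutionProvider("QNN", providerOptions);
                return new InferenceSession(modelPath, qnnOptions);
            }
            catch (OnnxRuntimeException)
            {
                // QNN was not usable on this machine; fall through to CPU below
            }
        }

        // Default SessionOptions run on the CPU execution provider
        return new InferenceSession(modelPath, new SessionOptions());
    }
}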