ai
99 TopicsUnderstanding Small Language Modes
Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models that require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They’re ideal for edge applications like mobile, robotics, and IoT.Why your LLM-powered app needs concurrency
As part of the Python advocacy team, I help maintain several open-source sample AI applications, like our popular RAG chat demo. Through that work, I’ve learned a lot about what makes LLM-powered apps feel fast, reliable, and responsive. One of the most important lessons: use an asynchronous backend framework. Concurrency is critical for LLM apps, which often juggle multiple API calls, database queries, and user requests at the same time. Without async, your app may spend most of its time waiting — blocking one user’s request while another sits idle. The need for concurrency Why? Let’s imagine we’re using a synchronous framework like Flask. We deploy that to a server with gunicorn and several workers. One worker receives a POST request to the "/chat" endpoint, which in turn calls the Azure OpenAI Chat Completions API. That API call can take several seconds to complete — and during that time, the worker is completely tied up, unable to handle any other requests. We could scale out by adding more CPU cores, workers, or threads, but that’s often wasteful and expensive. Without concurrency, each request must be handled serially: When your app relies on long, blocking I/O operations — like model calls, database queries, or external API lookups — a better approach is to use an asynchronous framework. With async I/O, the Python runtime can pause a coroutine that’s waiting for a slow response and switch to handling another incoming request in the meantime. With concurrency, your workers stay busy and can handle new requests while others are waiting: Asynchronous Python backends In the Python ecosystem, there are several asynchronous backend frameworks to choose from: Quart: the asynchronous version of Flask FastAPI: an API-centric, async-only framework (built on Starlette) Litestar: a batteries-included async framework (also built on Starlette) Django: not async by default, but includes support for asynchronous views All of these can be good options depending on your project’s needs. I’ve written more about the decision-making process in another blog post. As an example, let's see what changes when we port a Flask app to a Quart app. First, our handlers now have async in front, signifying that they return a Python coroutine instead of a normal function: async def chat_handler(): request_message = (await request.get_json())["message"] When deploying these apps, I often still use the Gunicorn production web server—but with the Uvicorn worker, which is designed for Python ASGI applications. Alternatively, you can run Uvicorn or Hypercorn directly as standalone servers. Asynchronous API calls To fully benefit from moving to an asynchronous framework, your app’s API calls also need to be asynchronous. That way, whenever a worker is waiting for an external response, it can pause that coroutine and start handling another incoming request. Let's see what that looks like when using the official OpenAI Python SDK. First, we initialize the async version of the OpenAI client: openai_client = openai.AsyncOpenAI( base_url=os.environ["AZURE_OPENAI_ENDPOINT"] + "/openai/v1", api_key=token_provider ) Then, whenever we make API calls with methods on that client, we await their results: chat_coroutine = await openai_client.chat.completions.create( deployment_id=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"], messages=[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": request_message}], stream=True, ) For the RAG sample, we also have calls to Azure services like Azure AI Search. To make those asynchronous, we first import the async variant of the credential and client classes in the aio module: from azure.identity.aio import DefaultAzureCredential from azure.search.documents.aio import SearchClient Then, like with the OpenAI async clients, we must await results from any methods that make network calls: r = await self.search_client.search(query_text) By ensuring that every outbound network call is asynchronous, your app can make the most of Python’s event loop — handling multiple user sessions and API requests concurrently, without wasting worker time waiting on slow responses. Sample applications We’ve already linked to several of our samples that use async frameworks, but here’s a longer list so you can find the one that best fits your tech stack: Repository App purpose Backend Frontend azure-search-openai-demo RAG with AI Search Python + Quart React rag-postgres-openai-python RAG with PostgreSQL Python + FastAPI React openai-chat-app-quickstart Simple chat with Azure OpenAI models Python + Quart plain JS openai-chat-backend-fastapi Simple chat with Azure OpenAI models Python + FastAPI plain JS deepseek-python Simple chat with Azure AI Foundry models Python + Quart plain JS1.2KViews4likes0CommentsJS AI Build-a-thon Setup in 5 Easy Steps
🔥 TL;DR — You’re 5 Steps Away from an AI Adventure Set up your project repo, follow the quests, build cool stuff, and level up. Everything’s automated, community-backed, and designed to help you actually learn AI — using the skills you already have. Let’s build the future. One quest at a time. 👉 Join the Build-a-thon | Chat on DiscordGetting Started with Azure MCP Server: A Guide for Developers
The world of cloud computing is growing rapidly, and Azure is at the forefront of this innovation. If you're a student developer eager to dive into Azure and learn about Model Context Protocol (MCP), the Azure MCP Server is your perfect starting point. This tool, currently in Public Preview, empowers AI agents to seamlessly interact with Azure services like Azure Storage, Cosmos DB, and more. Let's explore how you can get started! 🎯 Why Use the Azure MCP Server? The Azure MCP Server revolutionizes how AI agents and developers interact with Azure services. Here's a glimpse of what it offers: Exploration Made Easy: List storage accounts, databases, resource groups, tables, and more with natural language commands. Advanced Operations: Manage configurations, query analytics, and execute complex tasks like building Azure applications. Streamlined Integration: From JSON communication to intelligent auto-completion, the Azure MCP Server ensures smooth operations. Whether you're querying log analytics or setting up app configurations, this server simplifies everything. ✨ Installation Guide: One-Click and Manual Methods Prerequisites: Before you begin, ensure the following: Install either the Stable or Insiders release of VS Code. Add the GitHub Copilot and GitHub Copilot Chat extensions. Option 1: One-Click Install You can install the Azure MCP Server in VS Code or VS Code Insiders using NPX. Here's how: Simply run: npx -y /mcp@latest server start Option 2: Manual Install If you'd prefer manual setup, follow these steps: Create a .vscode/mcp.json file in your VS Code project directory. Add the following configuration: { "servers": { "Azure MCP Server": { "command": "npx", "args": ["-y", "@azure/mcp@latest", "server", "start"] } } } Here an example of the settings.json file Now, launch GitHub Copilot in Agent Mode to activate the Azure MCP Server. 🚀 Supercharging Azure Development Once installed, the Azure MCP Server unlocks an array of capabilities: Azure Cosmos DB: List, query, manage databases and containers. Azure Storage: Query blob containers, metadata, and tables. Azure Monitor: Use KQL to analyze logs and monitor your resources. App Configuration: Handle key-value pairs and labeled configurations. Test prompts like: "List my Azure Storage containers" "Query my Log Analytics workspace" "Show my key-value pairs in App Config" These commands let your agents harness the power of Azure services effortlessly. 🛡️ Security & Authentication The Azure MCP Server simplifies authentication using Azure Identity. Your login credentials are handled securely, with support for mechanisms like: Visual Studio credentials Azure CLI login Interactive Browser login For advanced scenarios, enable production credentials with: export AZURE_MCP_INCLUDE_PRODUCTION_CREDENTIALS=true Always perform a security review when integrating MCP servers to ensure compliance with regulations and standards. 🌟 Why Join the Azure MCP Community? As a developer, you're invited to contribute to the Azure MCP Server project. Whether it's fixing bugs, adding features, or enhancing documentation, your contributions are valued. Explore the Contributing Guide for details on getting involved. The Azure MCP Server is your gateway to leveraging Azure services with cutting-edge technology. Dive in, experiment, and bring your projects to life! What Azure project are you excited to build with the MCP Server? Let’s brainstorm ideas together!4.9KViews4likes2CommentsAI Toolkit Extension Pack for Visual Studio Code: Ignite 2025 Update
Unlock the Latest Agentic App Capabilities The Ignite 2025 update delivers a major leap forward for the AI Toolkit extension pack in VS Code, introducing a unified, end-to-end environment for building, visualizing, and deploying agentic applications to Microsoft Foundry, and the addition of Anthropic’s frontier Claude models in the Model Catalog! This release enables developers to build and debug locally in VS Code, then deploy to the cloud with a single click. Seamlessly switch between VS Code and the Foundry portal for visualization, orchestration, and evaluation, creating a smooth roundtrip workflow that accelerates innovation and delivers a truly unified AI development experience. Download the http://aka.ms/aitoolkit today and start building next-generation agentic apps in VS Code! What Can You Do with the AI Toolkit Extension Pack? Access Anthropic models in the Model Catalog Following the Microsoft, NVIDIA and Anthropic strategic partnerships announcement today, we are excited to share that Anthropic’s frontier Claude models including Claude Sonnet 4.5, Claude Opus 4.1, and Claude Haiku 4.5, are now integrated into the AI Toolkit, providing even more choices and flexibility when building intelligent applications and AI agents. Build AI Agents Using GitHub Copilot Scaffold agent applications using best-practice patterns, tool-calling examples, tracing hooks, and test scaffolds, all powered by Copilot and aligned with the Microsoft Agent Framework. Generate agent code in Python or .NET, giving you flexibility to target your preferred runtime. Build and Customize YAML Workflows Design YAML-based workflows in the Foundry portal, then continue editing and testing directly in VS Code. To customize your YAML-based workflows, instantly convert it to Agent Framework code using GitHub Copilot. Upgrade from declarative design to code-first customization without starting from scratch. Visualize Multi-Agent Workflows Envision your code-based agent workflows with an interactive graph visualizer that reveals each component and how they connect Watch in real-time how each node lights up as you run your agent. Use the visualizer to understand and debug complex agent graphs, making iteration fast and intuitive. Experiment, Debug, and Evaluate Locally Use the Hosted Agents Playground to quickly interact with your agents on your development machine. Leverage local tracing support to debug reasoning steps, tool calls, and latency hotspots—so you can quickly diagnose and fix issues. Define metrics, tasks, and datasets for agent evaluation, then implement metrics using the Foundry Evaluation SDK and orchestrate evaluations runs with the help of Copilot. Seamless Integration Across Environments Jump from Foundry Portal to VS Code Web for a development environment in your preferred code editor setting. Open YAML workflows, playgrounds, and agent templates directly in VS Code for editing and deployment. How to Get Started Install the AI Toolkit extension pack from the VS Code marketplace. Check out documentation. Get started with building workflows with Microsoft Foundry in VS Code 1. Work with Hosted (Pro-code) Agent workflows in VS Code 2. Work with Declarative (Low-code) Agent workflows in VS Code Feedback & Support Try out the extensions and let us know what you think! File issues or feedback on our GitHub repo for Foundry extension and AI Toolkit extension. Your input helps us make continuous improvements.1.5KViews3likes0CommentsBuilding an MCP Server for Microsoft Learn
So why Microsoft Learn? Well, it's a treasure trove of knowledge for developers and IT pros. Secondly, because it has search page with many filters, it lends itself well to be wrapped as a MCP server. I'm talking about this page of course Microsoft Learn. > DISCLAIMER, the below article is just to show you how easy you can wrap an API and make that into an MCP Server. There's now an official MCP Server for Docs and Learn which you're encouraged to use over building one for Docs and Learn yourself, see this link to the official server https://github.com/MicrosoftDocs/mcp MCP? Let's back up the tape a little, MCP, what's that? MCP stands for Model Context Protocol and is a new standard for dealing with AI apps. The idea is that you have features like prompts, tools and resources that you can use to build AI applications. Because it's a standard, you can easily share these features with others. In fact, you can use MCP servers running locally as well as ones that runs on an IP Address. Pait that with and Agentic client like Visual Studio Code or Claude Desktop and you have built an Agent. How do we build this? Well, the idea is to "wrap" Microsoft Learn's API for search for training content. This means that we will create an MCP server with tools that we can call. So first off, what parts of the API do we need? Free text search. We definitely want this one as it allows us to search for training content with keywords. Filters. We want to know what filters exist so we can search for content based on them. Topic, there are many different topics like products, roles and more, let's support this too. Building the MCP Server For this, we will use a transport method called SSE, Server-Sent Events. This is a simple way to create a server that people can interact with using HTTP. Here's how the server code will look: from mcp.server.fastmcp import FastMCP from starlette.applications import Starlette from starlette.routing import Mount, Host mcp = FastMCP("Microsoft Learn Search", "0.1.0") @mcp.tool() def learn_filter() -> list[Filter]: pass @mcp.tool() def free_text(query: str) -> list[Result]: pass @mcp.tool() def topic_search(category: str, topic: str) -> list[Result]: pass port = 8000 app = Starlette( routes=[ Mount('/', app=mcp.sse_app()), ] ) This code sets up a basic MCP server using FastMCP and Starlette. The learn_filter, free_text, and topic_search functions are placeholders for the actual implementation of the tools that will interact with the Microsoft Learn API. Note the use of the decorator "@mcp.tool" which registers these functions as tools in the MCP server. Look also at the result types Filter and Result, which are used to define the structure of the data returned by these tools. Implementing the Tools We will implement the tools one by one. Implementing the learn_filter Tool So let's start with the learn_filter function. This function will retrieve the available filters from the Microsoft Learn API. Here's how we can implement it: .tool() def learn_filter() -> list[Filter]: data = search_learn() return data Next, let's implement search_learn like so: # search/search.py import requests from config import BASE_URL from utils.models import Filter def search_learn() -> list[Filter]: """ Top level search function to fetch learn data from the Microsoft Learn API. """ url = to_url() response = requests.get(url) if response.status_code == 200: data = response.json() facets = data["facets"] filters = [] # print(facets.keys()) for key in facets.keys(): # print(f"{key}") for item in facets[key]: filter = Filter(**item) filters.append(filter) # print(f" - {filter.value} ({filter.count})") return filters else: return None def to_url(): return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false" # config.py BASE_URL = "https://learn.microsoft.com" We also need to define the Filter model, we can use Pydantic for this: # utils/models.py class Filter(BaseModel): type: str value: str count: int How do we know what this looks like, well, we've tried to make a request to the API and looked at the response, here's an excerpt of the response: { "facets": { "roles": [ { "type": "role", "value": "Administrator", "count": 10 }, { "type": "role", "value": "Developer", "count": 20 } ], ... } } Technically, each item has more properties, but we only need the type, value, and count for our purposes. Implementing the free_text Tool For this, we will implement the free_text function, like so: # search/free_text.py import requests from utils.models import Result from config import BASE_URL def search_free_text(text: str) -> list[Result]: url = to_url(text) response = requests.get(url) results = [] if response.status_code == 200: data = response.json() if "results" in data: records = len(data["results"]) print(f"Search results: {records} records found") for item in data["results"]: result = Result(**item) result.url = f"{BASE_URL}{result.url}" results.append(result) return results def to_url(text: str): return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&terms={text}&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24top=30&showHidden=false&fuzzySearch=false" Here we use a new type Result, which we also need to define. This will represent the search results returned by the Microsoft Learn API. Let's define it: # utils/models.py from pydantic import BaseModel class Result(BaseModel): title: str url: str popularity: float summary: str | None = None Finally, let's wire this up in our MCP server: .tool() def learn_filter() -> list[Filter]: data = search_learn() return data @mcp.tool() def free_text(query: str) -> list[Result]: print("LOG: free_text called with query:", query) data = search_free_text(query) return data Implementing Search by topic Just one more method to implement, the search_topic method, let's implement it like so: import requests from utils.models import Result from config import BASE_URL def search_topic(topic: str, category: str ="products") -> list[Result]: results = [] url = to_url(category, topic) response = requests.get(url) if response.status_code == 200: data = response.json() for item in data["results"]: result = Result(**item) result.url = f"{BASE_URL}{result.url}" results.append(result) return results else: return [] def to_url(category: str, topic: str): return f"{BASE_URL}/api/contentbrowser/search?environment=prod&locale=en-us&facet=roles&facet=levels&facet=products&facet=subjects&facet=resource_type&%24filter=({category}%2Fany(p%3A%20p%20eq%20%27{topic}%27))&%24top=30&showHidden=false&fuzzySearch=false" Great, this one also uses Result as the return type, so we only have left to wire this up in our MCP server: .tool() def topic_search(category: str, topic: str) -> list[Result]: print("LOG: topic_search called with category:", category, "and topic:", topic) data = search_topic(topic, category) return data That's it, let's move on to testing it. Testing the MCP Server with Visual Studio Code You can test this with Visual Studio Code and its Agent mode. What you need to do is: Create a file mcp.json in the .vscode directory and make sure it looks like this: { "inputs": [], "servers": { "learn": { "type": "sse", "url": "http://localhost:8000/sse" } } } Note how we point to a running server on port 8000, this is where our MCP server will run. Installing Dependencies We will need the following dependencies to run this code: pip install requests "mcp[cli]" You're also recommended to create a virtual environment for this project, you can do this with: python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate` Then install the dependencies as shown above. Running the MCP Server To run the MCP server, you can use the following command: uvicorn server:app How this one works is Uvicorn looks for a file server.py and an app variable in that file, which is the Starlette app we created earlier. Try it out We got the server running, and an entry in Visual Studio Code. Click the play icon just above your entry in Visual Studio Code, this should connect to the MCP server. Try it out by typing in the search box in the bottom right of your GitHub Copilot extension. For example, type "Do a free text search on JavaScript, use a tool". It should look something like this: You should see the response coming back from Microsoft Learn and your MCP Server. Summary Congrats, if you've done all the steps so far, you've managed to build an MCP server and solve a very practical problem: how to wrap an API, for a search AND make it "agentic". Learn more Check out the repo MCP CurriculumHow to Use Postgres MCP Server with GitHub Copilot in VS Code
GitHub Copilot has changed how developers write code, but when combined with an MCP (Model Copilot Protocol) server, it also connects your services. With it, Copilot can understand your database schema and generate relevant code for your API, data models, or business logic. In this guide, you'll learn how to use the Neon Serverless Postgres MCP server with GitHub Copilot in VS Code to build a sample REST API quickly. We'll walk through how to create an Azure Function that fetches data from a Neon database, all with minimal setup and no manual query writing. From Code Generation to Database Management with GitHub Copilot AI agents are no longer just helping write code—they’re creating and managing databases. When a chatbot logs a customer conversation, or a new region spins up in the Azure cloud, or a new user signs up, an AI agent can automatically create a database in seconds. No need to open a dashboard or call an API. This is the next evolution of software development: infrastructure as intent. With tools like database MCP servers, agents can now generate real backend services as easily as they generate code. GitHub Copilot becomes your full-stack teammate. It can answer database-related questions, fetch your database connection string, update environment variables in your Azure Function, generate SQL queries to populate tables with mock data, and even help you create new databases or tables on the fly. All directly from your editor, with natural language prompts. Neon has a dedicated MCP server that makes it possible for Copilot to directly understand the structure of your Postgres database. Let's get started with using the Neon MCP server and GitHub Copilot. What You’ll Need Node.js (>= v18.0.0) and npm: Download from nodejs.org. An Azure subscription (create one for free) Install either the stable or Insiders release of VS Code: Stable release Insiders release Install the GitHub Copilot, GitHub Copilot for Azure, and GitHub Copilot Chat extensions for VS Code Azure Functions Core Tools (for local testing) Connect GitHub Copilot to Neon MCP Server Create Neon Database Visit the Neon on Azure Marketplace page and follow the Create a Neon resource guide to deploy Neon on Azure for your subscription. Neon has free plan is more than enough to build proof of concept or kick off a new startup project. Install the Neon MCP Server for VS Code Neon MCP Server offers two options for connecting your VS Code MCP client to Neon. We will use the Remote Hosted MCP Server option. This is the simplest setup—no need to install anything locally or configure a Neon API key in your client. Add the following Neon MCP server configuration to your user settings in VS Code: { "mcp":{ "servers":{ "Neon":{ "command":"npx", "args":[ "-y", "mcp-remote", "https://mcp.neon.tech/sse" ] } } } } Click on Start on the MCP server. A browser window will open with an OAuth prompt. Just follow the steps to give your VS Code MCP client access to your Neon account. Generate an Azure Function REST API using GitHub Copilot Step 1: Create an empty folder (ex: mcp-server-vs-code) and open it in VS Code. Step 2: Open GitHub Copilot Chat in VS Code and switch to Agent mode. You should see the available tools. Step 3: Ask Copilot something like "Create an Azure function with an HTTP trigger”: Copilot will generate a REST API using Azure Functions in JavaScript with a basic setup to run the functions locally. Step 4: Next, you can ask to list existing Neon projects: Step 5: Try out different prompts to fetch the connection string for the chosen database and set it to the Azure Functions settings. Then ask to create a sample Customer table and so on. Or you can even prompt to create a new database branch on Neon. Step 6: Finally, you can prompt Copilot to update the Azure functions code to fetch data from the table: Combine the Azure MCP Server Neon MCP gives GitHub Copilot full access to your database schema, so it can help you write SQL queries, connect to the database, and build API endpoints. But when you add Azure MCP Server into the mix, Copilot can also understand your Azure services—like Blob Storage, Queues, and Azure AI. You can run both Neon MCP and Azure MCP at the same time to create a full cloud-native developer experience. For example: Use Neon MCP for serverless Postgres with branching and instant scale. Use Azure MCP to connect to other Azure services from the same Copilot chat. Even better: Azure MCP is evolving. In newer versions, you can spin up Azure Functions and other services directly from Copilot chat, without ever leaving your editor. Copilot pulls context from both MCP servers, which means smarter code suggestions and faster development. You can mix and match based on your stack and let Copilot help you build real, working apps in minutes. Final Thoughts With GitHub Copilot, Neon MCP server, and Azure Functions, you're no longer writing backend code line by line. It is so fast to build APIs. You're orchestrating services with intent. This is not the future—it’s something you can use today.Enhancing AI Integrations with MCP and Azure API Management
As AI Agents and assistants become increasingly central to modern applications and experiences, the need for seamless, secure integration with external tools and data sources is more critical than ever. The Model Context Protocol (MCP) is emerging as a key open standard enabling these integrations - allowing AI models to interact with APIs, Databases, and other services in a consistent, scalable way. Understanding MCP MCP utilizes a client-host-server architecture built upon JSON-RPC 2.0 for messaging.Communication between clients and servers occurs over defined transport layers, primarily: stdio: Standard input/output, suitable for efficient communication when the client and server run on the same machine. HTTP with Server-Sent Events (SSE): Uses HTTP POST for client-to-server messages and SSE for server-to-client messages, enabling communication over networks, including remote servers. Why MCP Matters While Large Language Models (LLMs) are powerful, their utility is often limited by their inability to access real-time or proprietary data. Traditionally, integrating new data sources or tools required custom connectors/ implementations and significant engineering efforts. MCP addresses this by providing a unified protocol for connecting agents to both local and remote data sources - unifying and streamlining integrations. Leveraging Azure API Management for remote MCP servers Azure API Management is a fully managed platform for publishing, securing, and monitoring APIs. By treating MCP server endpoints as other backend APIs, organizations can apply familiar governance, security, and operational controls. With MCP adoption, the need for robust management of these backend services will intensify. API Management retains a vital role in governing these underlying assets by: Applying security controls to protect the backend resources. Ensuring reliability. Effective monitoring and troubleshooting with tracing requests and context flow. In this blog post, I will walk you through a practical example: hosting an MCP server behind Azure API Management, configuring credential management, and connecting with GitHub Copilot. A Practical Example: Automating Issue Triage To follow along with this scenario, please check out our Model Context Protocol (MCP) lab available at AI-Gateway/labs/model-context-protocol Let's move from theory to practice by exploring how MCP, Azure API Management (APIM) and GitHub Copilot can transform a common engineering workflow. Imagine you're an engineering manager aiming to streamline your team's issue triage process - reducing manual steps and improving efficiency. Example workflow: Engineers log bugs/ feature requests as GitHub issues Following a manual review, a corresponding incident ticket is generated in ServiceNow. This manual handoff is inefficient and error prone. Let's see how we can automate this process - securely connecting GitHub and ServiceNow, enabling an AI Agent (GitHub Copilot in VS Code) to handle triage tasks on your behalf. A significant challenge in this integration involves securely managing delegated access to backend APIs, like GitHub and ServiceNow, from your MCP Server. Azure API Management's credential manager solves this by centralizing secure credential storage and facilitating the secure creation of connections to your third-party backend APIs. Build and deploy your MCP server(s) We'll start by building two MCP servers: GitHub Issues MCP Server Provides tools to authenticate on GitHub (authorize_github), retrieve user infromation (get_user ) and list issues for a specified repository (list_issues). ServiceNow Incidents MCP Server Provides tools to authenticate with ServiceNow (authorize_servicenow), list existing incidents (list_incidents) and create new incidents (create_incident). & ServiceNow Incidents MCP Servers We are using Azure API Management to secure and protect both MCP servers, which are built using Azure Container Apps. Azure API Management's credential manager centralizes secure credential storage and facilitates the secure creation of connections to your backend third-party APIs. Client Auth: You can leverage API Management subscriptions to generate subscription keys, enabling client access to these APIs. Optionally, to further secure /sse and /messages endpoints, we apply the validate-jwt policy to ensure that only clients presenting a valid JWT can access these endpoints, preventing unauthorized access. (see: AI-Gateway/labs/model-context-protocol/src/github/apim-api/auth-client-policy.xml) After registering OAuth applications in GitHub and ServiceNow, we update APIM's credential manager with the respective Client IDs and Client Secrets. This enables APIM to perform OAuth flows on behalf of users, securely storing and managing tokens for backend calls to GitHub and ServiceNow. Connecting your MCP Server in VS Code With your MCP servers deployed and secured behind Azure API Management, the next step is to connect them to your development workflow. Visual Studio Code now supports MCP, enabling GitHub Copilot's agent mode to connect to any MCP-compatible server and extend its capabilities. Open Command Pallette and type in MCP: Add Server ... Select server type as HTTP (HTTP or Server-Sent Events) Paste in the Server URL Provide a Server ID This process automatically updates your settings.json with the MCP server configuration. Once added, GitHub Copilot can connect to your MCP servers and access the defined tools, enabling agentic workflows such as issue triage and automation. You can repeat these steps to add the ServiceNow MCP Server. Understanding Authentication and Authorization with Credential Manager When a user initiates an authentication workflow (e.g, via the authorize_github tool), GitHub Copilot triggers the MCP server to generate an authorization request and a unique login URL. The user is redirected to a consent page, where their registered OAuth application requests permissions to access their GitHub account. Azure API Management acts as a secure intermediary, managing the OAuth flow and token storage. Flow of authorize_github: Step 1 - Connection initiation: GitHub Copilot Agent invokes a sse connection to API Management via the MCP Client (VS Code) Step 2 - Tool Discovery: APIM forwards the request to the GitHub MCP Server, which responds with available tools Step 3 - Authorization Request: GitHub Copilot selects and executes authorize_github tool. The MCP server generates an authorization_id for the chat session. Step 4 - User Consent: If it's the first login, APIM requests a login redirect URL from the MCP Server The MCP Server sends the Login URL to the client, prompting the user to authenticate with GitHub Upon successful login, GitHub redirects the client with an authorization code Step 5 - Token Exchange and Storage: The MCP Client sends the authorization code to API Management APIM exchanges the code for access and refresh tokens from GitHub APIM securely stores the token and creates an Access Control List (ACL) for the service principal. Step 6 - Confirmation: APIM confirms successful authentication to the MCP Client, and the user can now perform authenticated actions, such as accessing private repositories. Check out the python logic for how to implement it: AI-Gateway/labs/model-context-protocol/src/github/mcp-server/mcp-server.py Understanding Tool Calling with underlaying APIs in API Management Using the list_issues tool, Connection confirmed APIM confirms the connection to the MCP Client Issue retrieval: The MCP Client requests issues from the MCP server The MCP Server attaches the authorization_id as a header and forwards the request to APIM The list of issues is returned to the agent You can use the same process to add the ServiceNow MCP Server. With both servers connected, GitHub Copilot Agent can extract issues from a private repo in GitHub and create new incidences in ServiceNow, automating your triage workflow. You can define additional tools such as suggest_assignee tool, assign_engineer tool, update_incident_status tool, notify_engineer tool, request_feedback tool and other to demonstrate a truly closed-loop, automated engineering workflow - from issue creation to resolution and feedback. Take a look at this brief demo showcasing the entire end-to-end process: Summary Azure API Management (APIM) is an essential tool for enterprise customers looking to integrate AI models with external tools using the Model Context Protocol (MCP). In this blog, we demonstrated how Azure API Management's credential manager solves the secure creation of connections to your backend APIs. By integrating MCP servers with VS Code and leveraging APIM for OAuth flows and token management, you can enable secure, agentic automation across your engineering tools. This approach not only streamlines workflows like issues triage and incident creation but also ensures enterprise-grade security and governance for all APIs. Additional Resources Using Credential Manager will help with managing OAuth 2.0 tokens to backend services. Client Auth for remote MCP servers: AZD up: https://aka.ms/mcp-remote-apim-auth AI lab Client Auth: AI-Gateway/labs/mcp-client-authorization/mcp-client-authorization.ipynb Blog Post: https://aka.ms/remote-mcp-apim-auth-blog If you have any questions or would like to learn more about how MCP and Azure API Management can benefit your organization, feel free to reach out to us. We are always here to help and provide further insights. Connect with us on LinkedIn (Julia Kasper & Julia Muiruri) and follow for more updates, insights, and discussions on AI integrations and API management.Real‑Time AI Streaming with Azure OpenAI and SignalR
TL;DR We’ll build a real-time AI app where Azure OpenAI streams responses and SignalR broadcasts them live to an Angular client. Users see answers appear incrementally just like ChatGPT while Azure SignalR Service handles scale. You’ll learn the architecture, streaming code, Angular integration, and optional enhancements like typing indicators and multi-agent scenarios. Why This Matters Modern users expect instant feedback. Waiting for a full AI response feels slow and breaks engagement. Streaming responses: Reduces perceived latency: Users see content as it’s generated. Improves UX: Mimics ChatGPT’s typing effect. Keeps users engaged: Especially for long-form answers. Scales for enterprise: Azure SignalR Service handles thousands of concurrent connections. What you’ll build A SignalR Hub that calls Azure OpenAI with streaming enabled and forwards partial output to clients as it arrives. An Angular client that connects over WebSockets/SSE to the hub and renders partial content with a typing indicator. An optional Azure SignalR Service layer for scalable connection management (thousands to millions of long‑lived connections). References: SignalR hosting & scale; Azure SignalR Service concepts. Architecture The hub calls Azure OpenAI with streaming enabled (await foreach over updates) and broadcasts partials to clients. Azure SignalR Service (optional) offloads connection scale and removes sticky‑session complexity in multi‑node deployments. References: Streaming code pattern; scale/ARR affinity; Azure SignalR integration. Prerequisites Azure OpenAI resource with a deployed model (e.g., gpt-4o or gpt-4o-mini) .NET 8 API + ASP.NET Core SignalR backend Angular 16+ frontend (using microsoft/signalr) Step‑by‑Step Implementation 1) Backend: ASP.NET Core + SignalR Install packages dotnet add package Microsoft.AspNetCore.SignalR dotnet add package Azure.AI.OpenAI --prerelease dotnet add package Azure.Identity dotnet add package Microsoft.Extensions.AI dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease # Optional (managed scale): Azure SignalR Service dotnet add package Microsoft.Azure.SignalR Using DefaultAzureCredential (Entra ID) avoids storing raw keys in code and is the recommended auth model for Azure services. Program.cs var builder = WebApplication.CreateBuilder(args); builder.Services.AddSignalR(); // To offload connection management to Azure SignalR Service, uncomment: // builder.Services.AddSignalR().AddAzureSignalR(); builder.Services.AddSingleton<AiStreamingService>(); var app = builder.Build(); app.MapHub<ChatHub>("/chat"); app.Run(); AiStreamingService.cs - streams content from Azure OpenAI using Microsoft.Extensions.AI; using Azure.AI.OpenAI; using Azure.Identity; public class AiStreamingService { private readonly IChatClient _chatClient; public AiStreamingService(IConfiguration config) { var endpoint = new Uri(config["AZURE_OPENAI_ENDPOINT"]!); var deployment = config["AZURE_OPENAI_DEPLOYMENT"]!; // e.g., "gpt-4o-mini" var azureClient = new AzureOpenAIClient(endpoint, new DefaultAzureCredential()); _chatClient = azureClient.GetChatClient(deployment).AsIChatClient(); } public async IAsyncEnumerable<string> StreamReplyAsync(string userMessage) { var messages = new List<ChatMessage> { ChatMessage.CreateSystemMessage("You are a helpful assistant."), ChatMessage.CreateUserMessage(userMessage) }; await foreach (var update in _chatClient.CompleteChatStreamingAsync(messages)) { // Only text parts; ignore tool calls/annotations var chunk = string.Join("", update.Content .Where(p => p.Kind == ChatMessageContentPartKind.Text) .Select(p => ((TextContent)p).Text)); if (!string.IsNullOrEmpty(chunk)) yield return chunk; } } } Modern .NET AI extensions (Microsoft.Extensions.AI) expose a unified streaming pattern via CompleteChatStreamingAsync. ChatHub.cs - pushes partials to the caller using Microsoft.AspNetCore.SignalR; public class ChatHub : Hub { private readonly AiStreamingService _ai; public ChatHub(AiStreamingService ai) => _ai = ai; // Client calls: connection.invoke("AskAi", prompt) public async Task AskAi(string prompt) { var messageId = Guid.NewGuid().ToString("N"); await Clients.Caller.SendAsync("typing", messageId, true); await foreach (var partial in _ai.StreamReplyAsync(prompt)) { await Clients.Caller.SendAsync("partial", messageId, partial); } await Clients.Caller.SendAsync("typing", messageId, false); await Clients.Caller.SendAsync("completed", messageId); } } 2) Frontend: Angular client with microsoft/signalr Install the SignalR client npm i microsoft/signalr Create a SignalR service (Angular) // src/app/services/ai-stream.service.ts import { Injectable } from '@angular/core'; import * as signalR from '@microsoft/signalr'; import { BehaviorSubject, Observable } from 'rxjs'; @Injectable({ providedIn: 'root' }) export class AiStreamService { private connection?: signalR.HubConnection; private typing$ = new BehaviorSubject<boolean>(false); private partial$ = new BehaviorSubject<string>(''); private completed$ = new BehaviorSubject<boolean>(false); get typing(): Observable<boolean> { return this.typing$.asObservable(); } get partial(): Observable<string> { return this.partial$.asObservable(); } get completed(): Observable<boolean> { return this.completed$.asObservable(); } async start(): Promise<void> { this.connection = new signalR.HubConnectionBuilder() .withUrl('/chat') // same origin; use absolute URL if CORS .withAutomaticReconnect() .configureLogging(signalR.LogLevel.Information) .build(); this.connection.on('typing', (_id: string, on: boolean) => this.typing$.next(on)); this.connection.on('partial', (_id: string, text: string) => { // Append incremental content this.partial$.next((this.partial$.value || '') + text); }); this.connection.on('completed', (_id: string) => this.completed$.next(true)); await this.connection.start(); } async ask(prompt: string): Promise<void> { // Reset state per request this.partial$.next(''); this.completed$.next(false); await this.connection?.invoke('AskAi', prompt); } } Angular component // src/app/components/ai-chat/ai-chat.component.ts import { Component, OnInit } from '@angular/core'; import { AiStreamService } from '../../services/ai-stream.service'; @Component({ selector: 'app-ai-chat', templateUrl: './ai-chat.component.html', styleUrls: ['./ai-chat.component.css'] }) export class AiChatComponent implements OnInit { prompt = ''; output = ''; typing = false; done = false; constructor(private ai: AiStreamService) {} async ngOnInit() { await this.ai.start(); this.ai.typing.subscribe(on => this.typing = on); this.ai.partial.subscribe(text => this.output = text); this.ai.completed.subscribe(done => this.done = done); } async send() { this.output = ''; this.done = false; await this.ai.ask(this.prompt); } } HTML Template <!-- src/app/components/ai-chat/ai-chat.component.html --> <div class="chat"> <div class="prompt"> <input [(ngModel)]="prompt" placeholder="Ask me anything…" /> <button (click)="send()">Send</button> </div> <div class="response"> <pre>{{ output }}</pre> <div class="typing" *ngIf="typing">Assistant is typing…</div> <div class="done" *ngIf="done">✓ Completed</div> </div> </div> Streaming modes, content filters, and UX Azure OpenAI streaming interacts with content filtering in two ways: Default streaming: The service buffers output into content chunks and runs content filters before each chunk is emitted; you still stream, but not necessarily token‑by‑token. Asynchronous Filter (optional): The service returns token‑level updates immediately and runs filters asynchronously. You get ultra‑smooth streaming but must handle delayed moderation signals (e.g., redaction or halting the stream). Best practices Append partials in small batches client‑side to avoid DOM thrash; finalize formatting on "completed". Log full messages server‑side only after completion to keep histories consistent (mirrors agent frameworks). Security & compliance Auth: Prefer Microsoft Entra ID (DefaultAzureCredential) to avoid key sprawl; use RBAC and Managed Identities where possible. Secrets: Store Azure SignalR connection strings in Key Vault and rotate periodically; never hardcode. CORS & cross‑domain: When hosting frontend and hub on different origins, configure CORS and use absolute URLs in withUrl(...). Connection management & scaling tips Persistent connection load: SignalR consumes TCP resources; separate heavy real‑time workloads or use Azure SignalR to protect other apps. Sticky sessions (self‑hosted): Required in most multi‑server scenarios unless WebSockets‑only + SkipNegotiation applies; Azure SignalR removes this requirement. Learn more AI‑Powered Group Chat sample (ASP.NET Core): Azure OpenAI .NET client (auth & streaming): SignalR JavaScript ClientDemystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Notice this restriction also applies to third-party models as well (e.g. Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works: Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.” To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions). The life cycle of a GitHub Copilot code suggestion Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion: In the IDE: Content exclusions prevent files, folders, or patterns from being included. GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model. Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time. Disable access to GitHub Copilot Free Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation. Agent Mode Allow List Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy. MCP registry Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed. Compliance Certifications The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage. SOC 1 Type 2: Assurance over internal controls for financial reporting. SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time. SOC 3: General-use version of SOC 2 with broad executive-level assurance. ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls. CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements. TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards. In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, it’s clear that existing safeguards in place help address these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.