Using on-behalf-of flow for Entra-based MCP servers
In December, we presented a series about MCP, culminating in a session about adding authentication to MCP servers. I demoed a Python MCP server that uses Microsoft Entra for authentication, requiring users to first log in to the Microsoft tenant before they could use a tool. Many developers asked how they could take the Entra integration further, such as checking the user's group membership or querying their OneDrive. That requires an "on-behalf-of" (OBO) flow, where the MCP server uses the user's Entra identity to call another API, like the Microsoft Graph API. In this blog post, I will explain how to use Entra with an OBO flow in a Python FastMCP server.

How MCP servers can use Entra authentication

The MCP authorization specification is based on OAuth2, with some additional features added on top. Every MCP client is actually an OAuth2 client, and each MCP server is an OAuth2 resource server. MCP auth adds these features to help clients determine how to authorize with a server:

Protected resource metadata (PRM): Implemented on the MCP server, provides details about the authorization server and method
Authorization server metadata: Implemented on the authorization server, gives URLs for OAuth2 endpoints

Additionally, to allow MCP servers to work with arbitrary MCP clients, MCP auth supports either of these client registration methods:

Dynamic Client Registration (DCR): Implemented on the authorization server, it can register new MCP clients as OAuth2 clients, even if it hasn't seen them before.
Client ID Metadata Documents (CIMD): An alternative to DCR, this requires the MCP client to make a CIMD document available on a server, and requires the authorization server to fetch the CIMD document for details about the client.

Microsoft Entra does support authorization server metadata, but it does not support either DCR or CIMD. That's fine if you are building an MCP server that will only be used with pre-authorized clients, such as VS Code or a specific internal MCP client. But if you are building an MCP server that can be used with arbitrary MCP clients, then either DCR or CIMD is required. So what do we do? Fortunately, the FastMCP SDK implements DCR on top of Entra using an OAuth proxy pattern. FastMCP acts as the authorization server, intercepting requests and forwarding them to Entra when needed, and storing OAuth client information in a designated database (like in-memory or Cosmos DB). Let's walk through the steps to set that up.

Registering the server with Entra

Before the server can use Entra to authorize users, we need to register the server with Entra via an app registration. We can do registration using the Azure Portal, Azure CLI, Microsoft Graph SDK, or even Bicep. In this demo, I use the Python MS Graph SDK, as it allows me to specify everything programmatically. First, I create the Entra app registration, specifying the sign-in audience (single-tenant), redirect URIs (including local MCP server, deployed MCP server, and VS Code redirect URIs), and the scopes for the exposed API.
request_app = Application(
    display_name="FastMCP Server App",
    sign_in_audience="AzureADMyOrg",  # Single tenant
    web=WebApplication(
        redirect_uris=[
            "http://localhost:8000/auth/callback",
            "https://vscode.dev/redirect",
            "http://127.0.0.1:33418",
            "https://deployedurl.com/auth/callback"
        ],
    ),
    api=ApiApplication(
        oauth2_permission_scopes=[
            PermissionScope(
                id=uuid.UUID("{" + str(uuid.uuid4()) + "}"),
                admin_consent_display_name="Access FastMCP Server",
                admin_consent_description="Allows access to the FastMCP server as the signed-in user.",
                user_consent_display_name="Access FastMCP Server",
                user_consent_description="Allow access to the FastMCP server on your behalf",
                is_enabled=True,
                value="mcp-access",
                type="User",
            )],
        requested_access_token_version=2,  # Required by FastMCP
    )
)
app = await graph_client.applications.post(request_app)
await graph_client.applications.by_application_id(app.id).patch(
    Application(identifier_uris=[f"api://{app.app_id}"]))

Thanks to that configuration, when an MCP client like VS Code requests an OAuth2 token, it will request a token with the scope "api://{app.app_id}/mcp-access", and the FastMCP server will validate that incoming tokens contain that scope.

Next, I create a service principal for that Entra app registration, which represents the Entra app in my tenant:

request_principal = ServicePrincipal(app_id=app.app_id, display_name=app.display_name)
await graph_client.service_principals.post(request_principal)

I need a way for the server to prove that it can use that Entra app registration, so I register a secret:

password_credential = await graph_client.applications.by_application_id(app.id).add_password.post(
    AddPasswordPostRequestBody(
        password_credential=PasswordCredential(display_name="FastMCPSecret")))

Ideally, I would like to move away from secrets, as Entra now supports federated identity credentials for Entra app registrations, but that form of credential isn't supported yet in the FastMCP SDK. If you choose to use a secret, make sure that you store it securely.

Granting admin consent

This next step is only necessary when our MCP server wants to use an OBO flow to exchange access tokens for other resource server tokens (Graph API tokens, in this case). For the OBO flow to work, the Entra app registration needs permission to call the Graph API on behalf of users. If we controlled the client, we could force it to request the required scopes as part of the initial login dialog. However, since we are configuring this server to work with arbitrary MCP clients, we don't have that option. Instead, we grant admin consent to the Entra app for the necessary scopes, so that no Graph API consent dialog is needed.
This code grants admin consent to the associated service principal for the Graph API resource and scopes:

server_principal = await graph_client.service_principals_with_app_id(app.app_id).get()
grant = GrantDefinition(
    principal_id=server_principal.id,
    resource_app_id="00000003-0000-0000-c000-000000000000",  # Graph API
    scopes=["User.Read", "email", "offline_access", "openid", "profile"],
    target_label="server application")
resource_principal = await graph_client.service_principals_with_app_id(grant.resource_app_id).get()
desired_scope = grant.scope_string()
await graph_client.oauth2_permission_grants.post(
    OAuth2PermissionGrant(
        client_id=grant.principal_id,
        consent_type="AllPrincipals",
        resource_id=resource_principal.id,
        scope=desired_scope))

If our MCP server needed to use an OBO flow with another resource server, we could request additional grants for those resources and scopes. Our Entra app registration is now ready for the MCP server, so let's move on to the server code.

Using FastMCP servers with Entra

In our MCP server code, we configure FastMCP's built-in AzureProvider based on the details from the Entra app registration process:

auth = AzureProvider(
    client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"],
    client_secret=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"],
    tenant_id=os.environ["AZURE_TENANT_ID"],
    base_url=entra_base_url,  # MCP server URL
    required_scopes=["mcp-access"],
    client_storage=oauth_client_store,  # in-memory or Cosmos DB
)

To make it easy for our MCP tools to access an identifier for the currently logged-in user, we define a middleware that inspects the claims of the current token using FastMCP's get_access_token() and sets the "oid" (Entra object identifier) in the state:

class UserAuthMiddleware(Middleware):
    def _get_user_id(self):
        token = get_access_token()
        if not (token and hasattr(token, "claims")):
            return None
        return token.claims.get("oid")

    async def on_call_tool(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

    async def on_read_resource(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

When we initialize the FastMCP server, we set the auth provider and include that middleware:

mcp = FastMCP("Expenses Tracker", auth=auth, middleware=[UserAuthMiddleware()])

Now, every request made to the MCP server will require authentication. The server will return a 401 if a valid token isn't provided, and that 401 will prompt the MCP client to kick off the MCP authorization flow. Inside each tool, we can grab the user ID from the state and use it to customize the response for the user, for example to store or query items in a database.
@mcp.tool
async def add_user_expense(
    date: Annotated[date, "Date of the expense in YYYY-MM-DD format"],
    amount: Annotated[float, "Positive numeric amount of the expense"],
    description: Annotated[str, "Human-readable description of the expense"],
    ctx: Context,
):
    """Add a new expense to Cosmos DB."""
    user_id = ctx.get_state("user_id")
    if not user_id:
        return "Error: Authentication required (no user_id present)"
    expense_item = {
        "id": str(uuid.uuid4()),
        "user_id": user_id,
        "date": date.isoformat(),
        "amount": amount,
        "description": description
    }
    await cosmos_container.create_item(body=expense_item)

Using OBO flow in a FastMCP server

Now we can move on to using an OBO flow inside an MCP tool to access the Graph API on behalf of the user. To make it easy to exchange Entra tokens for Graph tokens, we use the Python MSAL SDK, configuring a ConfidentialClientApplication based on our Entra app registration details:

confidential_client = ConfidentialClientApplication(
    client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"],
    client_credential=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"],
    authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}",
    token_cache=TokenCache(),
)

Inside the tool that requires OBO, we ask MSAL to exchange the MCP access token for a Graph API access token:

access_token = get_access_token()
graph_resource_access_token = confidential_client.acquire_token_on_behalf_of(
    user_assertion=access_token.token,
    scopes=["https://graph.microsoft.com/.default"]
)
graph_token = graph_resource_access_token["access_token"]

Once we successfully acquire the token, we can use it with the Graph API for any operations permitted by the scopes in the admin consent granted earlier. For this example, we call the Graph API to check whether the logged-in user is a member of a particular Entra group, and restrict tool usage if not:

async with httpx.AsyncClient() as client:
    url = ("https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group"
           f"?$filter=id eq '{group_id}'&$count=true")
    response = await client.get(
        url,
        headers={
            "Authorization": f"Bearer {graph_token}",
            "ConsistencyLevel": "eventual",
        })
    data = response.json()
    membership_count = data.get("@odata.count", 0)

You could imagine many other ways to use an OBO flow, however, like querying for more details from the Graph API, uploading documents to OneDrive/SharePoint/Notes, sending emails, and more!
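For instance, here is a minimal sketch (not part of the original demo) that reuses the graph_token acquired above to fetch the signed-in user's profile, a call that is covered by the User.Read scope granted earlier; the specific fields selected are just illustrative:

import httpx

async def get_user_profile(graph_token: str) -> dict:
    """Fetch basic profile details for the signed-in user via Microsoft Graph."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://graph.microsoft.com/v1.0/me?$select=displayName,mail,jobTitle",
            headers={"Authorization": f"Bearer {graph_token}"},
        )
        response.raise_for_status()
        # Returns a dict like {"displayName": "...", "mail": "...", "jobTitle": "..."}
        return response.json()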
All together now

For the full code, check out the open source python-mcp-demos repository, and follow the deployment steps for Entra. The most relevant code files are:

auth_init.py: Creates the Entra app registration, service principal, client secret, and grants admin consent for the OBO flow.
auth_update.py: Updates the app registration's redirect URIs after deployment, adding the deployed server URL.
auth_entra_mcp.py: The MCP server itself, configured with FastMCP's AzureProvider and tools that use OBO for group membership checks.

As always, please let us know if you have further questions or ideas for other Entra integrations. Acknowledgements: Thank you to Matt Gotteiner for his guidance in implementing the OBO flow and review of the blog post.

Join our free livestream series on building agents in Python

Join us for a new 6-part livestream series where we explore the foundational concepts behind building AI agents in Python using the Microsoft Agent Framework. This series is for anyone who wants to understand how agents work: how they call tools, use memory and context, and construct workflows on top of them. Over two weeks, we'll dive into the practical building blocks that shape real agent behavior. You'll learn how to:

Register and structure tools
Connect local MCP servers
Add context with database calls
Add memory for personalization
Monitor agent behavior with OpenTelemetry
Evaluate the quality of agent output

Throughout the series, we'll use Python for all live examples and share full code so you can run everything yourself. You can also follow along live using GitHub Models and GitHub Codespaces. Register for the full series. Spanish speaker? ¡Tendremos una serie para hispanohablantes! Regístrese aquí. In addition to the live streams, you can also join the Microsoft Foundry Discord to ask follow-up questions after each stream. If you are brand new to generative AI with Python, start with our 9-part Python + AI series, which covers topics such as LLMs, embedding models, RAG, tool calling, and MCP, and will prepare you perfectly for the agents series. To learn more about each live stream or register for individual sessions, scroll down.

Python + Agents: Building your first agent in Python
24 February, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In the first session of our Python + Agents series, we'll kick things off with the fundamentals: what AI agents are, how they work, and how to build your first one using the Microsoft Agent Framework. We'll start with the core anatomy of an agent, then walk through how tool calling works in practice, beginning with a single tool, expanding to multiple tools, and finally connecting to tools exposed through local MCP servers. We'll conclude with the supervisor agent pattern, where a single supervisor agent coordinates subtasks across multiple subagents by treating each agent as a tool. Along the way, we'll share tips for debugging and inspecting agents, like using the DevUI interface from Microsoft Agent Framework for interacting with agent prototypes.

Python + Agents: Adding context and memory to agents
25 February, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In the second session of our Python + Agents series, we'll extend agents built with the Microsoft Agent Framework by adding two essential capabilities: context and memory. We'll begin with context, commonly known as Retrieval-Augmented Generation (RAG), and show how agents can ground their responses using knowledge retrieved from local data sources such as SQLite or PostgreSQL. This enables agents to provide accurate, domain-specific answers based on real information rather than model hallucination. Next, we'll explore memory: both short-term, thread-level context and long-term, persistent memory. You'll see how agents can store and recall information using solutions like Redis or open-source libraries such as Mem0, enabling them to remember previous interactions, user preferences, and evolving tasks across sessions. By the end, you'll understand how to build agents that are not only capable but context-aware and memory-efficient, resulting in richer, more personalized user experiences.
Python + Agents: Monitoring and evaluating agents
26 February, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In the third session of our Python + Agents series, we'll focus on two essential components of building reliable agents: observability and evaluation. We'll begin with observability, using OpenTelemetry to capture traces, metrics, and logs from agent actions. You'll learn how to instrument your agents and use a local Aspire dashboard to identify slowdowns and failures. From there, we'll explore how to evaluate agent behavior using the Azure AI Evaluation SDK. You'll see how to define evaluation criteria, run automated assessments over a set of tasks, and analyze the results to measure accuracy, helpfulness, and task success. By the end of the session, you'll have practical tools and workflows for monitoring, measuring, and improving your agents, so they're not just functional, but dependable and verifiably effective.

Python + Agents: Building your first AI-driven workflows
3 March, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In Session 4 of our Python + Agents series, we'll explore the foundations of building AI-driven workflows using the Microsoft Agent Framework: defining workflow steps, connecting them, passing data between them, and introducing simple ways to guide the path a workflow takes. We'll begin with a conceptual overview of workflows and walk through their core components: executors, edges, and events. You'll learn how workflows can be composed of simple Python functions or powered by full AI agents when a step requires model-driven behavior. From there, we'll dig into conditional branching, showing how workflows can follow different paths depending on model outputs, intermediate results, or lightweight decision functions. We'll introduce structured outputs as a way to make branching more reliable and easier to maintain, avoiding vague string checks and ensuring that workflow decisions are based on clear, typed data. We'll discover how the DevUI interface makes it easier to develop workflows by visualizing the workflow graph and surfacing the streaming events during a workflow's execution. Finally, we'll dive into an end-to-end demo application that uses workflows inside a user-facing application with a frontend and backend.

Python + Agents: Orchestrating advanced multi-agent workflows
4 March, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In Session 5 of our Python + Agents series, we'll go beyond workflow fundamentals and explore how to orchestrate advanced, multi-agent workflows using the Microsoft Agent Framework. This session focuses on patterns that coordinate multiple steps or multiple agents at once, enabling more powerful and flexible AI-driven systems. We'll begin by comparing sequential vs. concurrent execution, then dive into techniques for running workflow steps in parallel. You'll learn how fan-out and fan-in edges enable multiple branches to run at the same time, how to aggregate their results, and how concurrency allows workflows to scale across tasks efficiently. From there, we'll introduce two multi-agent orchestration approaches that are built into the framework. We'll start with handoff, where control moves entirely from one agent to another based on workflow logic, which is useful for routing tasks to the right agent as the workflow progresses.
We'll then look at Magentic, a planning-oriented supervisor that generates a high-level plan for completing a task and delegates portions of that plan to other agents. Finally, we'll wrap up with a demo of an end-to-end application that showcases a concurrent multi-agent workflow in action.

Python + Agents: Adding a human in the loop to agentic workflows
5 March, 2026 | 6:30 PM - 7:30 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In the final session of our Python + Agents series, we'll explore how to incorporate human-in-the-loop (HITL) interactions into agentic workflows using the Microsoft Agent Framework. This session focuses on adding points where a workflow can pause, request input or approval from a user, and then resume once the human has responded. HITL is especially important because LLMs can produce uncertain or inconsistent outputs, and human checkpoints provide an added layer of accuracy and oversight. We'll begin with the framework's requests-and-responses model, which provides a structured way for workflows to ask questions, collect human input, and continue execution with that data. We'll move on to tool approval, one of the most frequent reasons an agent requests input from a human, and see how workflows can surface pending tool calls for approval or rejection. Next, we'll cover checkpoints and resuming, which allow workflows to pause and be restarted later. This is especially important for HITL scenarios where the human may not be available immediately. We'll walk through examples that demonstrate how checkpoints store progress, how resuming picks up the workflow state, and how this mechanism supports longer-running or multi-step review cycles. This session brings together everything from the series (agents, workflows, branching, orchestration) and shows how to integrate humans thoughtfully into AI-driven processes, especially when reliability and judgment matter most.

Data Security: Azure Key Vault in Databricks
Why this article? To remove the vulnerability of exposing the database connection string directly in a Databricks notebook, by using Azure Key Vault. Database connection strings are highly confidential data that should not be exposed explicitly in a Databricks notebook. Azure Key Vault is a secure option for reading the secrets and establishing the connection.

What do we need?

Tenant ID of the app from the app registration with access to the Azure Key Vault secrets
Client ID of the app from the app registration with access to the Azure Key Vault secrets
Client secret of the app from the app registration with access to the Azure Key Vault

Where to find this information? Under the app registration, you can find the (application) Client ID and Directory (tenant) ID. The client secret value is found in the app registration, under Manage -> Certificates & secrets. You can use an existing secret or create a new one and use it to access the Key Vault secrets. Make sure the application has been granted "get" access to read the secrets, and verify that the key vault you are using in Databricks is the same one with read access. You can verify this by going to the Azure Key Vault -> Access Policies and searching for the application name; it should show up in the results, confirming the application's access.

What do we need to set up in the Databricks notebook? Open your cluster and install azure-keyvault and azure-identity (the installed versions should be compatible with your cluster configuration; refer to https://docs.databricks.com/aws/en/libraries/package-repositories). In a new notebook, start by importing the necessary modules. Your notebook would start with the imports, followed by the tenantId, clientId, client secret, Azure Key Vault URL, the secretName of the connection string in the Azure Key Vault, and the secretVersion. Lastly, we need to fetch the secret using code like the following.
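The original snippet is not reproduced in this extract, so here is a minimal sketch of that retrieval step, assuming the azure-identity and azure-keyvault-secrets packages; all values are placeholders, and a specific version can also be passed to get_secret if you pin one:

from azure.identity import ClientSecretCredential
from azure.keyvault.secrets import SecretClient

# Values from the app registration and your key vault (placeholders)
tenant_id = "<tenant-id>"
client_id = "<client-id>"
client_secret = "<client-secret>"
vault_url = "https://<your-key-vault-name>.vault.azure.net"
secret_name = "<db-connection-string-secret-name>"

# Authenticate as the registered application, then read the secret value
credential = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)
secret_client = SecretClient(vault_url=vault_url, credential=credential)
db_connection_string = secret_client.get_secret(secret_name).value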
Voilà, we have the DB connection string to perform the CRUD operations.

Conclusion: By securely retrieving your database connection string from Azure Key Vault, you eliminate credential exposure and strengthen the overall security posture of your Databricks workflows. This simple shift ensures your notebooks remain clean, compliant, and production-ready.

Learn how to build MCP servers with Python and Azure

We just concluded Python + MCP, a three-part livestream series where we:

Built MCP servers in Python using FastMCP
Deployed them into production on Azure (Container Apps and Functions)
Added authentication, including Microsoft Entra as the OAuth provider

All of the materials from our series are available for you to keep learning from, and linked below:

Video recordings of each stream
PowerPoint slides
Open-source code samples complete with Azure infrastructure and 1-command deployment

If you're an instructor, feel free to use the slides and code examples in your own classes. Spanish speaker? We've got you covered: check out the Spanish version of the series. Have follow-up questions? Join our weekly office hours on the Foundry Discord: Tuesdays @ 11 AM PT for Python + AI, and Thursdays @ 8:30 AM PT for all things MCP.

Building MCP servers with FastMCP
Watch the YouTube recording

In the intro session of our Python + MCP series, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally. Then we consume that server from chatbots like GitHub Copilot in VS Code, using its tools, resources, and prompts. Finally, we discover how easy it is to connect AI agent frameworks like LangChain and Microsoft agent-framework to the MCP server.
Slides for this session
Code repository with examples: python-mcp-demos

Deploying MCP servers to the cloud
Watch the YouTube recording

In our second session of the Python + MCP series, we deploy MCP servers to the cloud! We walk through the process of containerizing a FastMCP server with Docker and deploying to Azure Container Apps. Then we instrument the MCP server with OpenTelemetry and observe the tool calls using Azure Application Insights and Logfire. Finally, we explore private networking options for MCP servers, using virtual networks that restrict external access to internal MCP tools and agents.
Slides for this session
Code repository with examples: python-mcp-demos

Authentication for MCP servers
Watch the YouTube recording

In our third session of the Python + MCP series, we explore the best ways to build authentication layers on top of your MCP servers. We start off simple, with an API key to gate access, and demonstrate a key-restricted FastMCP server deployed to Azure Functions. Then we move on to OAuth-based authentication for MCP servers that provide user-specific data. We dive deep into MCP authentication, which is built on top of OAuth2 but with additional requirements like PRM and DCR/CIMD, which can make it difficult to implement fully. We demonstrate the full MCP auth flow in the open-source identity provider Keycloak, and show how to use an OAuth proxy pattern to implement MCP auth on top of Microsoft Entra.
Slides for this session
Code repository with Container Apps examples: python-mcp-demos
Code repository with Functions examples: python-mcp-demos
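If you want a quick taste before watching the recordings, here is a minimal sketch of the kind of local FastMCP server the first session builds, assuming the fastmcp package is installed; the tool shown is a made-up example, not one from the series:

from fastmcp import FastMCP

mcp = FastMCP("Demo Server")

@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

if __name__ == "__main__":
    # Runs over stdio by default, so clients like GitHub Copilot in VS Code can launch it locally
    mcp.run()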
From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs

As AI engineers, we've spent years optimizing models for the cloud: scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint; it's now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance.

Why Edge AI, Why Now?

Edge AI isn't just about running models locally; it's about rethinking the entire lifecycle:

- Latency: Decisions in milliseconds, not round-trips to the cloud.
- Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance.
- Resilience: Offline-first apps that don't break when the network does.
- Cost: Reduced cloud compute and bandwidth overhead.

With Windows AI PCs powered by Intel and Qualcomm NPUs, and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency.

What You'll Learn in Edge AI for Beginners

The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment.

Multi-Language Support: This content is available in over 48 languages, so you can read and study in your native language.

What You'll Master: This course takes you from fundamental concepts to production-ready implementations, covering:

Small Language Models (SLMs) optimized for edge deployment
Hardware-aware optimization across diverse platforms
Real-time inference with privacy-preserving capabilities
Production deployment strategies for enterprise applications

Why Edge AI Matters

Edge AI represents a paradigm shift that addresses critical modern challenges:

Privacy & Security: Process sensitive data locally without cloud exposure
Real-time Performance: Eliminate network latency for time-critical applications
Cost Efficiency: Reduce bandwidth and cloud computing expenses
Resilient Operations: Maintain functionality during network outages
Regulatory Compliance: Meet data sovereignty requirements

Edge AI

Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated, without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making.

Core Principles:

On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs)
Offline capability: Functions without persistent internet connectivity
Low latency: Immediate responses suited for real-time systems
Data sovereignty: Keeps sensitive data local, improving security and compliance

Small Language Models (SLMs)

SLMs like Phi-4, Mistral-7B, Qwen, and Gemma are optimized versions of larger LLMs, trained or distilled for:

Reduced memory footprint: Efficient use of limited edge device memory
Lower compute demand: Optimized for CPU and edge GPU performance
Faster startup times: Quick initialization for responsive applications

They unlock powerful NLP capabilities while meeting the constraints of (see the sketch after this list):

Embedded systems: IoT devices and industrial controllers
Mobile devices: Smartphones and tablets with offline capabilities
IoT devices: Sensors and smart devices with limited resources
Edge servers: Local processing units with limited GPU resources
Personal computers: Desktop and laptop deployment scenarios
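To make the on-device SLM idea concrete, here is a minimal sketch using the llama-cpp-python bindings (one of the toolchains the curriculum covers via Llama.cpp); the model path and prompt are placeholders, and any quantized GGUF build of an SLM such as Phi or Qwen would work:

from llama_cpp import Llama

# Load a locally stored, quantized SLM (placeholder path; e.g. a GGUF build of Phi or Qwen)
llm = Llama(model_path="./models/phi-4-mini-q4.gguf", n_ctx=2048)

# Run a single completion fully on-device -- no network calls involved
output = llm(
    "Summarize why edge AI matters in one sentence.",
    max_tokens=64,
    temperature=0.2,
)
print(output["choices"][0]["text"])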
Course Modules & Navigation

Course duration: 10 hours of content.

Module 00: Introduction to EdgeAI (Foundation & Context). Key content: EdgeAI Overview, Industry Applications, SLM Introduction, Learning Objectives. Level: Beginner. Duration: 1-2 hrs.
Module 01: EdgeAI Fundamentals (Cloud vs Edge AI comparison). Key content: EdgeAI Fundamentals, Real World Case Studies, Implementation Guide, Edge Deployment. Level: Beginner. Duration: 3-4 hrs.
Module 02: SLM Model Foundations (Model families & architecture). Key content: Phi Family, Qwen Family, Gemma Family, BitNET, μModel, Phi-Silica. Level: Beginner. Duration: 4-5 hrs.
Module 03: SLM Deployment Practice (Local & cloud deployment). Key content: Advanced Learning, Local Environment, Cloud Deployment. Level: Intermediate. Duration: 4-5 hrs.
Module 04: Model Optimization Toolkit (Cross-platform optimization). Key content: Introduction, Llama.cpp, Microsoft Olive, OpenVINO, Apple MLX, Workflow Synthesis. Level: Intermediate. Duration: 5-6 hrs.
Module 05: SLMOps Production (Production operations). Key content: SLMOps Introduction, Model Distillation, Fine-tuning, Production Deployment. Level: Advanced. Duration: 5-6 hrs.
Module 06: AI Agents & Function Calling (Agent frameworks & MCP). Key content: Agent Introduction, Function Calling, Model Context Protocol. Level: Advanced. Duration: 4-5 hrs.
Module 07: Platform Implementation (Cross-platform samples). Key content: AI Toolkit, Foundry Local, Windows Development. Level: Advanced. Duration: 3-4 hrs.
Module 08: Foundry Local Toolkit (Production-ready samples). Key content: Sample applications (see details below). Level: Expert. Duration: 8-10 hrs.

Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing.

Developer Highlights

- Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration.
- ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU.
- DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps.
- Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference.

Local AI: Beyond the Edge

Local AI isn't just about inference; it's about autonomy. Imagine agents that:

- Learn from local context
- Adapt to user behavior
- Respect privacy by design

With tools like Agent Framework, Azure AI Foundry, Windows Copilot Studio, and Foundry Local, developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency.

Try It Yourself

Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT device. Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.
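As a small illustration of the local-inference stack highlighted above, here is a hedged sketch of loading an ONNX model with ONNX Runtime, preferring the DirectML provider on Windows and falling back to CPU; the model file and input shape are placeholders:

import numpy as np
import onnxruntime as ort

# Prefer the DirectML provider on Windows AI PCs, falling back to CPU if unavailable
session = ort.InferenceSession(
    "model.onnx",  # placeholder: any exported or Olive-optimized ONNX model
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Placeholder input matching the model's expected shape
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)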
Learn MCP from our free livestream series in December

Our Python + MCP series is a three-part, hands-on journey into one of the most important emerging technologies of 2025: MCP (Model Context Protocol), an open standard for extending AI agents and chat interfaces with real-world tools, data, and execution environments. Whether you're building custom GitHub Copilot tools, powering internal developer agents, or creating AI-augmented applications, MCP provides the missing interoperability layer between LLMs and the systems they need to act on. Across the series, we move from local prototyping, to cloud deployment, to enterprise-grade authentication and security, all powered by Python and the FastMCP SDK. Each session builds on the last, showing how MCP servers can evolve from simple localhost services to fully authenticated, production-ready services running in the cloud, and how agents built with frameworks like LangChain and Microsoft's agent-framework can consume them at every stage. Register for the entire series. You can also scroll down to learn about each live stream and register for individual sessions. In addition to the live streams, you can also join office hours after each session in the Foundry Discord to ask any follow-up questions. To get started with your MCP learnings before the series, check out the free MCP-for-beginners course on GitHub.

Building MCP servers with FastMCP
16 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In the intro session of our Python + MCP series, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like LangChain and Microsoft agent-framework to MCP servers.

Deploying MCP servers to the cloud
17 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In our second session of the Python + MCP series, we're deploying MCP servers to the cloud! We'll walk through the process of containerizing a FastMCP server with Docker and deploying to Azure Container Apps, and also demonstrate a FastMCP server running directly on Azure Functions. Then we'll explore private networking options for MCP servers, using virtual networks that restrict external access to internal MCP tools and agents.

Authentication for MCP servers
18 December, 2025 | 6:00 PM - 7:00 PM (UTC) Coordinated Universal Time
Register for the stream on Reactor

In our third session of the Python + MCP series, we're exploring the best ways to build authentication layers on top of your MCP servers. That could be as simple as an API key to gate access, but for servers that provide user-specific data, we need to use an OAuth2-based authentication flow. MCP authentication is built on top of OAuth2 but with additional requirements like PRM and DCR/CIMD, which can make it difficult to implement fully. In this session, we'll demonstrate the full MCP auth flow, and provide examples that implement MCP auth on top of Microsoft Entra.

On-Device AI with Windows AI Foundry and Foundry Local
From "waiting" to "instant", without sending data away

AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private, and available even when the network isn't, empowering apps to deliver seamless experiences. Imagine an intelligent assistant that responds in seconds, without sending text to the cloud. This approach brings speed and data control to the places that need it most, while still letting you tap into cloud power when it makes sense.

Windows AI Foundry: A Local Home for Models

Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud.

Foundry Local

Foundry Local is the engine that powers this experience. Think of it as a local AI runtime: fast, private, and easy to integrate into an app.

Why Adopt On-Device AI?

Faster, more responsive apps: Local inference often reduces perceived latency and improves user experience.
Privacy-first by design: Keep sensitive data on the device; avoid cloud round trips unless the user opts in.
Offline capability: An app can provide AI features even without a network connection.
Cost control: Reduce cloud compute and data costs for common, high-volume tasks.

This approach is especially useful in regulated industries, field-work tools, and any app where users expect quick, on-device responses.

Hybrid Pattern for Real Apps

On-device AI doesn't replace the cloud; it complements it. Here's how:

Standalone On-Device: Quick, private actions like document summarization, local search, and offline assistants.
Cloud-Enhanced (Optional): Large-context models, up-to-date knowledge, or heavy multimodal workloads.

Design an app to keep data local by default and surface cloud options transparently, with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows:

Use Foundry Local for real-time inference.
Sync with Azure AI services for model updates, telemetry, and advanced analytics.
Implement fallback strategies for resource-intensive scenarios.

Application Workflow Code Example using Foundry Local:

1. Only on-device: tries Foundry Local first, falls back to ONNX

if foundry_runtime.check_foundry_available():
    # Use on-device Foundry Local models
    try:
        answer = foundry_runtime.run_inference(question, context)
        return answer, "Foundry Local (On-Device)"
    except Exception as e:
        logger.warning(f"Foundry failed: {e}, trying ONNX...")

if onnx_model.is_loaded():
    # Fall back to the local BERT ONNX model
    try:
        answer = bert_model.get_answer(question, context)
        return answer, "BERT ONNX (On-Device)"
    except Exception as e:
        logger.warning(f"ONNX failed: {e}")

return "Error: No local AI available"

2. Hybrid approach: on-device first, cloud as a last resort

def get_answer(question, context):
    """
    Priority order:
    1. Foundry Local (best: advanced + private)
    2. ONNX Runtime (good: fast + private)
    3. Cloud API (fallback: requires internet, less private)
       In the hybrid approach, the path chosen depends on the real-time scenario.
    """
    if foundry_runtime.check_foundry_available():
        # Use on-device Foundry Local models
        try:
            answer = foundry_runtime.run_inference(question, context)
            return answer, "Foundry Local (On-Device)"
        except Exception as e:
            logger.warning(f"Foundry failed: {e}, trying ONNX...")

    if onnx_model.is_loaded():
        # Fall back to the local BERT ONNX model
        try:
            answer = bert_model.get_answer(question, context)
            return answer, "BERT ONNX (On-Device)"
        except Exception as e:
            logger.warning(f"ONNX failed: {e}, trying cloud...")

    # Last resort: Cloud API (requires internet)
    if network_available():
        try:
            import requests
            response = requests.post(
                '{BASE_URL_AI_CHAT_COMPLETION}',
                headers={'Authorization': f'Bearer {API_KEY}'},
                json={
                    'model': '{MODEL_NAME}',
                    'messages': [{
                        'role': 'user',
                        'content': f'Context: {context}\n\nQuestion: {question}'
                    }]
                },
                timeout=10
            )
            answer = response.json()['choices'][0]['message']['content']
            return answer, "Cloud API (Online)"
        except Exception as e:
            return "Error: No AI runtime available", "Failed"
    else:
        return "Error: No internet and no local AI available", "Offline"

Demo Project Output: Foundry Local answering context-based questions offline.

The Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data.
The Foundry Local engine ran the Phi-4-mini model offline and indicated that there is no answer.

Practical Use Cases

Privacy-First Reading Assistant: Summarize documents locally without sending text to the cloud.
Healthcare Apps: Analyze medical data on-device for compliance.
Financial Tools: Risk scoring without exposing sensitive financial data.
IoT & Edge Devices: Real-time anomaly detection without network dependency.

Conclusion

On-device AI isn't just a trend; it's a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you're creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric.

References

Get started with Foundry Local - Foundry Local | Microsoft Learn
What is Windows AI Foundry? | Microsoft Learn
https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/

Building Bulletproof Agents with the Durable Task Extension for Microsoft Agent Framework
(This is a translation of the product team's article "Bulletproof agents with the durable task extension for Microsoft Agent Framework," published on 2025/11/13.)

Today (2025/11/13), we are very pleased to announce the public preview of the Durable Task Extension for Microsoft Agent Framework. This extension builds Azure Durable Functions' proven durable execution (surviving crashes and restarts) and distributed execution (running across multiple instances) directly into the Microsoft Agent Framework, reinventing how you build production-ready, resilient, and scalable AI agents. Session management, failure recovery, and scaling are handled automatically, so you can deploy stateful, resilient AI agents to Azure and concentrate entirely on your agent's logic. Whether you are building a customer service agent that maintains context across multi-day conversations, a content pipeline with a human-in-the-loop approval workflow, or a fully automated agent system that coordinates specialized AI models, the Durable Task Extension for Microsoft Agent Framework delivers production-grade reliability, scalability, and coordination with serverless simplicity.

Key features of the durable task extension:

Serverless Hosting: Deploy agents on Azure Functions with automatic scaling from thousands of instances down to zero, retaining full control while keeping the benefits of a serverless architecture.
Automatic Session Management: Agents survive process crashes, restarts, and distributed execution across instances, retaining full conversation context and maintaining persistent sessions.
Deterministic Multi-Agent Orchestrations: Combine specialized durable agents with code-controlled, predictable, and reproducible execution patterns. (Translator's note 1: "deterministic" means the same input always produces the same result, so the behavior is predictable.) (Translator's note 2: a "durable agent" is what this framework calls its agents: agents that, unlike ordinary agents, have durable properties.)
Human-in-the-Loop with Serverless Cost Savings: While waiting for human input, no compute resources are consumed and no cost is incurred.
Built-in Observability with Durable Task Scheduler: Gain deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard.

Create and run a durable agent

Official documentation: https://aka.ms/create-and-run-durable-agent

Code samples (Python/C#):

# Python
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini")

# Create an AI agent following the standard Microsoft Agent Framework pattern
agent = AzureOpenAIChatClient(
    endpoint=endpoint,
    deployment_name=deployment_name,
    credential=AzureCliCredential()
).create_agent(
    instructions="""You are a professional content writer who creates readable,
    well-structured, and engaging documents on any topic.
    Given a topic, proceed with the following steps:
    1. Research the topic using the web search tool
    2. Generate an outline for the document
    3. Write a persuasive document in the appropriate format
    4. Include relevant examples and sources (citations)""",
    name="DocumentPublisher",
    tools=[
        AIFunctionFactory.Create(search_web),
        AIFunctionFactory.Create(generate_outline)
    ]
)

# Configure the Functions app to host the agent with durable session management
app = AgentFunctionApp(agents=[agent])
app.run()

// C#
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
var deploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT") ?? "gpt-4o-mini";
"gpt-4o-mini"; // æšæºç㪠Microsoft Agent Framework ãã¿ãŒã³ã«åŸã£ãŠ AI ãšãŒãžã§ã³ããäœæããŸã AIAgent agent = new AzureOpenAIClient(new Uri(endpoint), new DefaultAzureCredential()) .GetChatClient(deploymentName) .CreateAIAgent( instructions: """ ããªãã¯ãã©ããªããŒãã«å¯ŸããŠãèªã¿ãããæ§é åãããã é åçãªããã¥ã¡ã³ããäœæãããããã§ãã·ã§ãã«ãªã³ã³ãã³ãã©ã€ã¿ãŒã§ãã ããŒããäžããããããæ¬¡ã®æé ã§é²ããŠãã ããã 1.Web æ€çŽ¢ããŒã«ã䜿ã£ãŠããŒãããªãµãŒããã 2.ããã¥ã¡ã³ãã®ã¢ãŠãã©ã€ã³ãçæãã 3.é©åãªæžåŒã§èª¬åŸåã®ããããã¥ã¡ã³ããæžã 4.é¢é£ããäŸãšåºå žïŒåŒçšïŒãå«ãã """, name: "DocumentPublisher", tools: [ AIFunctionFactory.Create(SearchWeb), AIFunctionFactory.Create(GenerateOutline) ]); // Durable ãªã¹ã¬ãã管çã§ãšãŒãžã§ã³ãããã¹ãããããã« Functions ã¢ããªãæ§æããŸã // ããã«ãããHTTP ãšã³ããã€ã³ããèªåã§äœæãããç¶æ ã®æ°žç¶åã管çãããŸã using IHost app = FunctionsApplication .CreateBuilder(args) .ConfigureFunctionsWebApplication() .ConfigureDurableAgents(options => options.AddAIAgent(agent) ) .Build(); app.Run(); ãªã Durable Task Extension ãå¿ èŠãªã®ã AI ãšãŒãžã§ã³ãããåçŽãªãã£ããããããããè€éã§é·æéå®è¡ãããã¿ã¹ã¯ãåŠçããé«åºŠãªã·ã¹ãã ãžãšé²åããã«ã€ããŠãæ°ããªèª²é¡ãæµ®äžããŸãã äŒè©±ãæ°æ¥ããæ°é±éã«ããããããããã»ã¹ã®åèµ·åãã¯ã©ãã·ã¥ãé害ãè¶ ããŠç¶æ ãä¿æããå¿ èŠããããŸãã ããŒã«åŒã³åºããéåžžã®ã¿ã€ã ã¢ãŠããè¶ ããæéãèŠããå Žåããããèªåãã§ãã¯ãã€ã³ããšåŸ©æ§ãå¿ èŠã§ãã 倧éã®ã¯ãŒã¯ããŒãã«å¯Ÿå¿ãããããæ°åã®ãšãŒãžã§ã³ãäŒè©±ãåæã«åŠçã§ããããã忣ã€ã³ã¹ã¿ã³ã¹éã§ã®åŒŸåçãªã¹ã±ãŒãªã³ã°ãæ±ããããŸãã è€æ°ã®å°éãšãŒãžã§ã³ãããä¿¡é Œæ§ã®é«ãããžãã¹ããã»ã¹ã®ããã«ãäºæž¬å¯èœã§åçŸå¯èœãªå®è¡ãã¿ãŒã³ã§èª¿æŽããå¿ èŠããããŸãã ãšãŒãžã§ã³ãã¯ãåŠçãé²ããåã«äººéã®æ¿èªãåŸ ã€å¿ èŠãããå Žåãããããã®éã¯çæ³çã«ã¯ãªãœãŒã¹ãæ¶è²»ããªã (課éãããªã) ããšãæãŸããŸãã Durable Extension ã¯ãAzure Durable Functions ã®æ©èœã Microsoft Agent Framework ã«æ¡åŒµããããšã§ããããã®èª²é¡ã«å¯Ÿå¿ããŸããããã«ãããé害ã«èãã匟åçã«ã¹ã±ãŒã«ããèä¹ æ§ãšåæ£å®è¡ã«ãã£ãŠäºæž¬å¯èœã«åäœãã AI ãšãŒãžã§ã³ããæ§ç¯ã§ããŸãã 4 ã€ã®æ± : 4D ãã®æ¡åŒµæ©èœã¯ã4 ã€ã®åºæ¬çãªäŸ¡å€ã®æ±ãéç§°ã4Dãã«åºã¥ããŠæ§ç¯ãããŠããŸãã Durability (èä¹ æ§) ãã¹ãŠã®ãšãŒãžã§ã³ãã®ç¶æ 倿ŽïŒã¡ãã»ãŒãžãããŒã«åŒã³åºããæææ±ºå®ïŒã¯ãèªåçã«èä¹ æ§ã®ãããã§ãã¯ãã€ã³ããšããŠä¿åãããŸãããšãŒãžã§ã³ãã¯ãã€ã³ãã©æŽæ°ãã¯ã©ãã·ã¥ãã埩æ§ããé·æéã®åŸ æ©äžã«ã¡ã¢ãªããã¢ã³ããŒããããŠãã³ã³ããã¹ãã倱ããã«åéã§ããŸããããã¯ãé·æéå®è¡ãããåŠçãå€éšã€ãã³ããåŸ æ©ãããšãŒãžã§ã³ãã«äžå¯æ¬ ã§ãã Distributed (忣åã®) ãšãŒãžã§ã³ãã®å®è¡ã¯ãã¹ãŠã®ã€ã³ã¹ã¿ã³ã¹ã§å©çšå¯èœã§ããã匟åçãªã¹ã±ãŒãªã³ã°ãšèªåãã§ã€ã«ãªãŒããŒãå®çŸããŸããæ£åžžãªããŒãã¯ãé害ãçºçããã€ã³ã¹ã¿ã³ã¹ã®äœæ¥ãã·ãŒã ã¬ã¹ã«åŒãç¶ããç¶ç¶çãªéçšãä¿èšŒããŸãããã®åæ£å®è¡ã¢ãã«ã«ãããæ°åã®ã¹ããŒããã«ãšãŒãžã§ã³ããã¹ã±ãŒã«ã¢ãããã䞊åã§åäœã§ããŸãã Deterministic (æ±ºå®æ§) ãšãŒãžã§ã³ãã®ãªãŒã±ã¹ãã¬ãŒã·ã§ã³ã¯ãéåžžã®ã³ãŒããšããŠèšè¿°ãããåœä»€åããžãã¯ã䜿çšããŠäºæž¬å¯èœã«å®è¡ãããŸããå®è¡ãã¹ãå®çŸ©ããããšã§ãèªåãã¹ããæ€èšŒå¯èœãªã¬ãŒãã¬ãŒã«ãã¹ããŒã¯ãã«ããŒãä¿¡é Œã§ããããžãã¹ã¯ãªãã£ã«ã«ãªã¯ãŒã¯ãããŒãå®çŸããŸããå¿ èŠã«å¿ããŠæç€ºçãªå¶åŸ¡ãããŒãæäŸãããšãŒãžã§ã³ãäž»å°ã®ã¯ãŒã¯ãããŒãè£å®ããŸãã Debuggability (ãããã°ãããã) IDEããããã¬ãŒããã¬ãŒã¯ãã€ã³ããã¹ã¿ãã¯ãã¬ãŒã¹ãåäœãã¹ããªã©ã®éŠŽæã¿ã®ããéçºããŒã«ãããã°ã©ãã³ã°èšèªã䜿çšããŠéçºã»ãããã°ã§ããŸãããšãŒãžã§ã³ããšãã®ãªãŒã±ã¹ãã¬ãŒã·ã§ã³ã¯ã³ãŒããšããŠè¡šçŸãããããããã¹ãããããã°ãä¿å®ã容æã§ãã å®éã®æ©èœã®åäœ ãµãŒããŒã¬ã¹ ãã¹ãã£ã³ã° (Serverless hosting) ãšãŒãžã§ã³ãã Azure Functions ïŒè¿æ¥äžã«ä»ã® Azure ãµãŒãã¹ã«ãæ¡åŒµäºå®ïŒã«ãããã€ãã䜿çšããŠããªããšãã¯ãŒããŸã§ãäœ¿çšæã¯æ°åã€ã³ã¹ã¿ã³ã¹ãŸã§èªåã¹ã±ãŒãªã³ã°ããŸããæ¶è²»ããã³ã³ãã¥ãŒãã£ã³ã° ãªãœãŒã¹ã«å¯ŸããŠã®ã¿æéãæ¯æããŸãããã®ã³ãŒããã¡ãŒã¹ãã®ãããã€ææ³ã«ããããµãŒããŒã¬ã¹ ã¢ãŒããã¯ãã£ã®å©ç¹ãç¶æããªãããã³ã³ãã¥ãŒãç°å¢ (compute environment) ãå®å šã«å¶åŸ¡ã§ããŸãã # Python endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini") # æšæºç㪠Microsoft Agent Framework ãã¿ãŒã³ã«åŸã£ãŠ AI 
# Create an AI agent following the standard Microsoft Agent Framework pattern
agent = AzureOpenAIChatClient(
    endpoint=endpoint,
    deployment_name=deployment_name,
    credential=AzureCliCredential()
).create_agent(
    instructions="""You are a professional content writer who creates readable,
    well-structured, and engaging documents on any topic.
    Given a topic, proceed with the following steps:
    1. Research the topic using the web search tool
    2. Generate an outline for the document
    3. Write a persuasive document in the appropriate format
    4. Include relevant examples and sources (citations)""",
    name="DocumentPublisher",
    tools=[
        AIFunctionFactory.Create(search_web),
        AIFunctionFactory.Create(generate_outline)
    ]
)

# Configure the Functions app to host the agent with durable session management
app = AgentFunctionApp(agents=[agent])
app.run()

Automatic session management

Agent sessions are automatically checkpointed to the durable storage configured for your Functions app, enabling durability and distributed execution across instances. Even after unexpected process failures, agent execution can resume from any instance, guaranteeing continuous operation. Internally, agents are implemented as Durable Entities: stateful objects that persist state across executions. With this architecture, each agent session acts as a reliable, long-lived entity that retains conversation history and context.

Example scenario: a customer service agent handling complex support cases that span days or weeks. Even if the agent is redeployed or moved to another instance, the conversation history, context, and progress are preserved.

# First interaction - start a new thread for document creation
curl -X POST https://your-function-app.azurewebsites.net/api/agents/DocumentPublisher/threads \
  -H "Content-Type: application/json" \
  -d '{"message": "Please create a document about the benefits of Azure Functions"}'

# The response includes the thread ID and an initial outline/draft of the document
# {"threadId": "doc789", "response": "I'll create a comprehensive document about the benefits of Azure Functions. Searching for the latest information... [document draft] # Benefits of Azure Functions\n\n## Introduction\nAzure Functions is a serverless compute service that runs event-driven code without infrastructure management...\n\n## Cost Efficiency\n- Pay only for execution time\n- No charge for idle resources\n- Automatic scaling reduces over-provisioning...\n\n## Developer Productivity\n- Support for multiple languages (C#, Python, JavaScript, Java)\n- Integrated development tools and CI/CD...\n\n## Scalability\n- Automatic scaling based on demand\n- Handles millions of requests seamlessly...\n\nReferences: [Azure documentation, research on serverless computing]"}

# Second interaction - improve the document on the same thread
curl -X POST https://your-function-app.azurewebsites.net/api/agents/DocumentPublisher/threads/doc789 \
  -H "Content-Type: application/json" \
  -d '{"message": "Could you add a section about integration with other Azure services?"}'

# The agent keeps the context of the Azure Functions document and adds the requested section
# {"threadId": "doc789", "response": "I added a comprehensive integration section to the Azure Functions document:\n\n## Integration with Azure Services\n\n### Azure Storage\nTriggers and bindings for Blob Storage, Queue Storage, and Table Storage enable seamless event-driven architectures...\n\n### Azure Event Grid and Event Hubs\nProcess real-time event streams and implement scalable pub/sub patterns...\n\n### Azure Cosmos DB\nBuilt-in bindings for document database operations and automatic change-feed processing...\n\n### Azure Service Bus\nReliable message processing with enterprise messaging capabilities...\n\n### Azure AI Services\nEasily integrate OpenAI, Cognitive Services, and AI Search to build intelligent applications...\n\nThis section was added after the scalability section. Shall I also add use cases and deployment best practices?"}
(Translator's note: as of 11/20, the endpoint URL and the request/response formats shown above have changed. This article keeps them as they appeared in the original post, but because the feature is still in preview they may change again, so refer to the official documentation for the latest information: https://aka.ms/create-and-run-durable-agent)

Deterministic multi-agent orchestrations

Coordinate multiple specialized durable agents using imperative code. Here, the developer defines the control flow. This differs from agent-driven workflows, where the agent decides the next step. Deterministic orchestrations provide predictable, reproducible execution patterns with automatic checkpointing and recovery.

Example scenario: an email processing system first uses a spam detection agent, then conditionally routes to different specialized agents based on that classification. The orchestration automatically recovers from failures at any step, and agent calls that have already completed are not re-executed.

# Python
@app.orchestration_trigger(context_name="context")
def document_publishing_orchestration(context: DurableOrchestrationContext):
    """A deterministic orchestration that coordinates multiple specialized agents."""
    doc_request = context.get_input()

    # Get the specialized agents from the orchestration context
    research_agent = context.get_agent("ResearchAgent")
    writer_agent = context.get_agent("DocumentPublisherAgent")

    # Step 1: Research the topic with a web search
    research_result = yield research_agent.run(
        messages=f"Research the following topic and gather the key information: {doc_request.topic}",
        response_schema=ResearchResult
    )

    # Step 2: Generate an outline based on the research results
    outline = yield context.call_activity("generate_outline", {
        "topic": doc_request.topic,
        "research_data": research_result.findings
    })

    # Step 3: Write the document based on the research results and the outline
    document = yield writer_agent.run(
        messages=f"""Create a comprehensive document about the following topic: {doc_request.topic}

        Research results: {research_result.findings}
        Outline: {outline}

        Make it a well-structured, readable, and engaging document with appropriate formatting.
        Include sources (citations) where needed.""",
        response_schema=DocumentResponse
    )

    # Step 4: Save and publish the generated document
    published = yield context.call_activity("publish_document", {
        "title": doc_request.topic,
        "content": document.text,
        "citations": document.citations
    })
    return published

Human-in-the-loop

Orchestrations and agents can pause while waiting for human input, approval, or review without consuming compute resources. Thanks to durable execution, an orchestration can wait for a human response for days or weeks, even if the application crashes or restarts. Combined with serverless hosting, all compute resources stop during the wait, completely eliminating compute cost until the human provides input.

Example scenario: a content publishing agent generates a draft, sends it to a human reviewer, and waits several days for approval. No compute resources run (or are billed) during the review period. When the human response arrives, the orchestration resumes automatically with the conversation context and execution state fully preserved.

# Python
@app.orchestration_trigger(context_name="context")
def content_approval_workflow(context: DurableOrchestrationContext):
    """A human-in-the-loop workflow (zero compute cost while waiting)."""
    topic = context.get_input()

    # Step 1: Generate content using an agent
    content_agent = context.get_agent("ContentGenerationAgent")
    draft_content = yield content_agent.run(f"Write an article about {topic}")

    # Step 2: Request a human review
    yield context.call_activity("notify_reviewer", draft_content)

    # Step 3: Wait for approval (no compute resources consumed while waiting)
    approval_event = context.wait_for_external_event("ApprovalDecision")
    timeout_task = context.create_timer(context.current_utc_datetime + timedelta(hours=24))
    winner = yield context.task_any([approval_event, timeout_task])

    if winner == approval_event:
        timeout_task.cancel()
        approved = approval_event.result
        if approved:
            result = yield context.call_activity("publish_content", draft_content)
            return result
        else:
            return "The content was rejected"
    else:
        # On timeout: escalate the review
        result = yield context.call_activity("escalate_for_review", draft_content)
        return result

Built-in agent observability

Configure the Durable Task Scheduler as the durable backend for your Functions app (the mechanism that persists agent and orchestration state). The Durable Task Scheduler is the recommended backend for durable agents, offering the highest throughput performance, fully managed infrastructure, and built-in observability through its UI dashboard.

The Durable Task Scheduler dashboard gives deep visibility into agent operations:

Conversation history: View the complete conversation thread for each agent session, including every message, tool call, and the context at any point in time.
Multi-agent visualization: See the execution flow when multiple specialized agents are invoked, with a visual representation that includes handoffs between agents, parallel execution, and conditional branching.
Performance metrics: Monitor agent response times, token usage, and orchestration execution times.
Execution history: Access detailed execution logs with full replay capability for debugging.

Demo Video

Language support

The durable task extension supports the following languages:

C# (.NET 8.0+) with Azure Functions
Python (3.10+) with Azure Functions

Support for additional computes is coming soon.

Get started today: Click here to create and run a durable agent.

Learn more

Overview documentation
C# Samples
Python Samples

Original article: Bulletproof agents with the durable task extension for Microsoft Agent Framework | Microsoft Community Hub

Python + AI: Recap and Resources
We just wrapped up our Python + AI series, a complete nine-session journey exploring in depth how to use generative AI models from Python. Throughout the series we introduced several kinds of models, including LLMs, embedding models, and vision models. We dug into popular techniques such as RAG, tool calling, and structured outputs. We evaluated the quality and safety of AI with automated evaluations and red-teaming. Finally, we built AI agents with popular Python frameworks and explored the new Model Context Protocol (MCP).

So you can apply what you learned, all of our examples work with GitHub Models, a service that offers free models to every GitHub user for experimentation and learning. Even if you didn't attend the live sessions, you can still access all the materials through the links below! If you are an instructor, you are welcome to use the slides and code in your own classes.

Python + AI: Large Language Models (LLMs)
📺 Watch the recording
In this session we explore LLMs, the models that power ChatGPT and GitHub Copilot. We use Python with packages such as the OpenAI SDK and LangChain, experiment with prompt engineering and few-shot examples, and build a complete LLM-based application. We also explain why concurrency and streaming matter in AI apps.
Slides: aka.ms/pythonia/diapositivas/llms
Code: python-openai-demos
Repository guide: video

Python + AI: Vector Embeddings
📺 Watch the recording
In our second session, we learn about vector embedding models, which convert text or images into numeric arrays. We compare distance metrics, apply quantization, and experiment with multimodal models.
Slides: aka.ms/pythonia/diapositivas/embeddings
Code: vector-embedding-demos
Repository guide: video

Python + AI: Retrieval Augmented Generation (RAG)
📺 Watch the recording
We discover how to use RAG to improve LLM answers by adding relevant context. We build RAG flows in Python with different sources (CSVs, websites, documents, and databases) and finish with a complete application based on Azure AI Search.
Slides: aka.ms/pythonia/diapositivas/rag
Code: python-openai-demos
Repository guide: video

Python + AI: Vision Models
📺 Watch the recording
Vision models accept both text and images, for example GPT-4o and GPT-4o mini. We build an image chat app, perform data extraction, and put together a multimodal search engine.
Slides: aka.ms/pythonia/diapositivas/vision
Code: vector-embeddings
Repository guide: video

Python + AI: Structured Outputs
📺 Watch the recording
We learn how to generate structured responses from LLMs using a Pydantic BaseModel. This approach gives automatic validation of the results, which is useful for entity extraction, classification, and agent workflows.
Slides: aka.ms/pythonia/diapositivas/salidas
Code: python-openai-demos and entity-extraction-demos
Repository guide: video
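As a taste of what that session covers, structured outputs with the OpenAI Python SDK and Pydantic look roughly like this (a minimal sketch, not taken from the course materials; the model name and schema are illustrative):

from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()  # with GitHub Models, point the client at that service with a GitHub token

# parse() asks the model for JSON matching CalendarEvent and validates it automatically
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed  # a validated CalendarEvent instance
print(event.name, event.participants)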
Python + AI: Quality and Safety
📺 Watch the recording
We look at how to use AI safely and how to evaluate the quality of its responses. We show how to configure Azure AI Content Safety and how to use the Azure AI Evaluation SDK to measure model outputs.
Slides: aka.ms/pythonia/diapositivas/calidad
Code: ai-quality-safety-demos
Repository guide: video

Python + AI: Tool Calling
📺 Watch the recording
We explore tool calling, the foundation for building AI agents. We define tools with JSON schemas and Python functions, and handle parallel calls and iterative flows.
Slides: aka.ms/pythonia/diapositivas/herramientas
Code: python-openai-demos
Repository guide: video

Python + AI: AI Agents
📺 Watch the recording
We build AI agents with frameworks such as Microsoft's agent-framework and LangGraph, showing architectures with multiple tools, supervisors, and human-in-the-loop flows.
Slides: aka.ms/pythonia/diapositivas/agentes
Code: python-ai-agents-demos
Repository guide: video

Python + AI: Model Context Protocol (MCP)
📺 Watch the recording
We close the series with MCP (Model Context Protocol), the most innovative technology of 2025. We show how to use the FastMCP SDK in Python to create a local MCP server, connect it to GitHub Copilot, build an MCP client, and connect frameworks such as LangGraph and agent-framework. We also discuss the associated security risks.
Slides: aka.ms/pythonia/diapositivas/mcp
Code: python-ai-mcp-demos
Repository guide: video

In addition
If you have questions, please ask in the #Espanol channel on our Discord: https://aka.ms/pythonia/discord
I hold office hours every Thursday: https://aka.ms/pythonia/horas
Find more Python + AI tutorials, 100% in Spanish, at https://youtube.com/@lagps

Introducing langchain-azure-storage: Azure Storage integrations for LangChain
We're excited to introduce langchain-azure-storage, the first official Azure Storage integration package built by Microsoft for LangChain 1.0. As part of its launch, we've built a new Azure Blob Storage document loader (currently in public preview) that improves upon prior LangChain community implementations. This new loader unifies both blob and container level access, simplifying loader integration. More importantly, it offers enhanced security through default OAuth 2.0 authentication, supports reliably loading millions to billions of documents through efficient memory utilization, and allows pluggable parsing, so you can leverage other document loaders to parse specific file formats.

What are LangChain document loaders?

A typical Retrieval-Augmented Generation (RAG) pipeline follows these main steps:

1. Collect source content (PDFs, DOCX, Markdown, CSVs), often stored in Azure Blob Storage.
2. Parse it into text and associated metadata (i.e., represented as LangChain Document objects).
3. Chunk and embed those documents, then store them in a vector store (e.g., Azure AI Search, Postgres pgvector, etc.).
4. At query time, retrieve the most relevant chunks and feed them to an LLM as grounded context.

LangChain document loaders make steps 1-2 turnkey and consistent so the rest of the stack (splitters, vector stores, retrievers) "just works". See this LangChain RAG tutorial for a full example of these steps when building a RAG application in LangChain; a minimal end-to-end sketch also follows the feature list below.

How can the Azure Blob Storage document loader help?

The langchain-azure-storage package offers the AzureBlobStorageLoader, a document loader that simplifies retrieving documents stored in Azure Blob Storage for use in a LangChain RAG application. Key benefits of the AzureBlobStorageLoader include:

- Flexible loading of Azure Storage blobs to LangChain Document objects. You can load blobs as documents from an entire container, a specific prefix within a container, or by blob names. Each document loaded corresponds 1:1 to a blob in the container.
- Lazy loading support for improved memory efficiency when dealing with large document sets. Documents can be loaded one at a time as you iterate over them instead of all at once.
- Automatic use of DefaultAzureCredential to enable seamless OAuth 2.0 authentication across environments, from local development to Azure-hosted services. You can also explicitly pass your own credential (e.g., ManagedIdentityCredential, SAS token).
- Pluggable parsing. Easily customize how documents are parsed by providing your own LangChain document loader to parse downloaded blob content.
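To put the loader in context, here is a minimal sketch of steps 3-4 above (not part of the langchain-azure-storage package itself); the text splitter, embedding model, and in-memory vector store are illustrative stand-ins, and you could swap in Azure AI Search, pgvector, or another store:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import AzureOpenAIEmbeddings  # assumes an Azure OpenAI embeddings deployment is configured
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Steps 1-2: load blobs from the container as LangChain Document objects
loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>"
)

# Step 3: chunk the documents, embed them, and store them in a vector store
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(loader.lazy_load())
vector_store = InMemoryVectorStore.from_documents(chunks, AzureOpenAIEmbeddings(model="text-embedding-3-small"))

# Step 4: retrieve the most relevant chunks to pass to an LLM as grounded context
relevant_chunks = vector_store.similarity_search("What does the contract say about renewals?", k=4)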
Using the Azure Blob Storage document loader

Installation

To install the langchain-azure-storage package, run:

pip install langchain-azure-storage

Loading documents from a container

To load all blobs from an Azure Blob Storage container as LangChain Document objects, instantiate the AzureBlobStorageLoader with the Azure Storage account URL and container name:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>"
)

# lazy_load() yields one Document per blob for all blobs in the container
for doc in loader.lazy_load():
    print(doc.metadata["source"])  # The "source" metadata contains the full URL of the blob
    print(doc.page_content)  # The page_content contains the blob's content decoded as UTF-8 text

Loading documents by blob names

To only load specific blobs as LangChain Document objects, you can additionally provide a list of blob names:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    ["<blob-name-1>", "<blob-name-2>"]
)

# lazy_load() yields one Document per blob for only the specified blobs
for doc in loader.lazy_load():
    print(doc.metadata["source"])  # The "source" metadata contains the full URL of the blob
    print(doc.page_content)  # The page_content contains the blob's content decoded as UTF-8 text

Pluggable parsing

By default, loaded Document objects contain the blob's UTF-8 decoded content. To parse non-UTF-8 content (e.g., PDFs, DOCX, etc.) or chunk blob content into smaller documents, provide a LangChain document loader via the loader_factory parameter. When loader_factory is provided, the AzureBlobStorageLoader processes each blob with the following steps:

1. Downloads the blob to a new temporary file
2. Passes the temporary file path to the loader_factory callable to instantiate a document loader
3. Uses that loader to parse the file and yield Document objects
4. Cleans up the temporary file

For example, the snippet below parses PDF documents with the PyPDFLoader from the langchain-community package:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_community.document_loaders import PyPDFLoader  # Requires langchain-community and pypdf packages

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    prefix="pdfs/",  # Only load blobs whose names start with "pdfs/"
    loader_factory=PyPDFLoader  # PyPDFLoader will parse each blob as a PDF
)

# Each blob is downloaded to a temporary file and parsed by a PyPDFLoader instance
for doc in loader.lazy_load():
    print(doc.page_content)  # Content parsed by PyPDFLoader (yields one Document per page in the PDF)

This file path-based interface lets you use any LangChain document loader that accepts a local file path as input, giving you access to a wide range of parsers for different file formats.
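Because loader_factory is just a callable that receives a file path, you can also pass a small wrapper that configures the underlying loader. Here is a sketch (the encoding value is illustrative) that parses blobs stored in a non-UTF-8 encoding with TextLoader from langchain-community:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_community.document_loaders import TextLoader

def latin1_text_loader(file_path: str) -> TextLoader:
    # loader_factory receives the temporary file path of each downloaded blob
    return TextLoader(file_path, encoding="latin-1")

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    loader_factory=latin1_text_loader  # any callable that maps a file path to a document loader works
)

for doc in loader.lazy_load():
    print(doc.page_content)  # Text decoded as Latin-1 instead of the default UTF-8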
Migrating from community document loaders to langchain-azure-storage

If you're currently using AzureBlobStorageContainerLoader or AzureBlobStorageFileLoader from the langchain-community package, the new AzureBlobStorageLoader provides an improved alternative. This section provides step-by-step guidance for migrating to the new loader.

Steps to migrate

To migrate to the new Azure Storage document loader, make the following changes:

1. Depend on the langchain-azure-storage package.
2. Update import statements from langchain_community.document_loaders to langchain_azure_storage.document_loaders.
3. Change class names from AzureBlobStorageFileLoader and AzureBlobStorageContainerLoader to AzureBlobStorageLoader.
4. Update document loader constructor calls to:
   - Use an account URL instead of a connection string.
   - Specify UnstructuredLoader as the loader_factory to continue using Unstructured for parsing documents.
5. Enable Microsoft Entra ID authentication in your environment (e.g., run az login or configure a managed identity) instead of using connection string authentication.

Migration samples

The code snippets below show what usage patterns look like before and after migrating from langchain-community to langchain-azure-storage:

Before migration

from langchain_community.document_loaders import AzureBlobStorageContainerLoader, AzureBlobStorageFileLoader

container_loader = AzureBlobStorageContainerLoader(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
    "<container>",
)

file_loader = AzureBlobStorageFileLoader(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
    "<container>",
    "<blob>"
)

After migration

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_unstructured import UnstructuredLoader  # Requires langchain-unstructured and unstructured packages

container_loader = AzureBlobStorageLoader(
    "https://<account>.blob.core.windows.net",
    "<container>",
    loader_factory=UnstructuredLoader  # Only needed if continuing to use Unstructured for parsing
)

file_loader = AzureBlobStorageLoader(
    "https://<account>.blob.core.windows.net",
    "<container>",
    "<blob>",
    loader_factory=UnstructuredLoader  # Only needed if continuing to use Unstructured for parsing
)

What's next?

We're excited for you to try the new Azure Blob Storage document loader and would love to hear your feedback! Here are some ways you can help shape the future of langchain-azure-storage:

- Show support for interface stabilization: The document loader is currently in public preview, and the interface may change in future versions based on feedback. If you'd like to see the current interface marked as stable, upvote the proposal PR to show your support.
- Report issues or suggest improvements: Found a bug or have an idea to make the document loaders better? File an issue on our GitHub repository.
- Propose new LangChain integrations: Interested in other ways to use Azure Storage with LangChain (e.g., checkpointing for agents, persistent memory stores, retriever implementations)? Create a feature request or write to us to let us know.

Your input is invaluable in making langchain-azure-storage better for the entire community!

Resources

langchain-azure GitHub repository
langchain-azure-storage PyPI package
AzureBlobStorageLoader usage guide
AzureBlobStorageLoader documentation reference