Azure AI Foundry
GPT-5.5-Pro not listed in Foundry?
The model is mentioned in this blog post: https://azure.microsoft.com/en-us/blog/openais-gpt-5-5-in-microsoft-foundry-frontier-intelligence-on-an-enterprise-ready-platform/ But it is currently not listed in Foundry; the latest pro model available is 5.4-pro. When will the 5.5-pro model be available on Azure Foundry?

Fixing Broken Markdown in AI Translation: Hardening a Production Pipeline
By Minseok Song and Hiroshi Yoshioka (Microsoft MVPs) TL;DR Recent community feedback, especially from Japanese translations, revealed that many translation failures were not semantic, but structural. Through detailed issue reports and discussions, we identified recurring patterns such as broken links, malformed code fences, inconsistent list structures, and CJK-specific formatting issues. In response, Co-op Translator has undergone a series of structural improvements across multiple releases, culminating in v0.18.1 with enhancements such as parser-based code fence handling, list-aware chunking, language-specific Markdown templates, safer CJK emphasis normalization, more robust image migration, and improved internal anchor consistency. These changes were directly informed by real-world community feedback. We would like to especially thank Hiroshi Yoshioka (Microsoft MVP), whose many detailed reports not only uncovered several of these systemic issues but also made this community report possible. The result is not just improved Japanese translations, but a more reliable and resilient translation pipeline for any repository that depends on Markdown fidelity. Introduction Most translation bugs are not actually translation bugs. They are structural failures. They show up as broken links, missing bold markers, unclosed code fences, skipped content, or images that quietly point to the wrong place. To a learner reading translated technical documentation, those issues can make a page feel untrustworthy. To a maintainer localizing documentation at scale, they reveal something deeper: the translation pipeline is not preserving structure as carefully as it preserves meaning. That insight became much clearer over the past several months through community feedback on Co-op Translator. Co-op Translator helps maintain educational GitHub content across many languages while keeping Markdown, images, and notebooks synchronized as the source evolves. As Hiroshi Yoshioka reported a series of Japanese translation issues across real Microsoft learning repositories, each issue looked narrow on the surface: a broken link here, a skipped line there, bold markers not surviving around linked text, HTML image tags not being rewritten, or code fences breaking after chunking. Example of a real community-reported issue where a code block was broken during translation, causing structural corruption in the output. But taken together, those reports exposed a broader pattern: The hardest problem was not “translate this sentence.” The hardest problem was “translate this document without damaging its structure.” This post is a community report on the hardening work that followed, especially in the recent run-up to v0.18.1, and what we learned from those real-world cases. Why these reports mattered One of the most useful things about community feedback is that it reveals failure modes that synthetic tests often miss. These were not edge cases found in toy Markdown samples. The reports came from real translated content in active educational repositories. That meant we were dealing with the kinds of files maintainers actually have to ship: nested lists fenced code blocks inline HTML relative links translated headings migrated image assets CJK punctuation and emphasis edge cases In other words, we were seeing the kinds of Markdown that break when a translation system is only mostly correct. 1) We stopped treating code fences like a regex problem Code fences are not a regex problem—they are a structural one. 
Left: Regex-based handling breaks code fences and list structure across chunks. Right: Parser-based processing preserves code blocks and their surrounding context as atomic units. One of the earliest recurring themes was code fence integrity. A report on incorrectly handled triple backticks highlighted a classic failure mode: if fenced blocks are detected or split incorrectly, placeholders can fall out of sync, chunk boundaries can be corrupted, and the translated file can come back structurally damaged. A later report showed a closely related issue: list items and indented code placeholders could be split into separate chunks, which then caused broken fences downstream. The right fix was not another regex patch. Instead, Co-op Translator moved to a parser-based approach using markdown-it-py for fenced code block detection. This made code block handling spec-aware and more resilient to cases like unmatched fences, variable fence lengths, and info strings. More importantly, it ensured code sections were treated as atomic units during chunking and placeholder restoration. This same principle was extended to list-aware chunking. Rather than splitting Markdown line by line and hoping the model would preserve structure, the pipeline now groups list items together with their continuation lines and indented placeholders such as @@CODE_BLOCK_X@@. This prevents bullets and their associated code content from being separated into different translation chunks. This was not just a better heuristic. It changed the unit of chunking itself. In practice, this required modifying the chunking pipeline to detect and preserve list-item blocks before token-based splitting. Instead of treating each line independently, we introduced a grouping step that keeps the entire list context intact, including nested indentation and code placeholders. The change was implemented directly in the chunking logic: lines = _group_lines_preserving_list_items(part_text) This helper ensures that list items and their associated code blocks are processed as a single unit, preventing structural corruption during translation. Why this mattered Technical documentation frequently embeds code examples directly under list items or step-by-step instructions. When these relationships are broken during translation, the issue is not just cosmetic. It results in structurally invalid Markdown and misplaced code blocks that can confuse readers and make examples unusable. These were not edge cases. They appeared in real production documentation where: Fenced code blocks became malformed after chunking List items and their associated code placeholders were separated into different segments Placeholder ordering drifted, breaking reconstruction of the original structure In practice, this meant that even when the translated text was correct, the document itself could no longer be trusted as a working technical resource. What changed in practice Before: Code samples could leak out of their list context List items and code blocks were split across chunks Placeholder ordering could drift, breaking reconstruction After: Code blocks are preserved as atomic units during chunking List-bound code samples remain intact Placeholder ordering is stable across the pipeline 2) We restored internal link consistency across translation chunks Even when each chunk appears locally correct, internal links can break at the document level. Left: Anchor links drift out of sync because headings and links are translated independently across chunks. 
Right: After document-level normalization, links correctly resolve to their corresponding translated headings. Another cluster of issues surfaced when translating longer Markdown documents: internal links would silently break once the content was processed in chunks. Co-op Translator splits large documents into multiple chunks to fit within model constraints. While this works well for translation itself, it introduces a structural problem. Internal links such as [Go to section](#section-name) depend on heading-derived anchor slugs, and those slugs can change during translation. When each chunk is translated independently, links and headings can drift out of sync. In practice, this meant that even when translated headings and links looked correct locally within a chunk, they no longer matched at the document level. Tables of contents, section jump links, and cross-references inside the same file could silently break. The right fix was not to rely on chunk-level correctness. Instead, Co-op Translator introduced a document-level normalization step for internal anchor links. The pipeline now parses both the source and translated Markdown using markdown-it, extracts headings, generates GitHub-style slugs from the translated headings, and then realigns internal anchor links so they correctly point to their corresponding translated sections. Rather than trusting fragment identifiers produced during chunk-level translation, links are reconciled against the final translated document structure. This was not just a small post-processing tweak. It changed where consistency is enforced. In practice, this required introducing a normalization step that runs after all chunks are merged back into a single document. Instead of assuming each chunk is self-consistent, the system now treats the entire document as the source of truth and rebinds internal links accordingly. The change was implemented as a dedicated normalization pass: normalize_internal_anchor_links(source_markdown, translated_markdown) This function aligns fragment identifiers with translated heading slugs, ensuring that internal navigation remains valid even when content has been translated in multiple independent chunks. Why this mattered Technical documentation relies heavily on internal navigation such as tables of contents, section links, and cross-references within the same file. When anchor links drift out of sync with translated headings, the document becomes difficult to navigate even if the translation itself is accurate. Readers may click on links that lead to incorrect sections or nowhere at all, which significantly reduces trust in the content. These issues surfaced in real-world usage where: Internal links no longer matched translated heading slugs Tables of contents pointed to incorrect or missing sections Cross-references silently broke across chunk boundaries This highlighted that correctness at the chunk level was not enough. Consistency had to be enforced at the document level. 
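To make that document-level pass concrete, here is a minimal sketch of the idea, assuming markdown-it-py for parsing. The helper names (gh_slug, heading_slugs, normalize_anchor_links_sketch) are illustrative, not Co-op Translator's actual implementation, and real GitHub slug generation handles more edge cases (duplicate headings, inline markup inside headings):

```python
import re
from markdown_it import MarkdownIt

def gh_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor slug rules."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation
    return re.sub(r"\s+", "-", text)      # spaces become hyphens

def heading_slugs(markdown: str) -> list[str]:
    """Collect a slug for every heading, in document order."""
    tokens = MarkdownIt().parse(markdown)
    return [gh_slug(tokens[i + 1].content)
            for i, tok in enumerate(tokens) if tok.type == "heading_open"]

def normalize_anchor_links_sketch(source_md: str, translated_md: str) -> str:
    """Rebind #fragment links in the translated document to the slugs of the
    translated headings, aligning source and translated headings by position."""
    mapping = dict(zip(heading_slugs(source_md), heading_slugs(translated_md)))

    def rebind(match: re.Match) -> str:
        text, anchor = match.group(1), match.group(2)
        return f"[{text}](#{mapping.get(anchor, anchor)})"

    return re.sub(r"\[([^\]]+)\]\(#([^)]+)\)", rebind, translated_md)
```

Aligning headings by position mirrors the approach described above: the final translated document, not any individual chunk, is treated as the source of truth for anchors.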
What changed in practice
Before:
Internal links could drift out of sync with translated headings
Tables of contents pointed to incorrect or missing sections
Cross-references silently broke across chunk boundaries
Long documents behaved like fragmented outputs rather than a single unit
After:
Internal links are realigned with translated heading slugs at the document level
Tables of contents correctly resolve to translated sections
Cross-references remain consistent across the entire document
Long Markdown documents behave as a single coherent unit
3) We fixed CJK emphasis the safe way
Bold and italic rendering around CJK text was a recurring and subtle failure point. Issues like “Markdown bold not handled correctly” may look minor, but they reveal a deeper compatibility problem: many Markdown renderers do not consistently apply emphasis when markers sit directly next to CJK characters.
To address this, we introduced a dedicated normalization step for emphasis markers. Instead of relying on each renderer to interpret `*`, `**`, and `***` correctly in CJK-adjacent cases, Co-op Translator converts them into equivalent HTML tags such as `<em>` and `<strong>` when the target language is Japanese, Korean, or Chinese. This shifts emphasis rendering from renderer-dependent behavior to deterministic output.
What mattered was not just fixing it, but fixing it safely. The normalization is strictly scoped to CJK languages and carefully designed to avoid overmatching. It does not mutate inline code spans or unrelated fragments. This is critical, because overly aggressive formatting fixes can easily break code, identifiers, or underscore-heavy technical text.
Unlike whitespace-delimited languages, Japanese, Korean, and Chinese often place characters directly adjacent to emphasis markers without clear boundaries. For example, a phrase like:
**example** is ...
may be translated into Japanese as:
**例**は ...
Here, the particle は is attached directly to the emphasized word. In some Markdown renderers, this breaks the expected boundary around **例**, causing the emphasis to render incorrectly or not at all. This pattern is not limited to Japanese. Similar boundary issues can appear across CJK languages due to the absence of whitespace between words.
Why this mattered
Formatting bugs around emphasis may look cosmetic, but they affect readability, hierarchy, and trust, especially in instructional documentation where emphasis often signals warnings, key concepts, or required steps.
What changed in practice
Before:
Emphasis markers could render inconsistently when adjacent to CJK characters
Bold and italic formatting could break depending on the Markdown renderer
Fixes risked overmatching and corrupting code or inline technical content
After:
Emphasis rendering is deterministic across CJK languages using HTML tags
Bold and italic formatting remains consistent regardless of renderer behavior
Normalization is safely scoped, avoiding unintended mutations in code and inline content
Next steps
With the recent release, Co-op Translator now exposes a programmatic API that allows the translation pipeline to be executed directly from Python, not only through the CLI. This is an important step, but it is not the end state. The immediate focus is improving adoption. Documentation and usage patterns are being developed so that the API can be reliably integrated across different environments and workflows. More fundamentally, the direction is shifting.
Co-op Translator is evolving from a repository-specific tool into a reusable translation engine that can operate as part of larger content pipelines. This enables broader use cases, including:
Long-form content such as eBooks and technical blogs
Developer documentation and static site projects (for example, Docusaurus or Astro)
Continuous documentation pipelines that track and update translations as source content evolves
Multilingual SDK, API documentation, and knowledge base systems
The long-term goal is to treat translation as infrastructure rather than a one-time task. Instead of generating static outputs, the system is being designed to support continuous updates, structural guarantees, and seamless integration into real-world documentation workflows.
Why community feedback mattered so much here
One of the most encouraging parts of this work is that the most useful reports were not always long reports. Sometimes a single repository link, a screenshot, and one concrete example of broken output were enough to reveal a structural weakness in the translation engine. That feedback created a valuable loop between people reading translated docs and people maintaining the translation tooling.
Hiroshi's reports did not just identify isolated defects. They helped surface recurring categories of failure:
code fence integrity
chunk boundary safety
link preservation
CJK emphasis compatibility
image path migration
anchor normalization
Once those patterns became visible, the fixes could be implemented in the core and covered with tests so that the broader ecosystem, not just one file or one repo, would benefit.
Why this matters for learners worldwide
Co-op Translator is used in educational repositories where translated documentation can lower the barrier to learning for people around the world. That raises the quality bar. A learner should not have to wonder whether a missing bold marker changed the meaning of a sentence. A learner should not hit a broken anchor halfway through a tutorial. A learner should not lose trust in a translated page because a code block or image path was corrupted during processing. Improving those details is not cosmetic. It is part of making global technical education more reliable.
Closing thoughts
This community report comes down to a simple truth: Translation quality depends on structural quality. Community feedback helped Co-op Translator get better at preserving the things technical documents depend on most: code fences, lists, links, emphasis, images, and anchors. The result is a more dependable foundation for multilingual documentation, not only for Japanese but for any repository that needs translated content to behave like a maintained technical artifact rather than a plain text dump.
To everyone who has opened an issue, shared a screenshot, submitted a PR, or stress-tested translated docs in the real world: thank you. That feedback is helping Co-op Translator become a stronger tool for maintainers and a more trustworthy experience for learners. If you are maintaining multilingual Markdown content, I hope these lessons are useful beyond this project too: use parsers where you can, make structure a first-class concern, and treat community bug reports as design input, not just support tickets.
If you are working on multilingual documentation, you can explore Co-op Translator here: https://github.com/Azure/co-op-translator
Selected GitHub references
Repository: https://github.com/Azure/co-op-translator
Issue #221: https://github.com/Azure/co-op-translator/issues/221
PR #226: https://github.com/Azure/co-op-translator/pull/226
Issue #234: https://github.com/Azure/co-op-translator/issues/234
PR #237: https://github.com/Azure/co-op-translator/pull/237
Issue #235: https://github.com/Azure/co-op-translator/issues/235
Issue #239: https://github.com/Azure/co-op-translator/issues/239
Issue #357: https://github.com/Azure/co-op-translator/issues/357
Issue #362: https://github.com/Azure/co-op-translator/issues/362
Issue #363: https://github.com/Azure/co-op-translator/issues/363
PR #370: https://github.com/Azure/co-op-translator/pull/370
PR #372: https://github.com/Azure/co-op-translator/pull/372
PR #377: https://github.com/Azure/co-op-translator/pull/377
PR #378: https://github.com/Azure/co-op-translator/pull/378
PR #379: https://github.com/Azure/co-op-translator/pull/379
PR #364: https://github.com/Azure/co-op-translator/pull/364
About the authors
Minseok Song (Microsoft MVP) is an OSS maintainer of Co-op Translator focusing on GitHub-native multilingual automation.
Hiroshi Yoshioka (Microsoft MVP) is a community contributor who has played a key role in improving translation quality through detailed real-world feedback.

Making Sense of Azure AI Foundry IQ
As enterprise teams build AI agents, the hardest design decisions often have nothing to do with models. Instead, they revolve around a more fundamental question: How should an agent access organizational knowledge in a way that is accurate, secure, and sustainable over time? Azure AI Foundry IQ is designed to address a specific version of that problem. It is not a general‑purpose data access layer, and it is not a replacement for every retrieval pattern. Understanding where it fits and where it does not is key to using it effectively. This post explores those boundaries and grounds them in concrete, enterprise‑relevant scenarios, before showing how Foundry IQ can be implemented directly via Azure AI Search APIs and SDKs. What Azure AI Foundry IQ Is (and Is Not): Azure AI Foundry IQ is a managed knowledge layer built on Azure AI Search. It allows you to define a knowledge base that spans multiple content sources such as SharePoint, Azure Blob Storage, OneLake, existing Azure AI Search indexes, and selected external sources and expose them through a single, permission‑aware endpoint. When an agent queries a knowledge base, Foundry IQ: Plans how the query should be executed Selects relevant knowledge sources Runs retrieval (optionally in multiple steps) Enforces user permissions Returns grounded results with citations A single knowledge base can be reused across multiple agents or applications, avoiding duplicated indexing and inconsistent retrieval logic. What Foundry IQ is not: It does not execute SQL queries, perform aggregations, or provide real‑time numeric accuracy. Foundry IQ retrieves unstructured text, not transactional or analytical data. Where Foundry IQ Is a Good Fit 1. Multi‑Source, Distributed Knowledge Foundry IQ is most valuable when relevant knowledge is spread across multiple systems. It removes the need for each agent to manage source‑specific routing and retrieval logic. This benefit increases as the number of sources grows; with a single source, the overhead is rarely justified. 2. Complex or Multi‑Part Questions Foundry IQ’s agentic retrieval model is designed for questions that require: Decomposition into sub‑questions Retrieval from multiple documents Synthesis across sources Its multi‑step retrieval approach is especially effective when a single document cannot answer the question on its own. 3. Reduced Custom Retrieval Engineering Foundry IQ automates indexing, chunking, vectorization, and orchestration across sources. This makes it a strong choice for teams that want to focus on agent behavior rather than building and maintaining custom RAG pipelines. 4. Enterprise Security and Governance Foundry IQ integrates with Microsoft Entra ID and supports document‑level permissions and Purview sensitivity labels where the underlying source allows it. This makes it suitable for internal or regulated scenarios where permission trimming is a hard requirement. 5. Shared Knowledge Across Multiple Agents A single knowledge base can serve multiple agents or applications, reducing operational overhead and ensuring consistent retrieval behavior across experiences. 6. High Emphasis on Answer Quality and Trust For scenarios where correctness, grounding, and citations matter more than latency or cost, Foundry IQ’s multi‑step retrieval consistently outperforms basic RAG approaches. Example Scenarios Where Foundry IQ Works Well Scenario A: Internal Policy and Operations Assistant An enterprise builds an internal assistant for store managers. 
Relevant information lives in:
• HR policies in SharePoint
• Safety procedures in Blob Storage
• Operations manuals in OneLake
Questions often span multiple documents. A single Foundry IQ knowledge base unifies these sources and enforces permissions automatically.
Scenario B: Compliance or Regulatory Knowledge Assistant
A compliance team needs answers strictly grounded in approved documents, with citations and access control. Foundry IQ ensures only authorized content is retrieved, reducing the risk of accidental data exposure.
Scenario C: Shared Knowledge Layer for Multiple Internal Agents
Multiple internal agents, such as chat assistants, workflow helpers, and embedded copilots, rely on the same procedural content. A shared knowledge base avoids duplicate indexing and centralizes governance.
Where Foundry IQ Is Not a Good Fit
1. Simple or Single‑Source Q&A
For a single, well‑defined source, Foundry IQ’s orchestration adds complexity without proportional benefit.
2. Structured or Analytical Data Queries
Foundry IQ does not execute live queries or calculations. It retrieves text, not metrics.
3. Ultra‑Low Latency or High‑Throughput Requirements
Agentic retrieval introduces LLM‑in‑the‑loop latency and token costs. For sub‑second responses at scale, simpler retrieval pipelines are more appropriate.
4. Highly Customized Retrieval Logic
Foundry IQ abstracts the retrieval pipeline. If you require fine‑grained control over scoring or transformations, a fully custom search pipeline may be preferable.
Example Scenarios Where Foundry IQ Is the Wrong Tool
Scenario D: Sales and Inventory Analytics Agent
Questions like “What were Q4 sales by region?” require live data queries. Indexing reports leads to stale answers. A direct SQL or analytics tool is the correct solution.
Scenario E: High‑Volume, Low‑Latency Assistant
Voice‑based assistants requiring sub‑second responses cannot tolerate the latency of agentic retrieval.
A Common Architecture Pattern
Most successful implementations combine:
Foundry IQ for unstructured documents and policies
Structured data tools for analytics and live queries
An application or agent layer that routes questions based on intent
This avoids forcing a single tool to solve every problem.
Querying Foundry IQ Knowledge Bases Directly via Azure AI Search SDK
You can query Azure AI Foundry IQ knowledge bases directly using the azure-search-documents Python SDK without using Foundry Agent Service.
Your App → Azure AI Search SDK → Foundry IQ Knowledge Base → Grounded Results
Ideal when you want full orchestration control while still benefiting from managed, agentic retrieval.
How this works
Note: this is a reference implementation.
Install

```
pip install --pre azure-search-documents azure-identity
```

Setup (High Level)
Provision Azure AI Search (Basic or higher)
Enable Azure AD and API key authentication
Enable a system‑assigned managed identity
Ingest Content via Knowledge Sources
Blob Storage, SharePoint, or OneLake
Index, indexer, data source, and skillset are created automatically
Knowledge sources and KBs are created via REST API (2025‑11‑01‑preview)
Create a Knowledge Base
minimal reasoning → semantic retrieval only (no LLM)
low / medium reasoning → requires Azure OpenAI model
Search service MI needs Cognitive Services User
Querying the Knowledge Base (Python)
Initialize the Client

```python
from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient

client = KnowledgeBaseRetrievalClient(
    endpoint="https://<search-service>.search.windows.net",
    knowledge_base_name="<kb-name>",
    credential=DefaultAzureCredential(),
)
```

Minimal Reasoning (Fast, No LLM)

```python
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeRetrievalSemanticIntent,
    KnowledgeRetrievalMinimalReasoningEffort,
    KnowledgeRetrievalOutputMode,
)

request = KnowledgeBaseRetrievalRequest(
    intents=[KnowledgeRetrievalSemanticIntent(search="your question here")],
    retrieval_reasoning_effort=KnowledgeRetrievalMinimalReasoningEffort(),
    output_mode=KnowledgeRetrievalOutputMode.EXTRACTIVE_DATA,
)
response = client.retrieve(retrieval_request=request)
```

Conversational Reasoning (LLM‑Backed)

```python
from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    KnowledgeRetrievalLowReasoningEffort,
    KnowledgeRetrievalOutputMode,
)

request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="<first user question>")]
        ),
        KnowledgeBaseMessage(
            role="assistant",
            content=[KnowledgeBaseMessageTextContent(text="<assistant response>")]
        ),
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="<follow-up question>")]
        ),
    ],
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort(),
    output_mode=KnowledgeRetrievalOutputMode.EXTRACTIVE_DATA,
)
response = client.retrieve(retrieval_request=request)
```

Keep in mind:
intents → minimal reasoning only
messages → low / medium reasoning only
They are not interchangeable.
Processing the Response

```python
# Extracted content
for msg in (response.response or []):
    for item in (msg.content or []):
        print(item.text)

# Citations (handles blob, SharePoint, OneLake, and search index references)
for ref in (response.references or []):
    ref_id = getattr(ref, "id", None)
    url = getattr(ref, "blob_url", None) or getattr(ref, "url", None)
    print(f"[{ref_id}] {url}")

# Retrieval diagnostics
for record in (response.activity or []):
    elapsed = getattr(record, "elapsed_ms", None) or ""
    print(f"{record.type}: {elapsed}ms")
```

Output Modes

| Mode | When to Use |
| --- | --- |
| extractiveData | Feed grounded chunks into your own LLM |
| answerSynthesis | Return a ready‑made answer with citations (LLM required) |

Security & Permissions
RBAC: Search Index Data Reader with DefaultAzureCredential
Permission trimming: must be enabled at ingestion (ingestionPermissionOptions) and is enforced at query time by passing the user's bearer token:

```python
response = client.retrieve(
    retrieval_request=request,
    x_ms_query_source_authorization="Bearer <user-token>"
)
```

Foundry IQ won't solve every retrieval problem.
But when your agents need grounded, permission-aware answers from content scattered across SharePoint, Blob Storage, and OneLake, it handles the hard parts — so you can focus on what your agent actually does.

Building an Enterprise Knowledge Copilot with Foundry IQ and Agentic Retrieval on Azure AI
Every enterprise has the same problem: knowledge scattered across SharePoint, file shares, wikis, and email. This article walks through building a knowledge copilot that unifies that data behind a single conversational interface — using Microsoft's Foundry IQ knowledge bases and the agentic retrieval engine in Azure AI Search. The Problem: Fragmented Knowledge, Fragmented Answers Enterprise AI projects today share a common pain point. Each new agent or copilot that needs to answer questions from company data must rebuild its own retrieval pipeline from scratch — data connections, chunking logic, embeddings, routing, permissions — all duplicated project after project. The result is a tangle of fragmented, siloed pipelines that are expensive to maintain and inconsistent in quality. Consider a field technician troubleshooting equipment. The answer might span a vendor manual stored in OneLake, a company repair policy on SharePoint, and a public electrical standard on the web. Traditional single-index RAG cannot orchestrate across those sources in one pass. The technician waits, the issue escalates, and productivity drops. Foundry IQ, announced in public preview in November 2025, addresses this directly. It provides a unified knowledge layer for agents — a single endpoint that replaces per-project RAG pipelines with a reusable, topic-centric knowledge base that any number of agents can consume. What Is Foundry IQ? Foundry IQ introduces four capabilities built on top of Azure AI Search: Knowledge Bases — Reusable, topic-centric collections (e.g., "employee policies," "product documentation") available directly in the Foundry portal. Rather than wiring retrieval logic into every agent, you define a knowledge base once and ground multiple agents through a single API. Indexed and Federated Knowledge Sources — A knowledge base can draw from Azure Blob Storage, OneLake, SharePoint, Azure AI Search indexes, the web, and MCP servers (MCP in private preview). Developers do not need to manage different retrieval strategies per source; the knowledge base presents a unified endpoint. Agentic Retrieval Engine — A self-reflective query engine that uses AI to plan, search, and synthesize answers with configurable retrieval reasoning effort. Enterprise-Grade Security — Document-level access control and alignment with existing permissions models. Microsoft Purview sensitivity labels are respected through the indexing and retrieval pipeline, so classified content remains governed as it flows into knowledge bases. For indexed sources, Foundry IQ automatically manages the full indexing pipeline: content is ingested, chunked, vectorized, and prepared for hybrid retrieval. When Azure Content Understanding is enabled, complex documents gain layout-aware enrichment — tables, figures, and headers are extracted and structured without extra engineering work. How Agentic Retrieval Works Single-shot RAG — one query, one index, one pass — breaks down when questions are ambiguous, multi-hop, or span several data silos. Foundry IQ's agentic retrieval engine treats retrieval as a multi-step reasoning task rather than a keyword lookup: Plan — The engine analyzes the conversation and decomposes the query into focused sub-queries, deciding which knowledge sources to consult. Search — Sub-queries run concurrently against selected sources using keyword, vector, or hybrid techniques. Rank — Semantic reranking identifies the most relevant results. 
Reflect — If the information gathered is insufficient, the engine iterates — issuing follow-up queries autonomously. Synthesize — Results are unified into a natural-language answer with source references. Developers control this behaviour through a high-level retrieval reasoning effort setting. Lower effort suits fast, lightweight lookups; higher effort enables iterative search and richer planning across the entire data estate. Real-world impact: AT&T integrated Azure AI Search and retrieval-augmented generation into its multi-agent framework, reducing customer resolution times by 33 percent, cutting average handle time by nearly 10 percent, and scaling 71 AI solutions to 100,000 employees. Ontario Power Generation used agentic retrieval to sift through over 40 years of nuclear operating experience, enabling data-driven decision-making and helping new staff learn from decades of institutional knowledge. Architecture Overview Step-by-Step: Setting Up the Knowledge Copilot Provision Resources You need an Azure AI Search service (Basic tier or above), a Microsoft Foundry project, an embedding model deployment (e.g., text-embedding-3-large), and an LLM deployment (e.g., gpt-4.1) for query planning and answer generation. .NET 8 or later is required for the C# SDK. Create a Knowledge Base in Azure AI Search Using the Azure.Search.Documents preview SDK, define an index, a knowledge source pointing to your data, and a knowledge base with OutputMode set to AnswerSynthesis for natural-language answers with citations. The following C# snippet (adapted from the official Azure AI Search quickstart) shows the knowledge base creation: using Azure; using Azure.Identity; using Azure.Search.Documents.Indexes; var searchEndpoint = "https://<your-service>.search.windows.net"; var aoaiEndpoint = "https://<your-resource>.openai.azure.com/"; var indexClient = new SearchIndexClient( new Uri(searchEndpoint), new DefaultAzureCredential()); // Configure the LLM for query planning and answer synthesis var openAiParameters = new AzureOpenAIVectorizerParameters { ResourceUri = new Uri(aoaiEndpoint), DeploymentName = "gpt-4.1", ModelName = "gpt-4.1" }; var model = new KnowledgeBaseAzureOpenAIModel(openAiParameters); // Create the knowledge base with answer synthesis enabled var knowledgeBase = new KnowledgeBase("<knowledge-base-name>") { OutputMode = KnowledgeBaseOutputMode.AnswerSynthesis, AnswerInstructions = "Provide a concise answer based on the retrieved documents.", Models = { model } }; await indexClient.CreateOrUpdateKnowledgeBaseAsync(knowledgeBase); Connect an Agent to the Knowledge Base via MCP Each knowledge base exposes a Model Context Protocol (MCP) endpoint that MCP-compatible agents can call. The Foundry IQ-specific agent SDK currently offers full code samples for Python and REST API, but you can use the general-purpose MCP tooling in C# to achieve the same connection. 
The following pattern is drawn from the official Microsoft Learn documentation on MCP tools with Foundry Agents:

```csharp
using Azure.AI.Projects;
using Azure.Identity;

var endpoint = "https://<your-resource>.services.ai.azure.com/api/projects/<your-project>";
var model = "gpt-4.1-mini";

// Point the MCP tool at the knowledge base's MCP endpoint
var mcpTool = new MCPToolDefinition(
    serverLabel: "enterprise_kb",
    serverUrl: "https://<search-service>.search.windows.net" +
        "/knowledgebases/<kb-name>/mcp?api-version=2025-11-01-preview");
mcpTool.AllowedTools.Add("knowledge_base_retrieve");

// Create the agent with the MCP tool attached
var projectClient = new AIProjectClient(new Uri(endpoint), new DefaultAzureCredential());
var agentVersion = await projectClient.AgentAdministrationClient
    .CreateAgentVersionAsync(
        "enterprise-copilot",
        new ProjectsAgentVersionCreationOptions(
            new DeclarativeAgentDefinition(model)
            {
                Instructions = "You are a company knowledge assistant. " +
                    "Always search the knowledge base before answering. " +
                    "If the knowledge base has no answer, say so clearly.",
                Tools = { mcpTool }
            }));
```

The agent instructions are critical — explicitly requiring the agent to use the knowledge base prevents it from answering purely from the LLM's training data.
Query the Copilot
Once the agent is published, your application layer simply sends user questions via the Azure AI Projects SDK or REST API. The agent autonomously invokes the knowledge base tool, retrieves grounded context, and returns an answer with citations referencing the original documents.
Trade-offs and Considerations

| Dimension | Detail |
| --- | --- |
| Maturity | Foundry IQ is in public preview — not recommended for production workloads without accepting preview SLA terms. |
| Cost | Agentic retrieval has two billing streams: token-based billing from Azure AI Search for retrieval, and billing from Azure OpenAI for query planning and answer synthesis. |
| Latency vs. Quality | Higher retrieval reasoning effort produces better answers but adds latency due to iterative search. For sub-second lookups, use minimal effort; for complex multi-hop questions, use medium. |
| C# SDK Coverage | The Foundry IQ–specific agent connection SDK currently supports Python and REST API. C# support is available for the underlying agentic retrieval queries and for general MCP tool integration. |
| Security | Document-level ACLs from SharePoint are enforced at query time. For per-user authorization in Foundry Agent Service, the current preview does not support per-request MCP headers — use the Azure OpenAI Responses API as an alternative. |

Key Takeaways
Foundry IQ transforms enterprise RAG from a bespoke, per-project exercise into a managed, reusable knowledge layer. You define a knowledge base once, connect it to your data sources, and any number of agents or apps can consume it. The agentic retrieval engine handles query planning, multi-source search, semantic reranking, and iterative refinement — capabilities that previously required significant custom engineering. For .NET developers, the Azure AI Search C# SDK and the MCP tooling in the Agent Framework provide the building blocks to integrate this into your applications today.
References:
What is Foundry IQ?
Create a knowledge base in Azure AI Search
Foundry IQ: Unlocking ubiquitous knowledge for agents

The Future of Agentic AI: Inside Microsoft Agent Framework 1.0
Agentic AI is rapidly moving beyond demos and chatbots toward long‑running, autonomous systems that reason, call tools, collaborate with other agents, and operate reliably in production. On April 3, 2026, Microsoft marked a major milestone with the General Availability (GA) release of Microsoft Agent Framework 1.0, a production‑ready, open‑source framework for building agents and multi‑agent workflows in .NET and Python. [techcommun...rosoft.com]
In this post, we’ll deep‑dive into:
What Microsoft Agent Framework actually is
Its core architecture and design principles
What’s new in version 1.0
How it differs from other agent frameworks
When and how to use it—with real code examples
What Is Microsoft Agent Framework?
According to the official announcement, Microsoft Agent Framework is an open‑source SDK and runtime for building AI agents and multi‑agent workflows with strong enterprise foundations. Agent Framework provides two primary capability categories:
1. Agents
Agents are long‑lived runtime components that:
Use LLMs to interpret inputs
Call tools and MCP servers
Maintain session state
Generate responses
They are not just prompt wrappers, but stateful execution units.
2. Workflows
Workflows are graph‑based orchestration engines that:
Connect agents and functions
Enforce execution order
Support checkpointing and human‑in‑the‑loop scenarios
This leads to a clean separation of responsibilities:

| Concern | Handled By |
| --- | --- |
| Reasoning & interpretation | Agent |
| Execution policy & control flow | Workflow |

This separation is a foundational design decision.
High‑Level Architecture
From the official overview, Agent Framework is composed of several core building blocks:
Model clients (chat completions & responses)
Agent sessions (state & conversation management)
Context providers (memory and retrieval)
Middleware pipeline (interception, filtering, telemetry)
MCP clients (tool discovery and invocation)
Workflow engine (graph‑based orchestration)
Conceptual Flow
🌟 What’s New in Version 1.0
Version 1.0 marks the transition from "Release Candidate" to "General Availability" (GA).
Production-Ready Stability: Unlike the earlier experimental packages, 1.0 offers stable APIs, versioned releases, and a commitment to long-term support (LTS).
A2A Protocol (Agent-to-Agent): A new structured messaging protocol that allows agents to communicate across different runtimes. For example, an agent built in Python can seamlessly coordinate with an agent running in a .NET environment.
MCP (Model Context Protocol) Support: Full integration with the Model Context Protocol, enabling agents to dynamically discover and invoke external tools and data sources without manual integration code.
Multi-Agent Orchestration Patterns: Stable implementations of complex patterns, including:
Sequential: Linear handoffs between specialized agents.
Group Chat: Collaborative reasoning where agents discuss and solve problems.
Magentic-One: A sophisticated pattern for task-oriented reasoning and planning.
Middleware Pipeline: The new middleware architecture lets you inject logic into the agent's execution loop without modifying the core prompts. This is essential for Responsible AI (RAI), allowing you to add content safety filters, logging, and compliance checks globally (see the sketch after this list).
DevUI Debugger: A browser-based local debugger that provides a real-time visual representation of agent message flows, tool calls, and state changes.
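To make the middleware idea concrete before the code examples below, here is a framework-agnostic sketch. It assumes nothing about Agent Framework's actual middleware signatures; it only illustrates how a content-safety check and logging can wrap an agent invocation without touching its prompts:

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]

def content_safety(next_handler: Handler) -> Handler:
    async def handler(user_input: str) -> str:
        # Stand-in for a real Responsible AI / content-safety check.
        if "forbidden" in user_input.lower():
            return "Request blocked by content policy."
        return await next_handler(user_input)
    return handler

def logging_mw(next_handler: Handler) -> Handler:
    async def handler(user_input: str) -> str:
        print(f"[trace] agent invoked with: {user_input!r}")
        result = await next_handler(user_input)
        print(f"[trace] agent returned {len(result)} chars")
        return result
    return handler

async def agent_core(user_input: str) -> str:
    # Placeholder for the actual model call.
    return f"echo: {user_input}"

# Compose outermost-first: logging wraps safety, which wraps the agent core.
pipeline = logging_mw(content_safety(agent_core))
print(asyncio.run(pipeline("What is the largest city in France?")))
```

Because each middleware only sees the next handler, cross-cutting concerns stay out of the agent's instructions entirely, which is the property the framework's pipeline is built around.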
Code Examples
Creating a Simple Agent (C#)
From Microsoft Learn:

```csharp
using Azure.AI.Projects;
using Azure.Identity;
using Microsoft.Agents.AI;

AIAgent agent = new AIProjectClient(
    new Uri("https://your-foundry-service.services.ai.azure.com/api/projects/your-project"),
    new AzureCliCredential())
    .AsAIAgent(
        model: "gpt-5.4-mini",
        instructions: "You are a friendly assistant. Keep your answers brief.");

Console.WriteLine(await agent.RunAsync("What is the largest city in France?"));
```

This shows:
Provider‑agnostic model access
Session‑aware agent execution
Minimal setup for production agents
Creating a Simple Agent (Python)

```python
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential

client = FoundryChatClient(
    project_endpoint="https://your-foundry-service.services.ai.azure.com/api/projects/your-project",
    model="gpt-5.4-mini",
    credential=AzureCliCredential(),
)

agent = client.as_agent(
    name="HelloAgent",
    instructions="You are a friendly assistant. Keep your answers brief.",
)

result = await agent.run("What is the largest city in France?")
print(result)
```

The same agent abstraction applies across languages.
When to Use Agents vs Workflows
Microsoft provides clear guidance:

| Use an Agent when… | Use a Workflow when… |
| --- | --- |
| Task is open‑ended | Steps are well‑defined |
| Autonomous tool use is needed | Execution order matters |
| Single decision point | Multiple agents/functions collaborate |

Key principle: If you can solve the task with deterministic code, do that instead of using an AI agent.
🔄 How It Differs from Other Frameworks
Microsoft Agent Framework 1.0 distinguishes itself by focusing on "Enterprise Readiness" and "Interoperability."

| Feature | Microsoft Agent Framework 1.0 | Semantic Kernel / AutoGen | LangChain / CrewAI |
| --- | --- | --- | --- |
| Philosophy | Unified, production-ready SDK. | Research-focused or tool-specific. | High-level, developer-friendly abstractions. |
| Integration | Deeply integrated with Microsoft Foundry and Azure. | Varied; often requires more glue code. | Generally cloud-agnostic. |
| Interoperability | Native A2A and MCP for cross-framework tasks. | Limited to internal ecosystem. | Uses proprietary connectors. |
| Runtime | Identical API parity for .NET and Python. | Primarily Python-first (SK has C#). | Primarily Python. |
| Control | Graph-based deterministic workflows. | More non-deterministic/experimental. | Mixture of role-based and agentic. |

🛠️ Key Technical Components
Agent Harness: The execution layer that provides agents with controlled access to the shell, file system, and messaging loops.
Agent Skills: A portable, file-based or code-defined format for packaging domain expertise.
Implementation Tip: If you are coming from Semantic Kernel, Microsoft provides migration assistants that analyze your existing code and generate step-by-step plans to upgrade to the new Agent Framework 1.0 standards.
Microsoft Agent Framework Version 1.0 | Microsoft Agent Framework
Agent Framework documentation
🎯 Summary
Microsoft Agent Framework 1.0 is the "grown-up" version of AI orchestration. By standardizing the way agents talk to each other (A2A), discover tools (MCP), and process information (Middleware), Microsoft has provided a clear path for taking AI experiments into production. For more detailed guides, check out the official Microsoft Agent Framework Documentation and the Microsoft Agent Framework - .NET AI Community Standup.

Need Guidance on cost breakdown of Microsoft Foundry Agent portal I created
I have developed a complaint handling portal for customers and employees using Azure AI Foundry. The solution is built with Foundry agents, models from the catalog, input/output caching, agent logging/tracing, and other Foundry capabilities. The frontend and orchestration layer are deployed on Azure Container Apps. While Azure Cost Analysis provides an overview of spending, several parts remain unclear or act as a black box for accurate estimation, including:
Token consumption assumptions (input/output tokens across different models and agents)
User concurrency, sessions, and behavior patterns
Agent logging and observability costs
Impact of input/output caching
Detailed resource consumption and billing in Azure Container Apps
What is the best way to accurately calculate or estimate the total running cost for such an Azure AI Foundry-based platform with a Container Apps frontend? Are there official Microsoft documentation, pricing guides, or reference architectures for cost breakdown? How do companies typically present costs for such AI platforms to attract customers (e.g., TCO models or per-user pricing)? I want to know how the platform costs are shown to customers. Thank you.

Failed to add tool to agent - Preview Feature Required?
Hi,
We’ve recently run into an issue where we’re no longer able to add tools to our Foundry agent. This was previously working without problems in our development environment, but now every attempt results in the following error:
“Failed to add tool to agent. Request failed with status code 403.”
After inspecting the request in the browser’s developer console, we noticed an additional message:
"This operation requires the following opt-in preview feature(s): AgentEndpoints=V1Preview. Include the 'Foundry-Features: AgentEndpoints=V1Preview' header in your request."
How can we opt in to this Foundry preview feature, and when was this change introduced? We are unsure whether the issue is caused by the missing preview feature or by some other authorization problem. Any help would be very much appreciated.
Kind regards,
Arne

When Anthropic’s Managed Agents Meet Microsoft Hosted Agents
Let’s start from a real engineering pain point. In their engineering blog Managed Agents, Anthropic describes a sobering observation: while building the agent scaffolding (the “harness”) for Claude Sonnet 4.5, they noticed the model suffered from “context anxiety,” so they added context-reset logic into the harness. But when the same harness ran on the more capable Claude Opus 4.5, those resets became dead weight — the stronger model no longer needed them, yet the harness was actively holding it back. This is the fundamental dilemma of the harness: it encodes assumptions about the current model’s capabilities, and those assumptions rot quickly as models evolve. That’s not a minor concern. In an era when AI capabilities shift qualitatively every few months, any infrastructure tightly coupled to a specific model’s abilities becomes a bottleneck on engineering progress. Anthropic’s Answer: A Three-Part Decoupled Architecture Anthropic’s answer borrows from a problem operating systems solved decades ago: how do you provide stable abstractions for programs that haven’t been imagined yet? The answer is virtualization. Just as the OS virtualizes physical hardware into stable abstractions — processes, files, sockets — Managed Agents virtualizes the Agent runtime into three independent interface layers. For readability this post follows Anthropic’s own metaphors — Brain / Hands / Session — which map to the more conventional engineering terms reasoning orchestrator / execution sandbox / durable event log: Note: Brain, Hands, and Session are not standard industry terminology — they are metaphors Anthropic uses in its engineering blog. In more conventional engineering vocabulary they correspond, respectively, to the agent reasoning loop / orchestrator, the tool executor / execution sandbox, and the durable event log / state store. The rest of this post uses the two styles interchangeably. Brain (the reasoning orchestrator): a stateless reasoning loop This layer is the harness itself — it calls the model and routes tool calls; think of it as the agent’s reasoning orchestrator. Key design point: it must be stateless. All its state comes from the event log; as long as it can call wake(sessionId) to resume, any harness crash is recoverable. This means the harness can evolve independently as model capabilities evolve, without disturbing in-flight tasks. Hands (the execution sandbox layer): replaceable execution sandboxes This layer holds the execution environments the orchestrator calls into — Python REPLs, shells, HTTP clients, even remote containers — i.e. the tool executor / execution sandbox. The contract is brutally simple: execute(name, input) -> string Just that one interface. The orchestrator doesn’t care whether a sandbox is a local process or a remote container; if a sandbox crashes, it is treated as an ordinary tool error — the model decides whether to retry on a fresh sandbox. This is the “cattle, not pets” philosophy applied to AI engineering . Session (the durable event log): externalized memory This layer is an append-only, durable event log. It is not the model’s context window. This distinction matters enormously. When a task outgrows the context window, the harness can use getEvents(start, end) to slice history on demand, and filter, summarize, or transform it before feeding back to the model — all without changing the underlying interface. 
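Before looking at how credentials interact with this layer, a minimal sketch may help pin down the contract. This is an illustrative toy, assuming JSONL storage (as the demo project later in this post also uses); the method names follow the interfaces the blog describes (emit, get_events, wake), not Anthropic's internal code:

```python
import json
from pathlib import Path

class EventLog:
    """Append-only JSONL event log; a toy stand-in for the Session layer."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._path.touch(exist_ok=True)

    def emit(self, session_id: str, type: str, payload: dict) -> None:
        # Append-only: events are never rewritten or deleted.
        with self._path.open("a") as f:
            f.write(json.dumps(
                {"session": session_id, "type": type, "payload": payload}) + "\n")

    def get_events(self, session_id: str, start: int = 0,
                   end: int | None = None) -> list[dict]:
        # Positional slice: the harness can filter or summarize this window
        # before feeding it back to the model.
        events = []
        for line in self._path.open():
            event = json.loads(line)
            if event["session"] == session_id:
                events.append(event)
        return events[start:end]

    def wake(self, session_id: str) -> list[dict]:
        # Crash recovery: a stateless orchestrator rebuilds entirely from the log.
        return self.get_events(session_id)
```

Because the orchestrator keeps no state of its own, a harness crash costs nothing but a wake(session_id) call: replaying the log is a full recovery.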
The event log also plays a key role in credential isolation: when execute calls are logged, the Vault redacts first, so raw tokens never enter the log — and never enter the model’s context window.
Performance gains
This decoupling yields measurable wins:
Median time-to-first-token (TTFT) down ~60%
P95 latency improved by more than 90%
The reason: the old architecture had to provision a container before inference could begin. After decoupling, inference can start as soon as the event log is readable, with sandboxes provisioned lazily on demand.
Microsoft’s Answer: Foundry Agent Service and Hosted Agents
Microsoft gives the enterprise-grade infrastructure answer in Microsoft Foundry. Foundry Agent Service offers three Agent types:

| Type | Requires Code? | Hosting | Best For |
| --- | --- | --- | --- |
| Prompt Agent | No | Fully managed | Rapid prototyping |
| Workflow Agent | No (optional YAML) | Fully managed | Multi-step automation |
| Hosted Agent | Yes | Containerized hosting | Fully custom logic |

This post focuses on Hosted Agents. They let developers package their own Agent code (LangGraph, Microsoft Agent Framework, or fully custom) as a container image and deploy it on Microsoft’s fully managed, pay-per-use infrastructure.
Hosting Adapter: the key abstraction
The core abstraction in Hosted Agents is the Hosting Adapter. It does three things:
Local testing: starts an HTTP server at localhost:8088; no containerization needed for local runs.
Protocol translation: automatically converts between Foundry’s Responses API format and Agent Framework’s native data structures.
Observability: plugs into OpenTelemetry and exports traces, metrics, and logs to Azure Monitor.
Microsoft Agent Framework: the model-agnostic orchestration layer
Microsoft Agent Framework (9.7k stars, now generally available) is a multi-language, multi-provider Agent orchestration framework that supports:
Azure OpenAI, OpenAI, GitHub Copilot
Anthropic Claude
AWS Bedrock, Ollama
Protocol standards like A2A, AG-UI, MCP
This matters a lot: Microsoft’s own Agent framework natively supports Anthropic’s Claude models, providing an official path for cross-ecosystem integration.
This Project: Two Philosophies Shake Hands in Code
Now let’s see how this real project fuses the two architectural philosophies.
Project layout

```
HostedAgentDemo/
├── main.py              # reasoning orchestrator (a.k.a. Brain): main agent loop
├── agent.yaml           # Hosted Agent declaration
├── azure.yaml           # azd deployment config
├── Dockerfile           # containerization
├── harness/
│   ├── session.py       # durable event log (a.k.a. Session)
│   ├── sandbox.py       # execution sandbox pool (a.k.a. Hands)
│   └── vault.py         # credential vault
└── requirements.txt
```

Reasoning orchestrator (a.k.a. Brain): FoundryChatClient + Agent Framework

```python
# main.py (excerpt)
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework_foundry_hosting import ResponsesHostServer

async with DefaultAzureCredential() as credential:
    client = FoundryChatClient(
        project_endpoint=PROJECT_ENDPOINT,
        model=MODEL_DEPLOYMENT_NAME,
        credential=credential,
        allow_preview=True,
    )
    agent = Agent(
        client,
        instructions=INSTRUCTIONS,
        name="ManagedStyleAgent",
        tools=[execute, list_tools, get_events, emit_note],
    )
    server = ResponsesHostServer(agent)
    await server.run_async()
```

What’s happening here?
FoundryChatClient: the Foundry model client from Microsoft Agent Framework; talks to a model deployed on Microsoft Foundry.
Agent: the stateless reasoning orchestrator (what Anthropic calls the “Brain”), with a fixed toolset of four: execute, list_tools, get_events, emit_note. ResponsesHostServer: the Hosting Adapter; exposes the Agent as an HTTP service compatible with Foundry’s Responses API. The orchestrator’s toolset follows Anthropic Managed Agents’ minimalism strictly — every capability funnels through the single gateway execute(name, input_json); the reasoning layer knows nothing about concrete sandbox implementations. Execution sandbox layer (a.k.a. Hands): a cattle-style sandbox pool # harness/sandbox.py (core logic) def execute(self, name: str, input: dict[str, Any]) -> str: """The one and only contract between the orchestrator and the sandboxes.""" if name not in self._tools: return f"ERROR: unknown tool '{name}'. Available: {self.list_tools()}" sandbox_id = self.provision(kind=name) try: out = self._tools[name](input or {}, self._vault) out = self._vault.redact(out) # redact credentials return out except Exception as e: return f"ERROR: sandbox '{sandbox_id}' failed: {type(e).__name__}: {e}" finally: self.retire(sandbox_id) # forcibly destroy after every call Look at the finally block: every execute call destroys its sandbox afterwards, success or failure. That guarantees sandboxes are genuinely stateless units — leftover processes, temp files, in-memory state all vanish with the sandbox. Built-in sandboxes include: python_exec: isolated Python subprocess (15s timeout, no leaked env vars) shell_exec: argv-list execution (no shell metacharacter injection) http_fetch: auth headers injected via the Vault proxy Durable event log (a.k.a. Session): externalized memory # harness/session.py (core interface) class SessionStore: def emit_event(self, session_id, type, payload) -> SessionEvent: """Append-only — never overwritten, never deleted.""" def get_events(self, session_id, start=0, end=None) -> list[SessionEvent]: """Positional slice. The harness can transform before passing to the model.""" def wake(self, session_id) -> list[SessionEvent]: """Recovery entry point after harness crash.""" The event log is a .jsonl append-only file — one JSON event per line. In production you can drop in Azure Cosmos DB, Event Hub, or any durable store; the interface doesn’t change. Vault: credentials never touch the model # harness/vault.py class CredentialVault: def build_auth_headers(self, logical_name: str) -> dict[str, str]: token = self.resolve(logical_name) return {"Authorization": f"Bearer {token}"} def redact(self, value: Any) -> Any: """Replace every known secret in logs and tool return values.""" s = str(value) for secret in self._secrets.values(): if secret and secret in s: s = s.replace(secret, "***REDACTED***") return s The model references credentials by logical name: execute("http_fetch", {"url": "...", "credential": "github"}) — it only knows the logical name "github". The real token is injected by the Vault inside the sandbox, and tool return values are redacted before being written to the event log. Deploying: From Local to Azure in One Command # 1. Install the azd Agent extension azd ext install azure.ai.agents # 2. Test locally (no container needed) python main.py # → Managed-style Agent running on http://localhost:8088 # 3. 
Vault: credentials never touch the model

```python
# harness/vault.py
class CredentialVault:
    def build_auth_headers(self, logical_name: str) -> dict[str, str]:
        token = self.resolve(logical_name)
        return {"Authorization": f"Bearer {token}"}

    def redact(self, value: Any) -> Any:
        """Replace every known secret in logs and tool return values."""
        s = str(value)
        for secret in self._secrets.values():
            if secret and secret in s:
                s = s.replace(secret, "***REDACTED***")
        return s
```

The model references credentials by logical name, e.g. execute("http_fetch", {"url": "...", "credential": "github"}): it only knows the logical name "github". The real token is injected by the Vault inside the sandbox, and tool return values are redacted before being written to the event log.

Deploying: From Local to Azure in One Command

```bash
# 1. Install the azd Agent extension
azd ext install azure.ai.agents

# 2. Test locally (no container needed)
python main.py
# → Managed-style Agent running on http://localhost:8088

# 3. Deploy to Azure (Provision + Build + Deploy)
azd up
```

Architectures Compared: Two Ecosystems in Philosophical Resonance

| Dimension | Anthropic Managed Agents | Microsoft Foundry Hosted Agents |
| --- | --- | --- |
| Core abstraction | Reasoning orchestrator / execution sandbox / durable event log (Anthropic: Brain / Hands / Session) | Hosting Adapter + Agent Framework |
| Sandbox strategy | Cattle (destroyed after use) | Container (managed lifecycle) |
| Credential security | Vault proxy injection, invisible to the model | Managed Identity + RBAC |
| Context management | External event log, sliced on demand | Responses API session management |
| Observability | Event log + custom | OpenTelemetry → Azure Monitor |
| Scaling | Many orchestrators × many sandboxes, concurrent | minReplicas / maxReplicas |
| Cross-model support | Claude model family | Many providers (Claude included) |

The core philosophy of both architectures aligns tightly: decouple reasoning (the orchestrator), tool execution (the sandbox layer), and memory (the event log) so each layer can evolve independently. The difference is emphasis:

Anthropic prioritizes interface stability, so that today's infrastructure can run tomorrow's stronger models.
Microsoft prioritizes enterprise-grade operations, so that agents get production-grade security, scaling, and observability.

That's the value of this project: it proves the two philosophies can live together in one codebase.

Running it

```bash
# Clone and configure
git clone <your-repo>
cd HostedAgentDemo
cp .env.example .env
# Edit .env with FOUNDRY_PROJECT_ENDPOINT and MODEL_DEPLOYMENT_NAME

# Install dependencies
pip install -r requirements.txt

# Run locally
python main.py

# Test a conversation
curl http://localhost:8088/responses \
  -H "Content-Type: application/json" \
  -d '{"input": [{"role": "user", "content": "List the tools you can use"}]}'

# Deploy to Azure
azd ext install azure.ai.agents
azd up
```

Summary

Back in 2016 the industry was still arguing whether microservices were over-engineering. Today nobody doubts the value of service decoupling at scale. I believe 2025–2026 is the "microservices moment" for Agent engineering: people are starting to realize that an Agent that couples reasoning, tool execution, and state memory inside a single monolithic container simply cannot keep pace with model evolution.

Anthropic's Managed Agents supplies the architectural philosophy; Microsoft's Foundry Hosted Agents supplies the enterprise infrastructure; and this open-source project shows that they are not an either/or choice: they are complementary, and they make each other better.

References

Sample Code: https://github.com/microsoft/Agent-Framework-Samples/tree/main/09.Cases/maf_harness_managed_hosted_agent
Anthropic Engineering Blog, Managed Agents: https://www.anthropic.com/engineering/managed-agents
Microsoft Learn, Hosted agents in Foundry Agent Service (preview): https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/hosted-agents?view=foundry
Microsoft Foundry Samples, agent-framework Python hosted-agent samples: https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/agent-framework
Microsoft Agent Framework: https://github.com/microsoft/agent-framework
Microsoft Agent Framework Samples: https://github.com/microsoft/agent-framework-samples
Microsoft Learn, What is Microsoft Foundry Agent Service: https://learn.microsoft.com/en-us/azure/foundry/agents/overview

Stop Experimenting, Start Building: AI Apps & Agents Dev Days Has You Covered
The AI landscape has shifted. The question is no longer "Can we build AI applications?" but "Can we build AI applications that actually work in production?" Demos are easy. Reliable, scalable, resilient AI systems that handle real-world complexity? That's where most teams struggle. If you're an AI developer, software engineer, or solution architect who's ready to move beyond prototypes and into production-grade AI, there's a series built specifically for you.

What Is AI Apps & Agents Dev Days?

AI Apps & Agents Dev Days is a monthly technical series from Microsoft Reactor, delivered in partnership with Microsoft and NVIDIA. You can explore the full series at https://developer.microsoft.com/en-us/reactor/series/s-1590/

This isn't a slide deck marathon. The series tagline says it best: "It's not about slides, it's about building." Each session tackles real-world challenges, shares patterns that actually work, and digs into what's next in AI-driven app and agent design. You bring your curiosity, your code, and your questions. You leave with something you can ship.

The sessions are led by experienced engineers and advocates from both Microsoft and NVIDIA, people like Pamela Fox, Bruno Capuano, Anthony Shaw, Gwyneth Peña-Siguenza, and solutions architects from NVIDIA's Cloud AI team. These aren't theorists; they're practitioners who build and ship the tools you use every day.

What You'll Learn

The series covers the full spectrum of building AI applications and agent-based systems. Here are the key themes:

Building AI Applications with Azure, GitHub, and Modern Tooling

Sessions walk through how to wire up AI capabilities using Azure services, GitHub workflows, and the latest SDKs. The focus is always on code-first learning: you'll see real implementations, not abstract architecture diagrams.

Designing and Orchestrating AI Agents

Agent development is one of the series' strongest threads. Sessions cover how to build agents that orchestrate long-running workflows, persist state automatically, recover from failures, and pause for human-in-the-loop input, without losing progress. For example, the session "AI Agents That Don't Break Under Pressure" demonstrates building durable, production-ready AI agents using the Microsoft Agent Framework, running on Azure Container Apps with NVIDIA serverless GPUs.

Scaling LLM Inference and Deploying to Production

Moving from a working prototype to a production deployment means grappling with inference performance, GPU infrastructure, and cost management. The series covers how to leverage NVIDIA GPU infrastructure alongside Azure services to scale inference effectively, including patterns for serverless GPU compute.

Real-World Architecture Patterns

Expect sessions on container-based deployments, distributed agent systems, and enterprise-grade architectures. You'll learn how to use services like Azure Container Apps to host resilient AI workloads, how Foundry IQ fits into agent architectures as a trusted knowledge source, and how to make architectural decisions that balance performance, cost, and scalability.

Why This Matters for Your Day Job

There's a critical gap between what most AI tutorials teach and what production systems actually require. This series bridges that gap:

Production-ready patterns, not demos. Every session focuses on code and architecture you can take directly into your projects. You'll learn patterns for state persistence, failure recovery, and durable execution: the things that break at 2 AM.

Enterprise applicability.
The scenarios covered (travel planning agents, multi-step workflows, GPU-accelerated inference) map directly to enterprise use cases. Whether you're building internal tooling or customer-facing AI features, the patterns transfer.

Honest trade-off discussions. The speakers don't shy away from the hard questions: When do you need serverless GPUs versus dedicated compute? How do you handle agent failures gracefully? What does it actually cost to run these systems at scale?

Watch On-Demand, Build at Your Own Pace

Every session is available on-demand. You can watch, pause, and build along at your own pace; no need to rearrange your schedule. The full playlist is available from the series page linked above.

This is particularly valuable for technical content. Pause a session while you replicate the architecture in your own environment. Rewind when you need to catch a configuration detail. Build alongside the presenters rather than just watching passively.

What You'll Walk Away With

After working through the series, you'll have:

Practical agent development skills: how to design, orchestrate, and deploy AI agents that handle real-world complexity, including state management, failure recovery, and human-in-the-loop patterns
Production architecture patterns: battle-tested approaches for deploying AI workloads on Azure Container Apps, leveraging NVIDIA GPU infrastructure, and building resilient distributed systems
Infrastructure decision-making confidence: a clearer understanding of when to use serverless GPUs, how to optimize inference costs, and how to choose the right compute strategy for your workload
Working code and reference implementations: the sessions are built around live coding and sample applications (like the Travel Planner agent demo), giving you starting points you can adapt immediately
A framework for continuous learning: with new sessions each month, you'll stay current as the AI platform evolves and new capabilities emerge

Start Building

The AI applications that will matter most aren't the ones with the flashiest demos; they're the ones that work reliably, scale gracefully, and solve real problems. That's exactly what this series helps you build.

Whether you're designing your first AI agent system or hardening an existing one for production, the AI Apps & Agents Dev Days sessions give you the patterns, tools, and practical knowledge to move forward with confidence.

Explore the series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ and start watching the on-demand sessions at the link above. The best time to level up your AI engineering skills was yesterday. The second-best time is right now, and these sessions make it easy to start.

Join our free livestream series on hosting agents in Microsoft Foundry
Join us for a new 3-part livestream series where we deploy AI agents on Microsoft Foundry using Microsoft Agent Framework and LangChain/LangGraph, then level them up with tools, observability, and evals. You'll learn how to:

Deploy Python agents to Foundry Hosted agents using the Azure Developer CLI
Build hosted agents with Microsoft Agent Framework, including Foundry IQ integration
Build hosted agents with LangChain + LangGraph, including built-in tools like Web Search
Run quality and safety evaluations: continuous evals, scheduled evals, guardrails, and red-teaming

Throughout the series, we'll use Python for all examples and share full code so you can run everything yourself in your own Foundry projects.

👉 Register for the full series. Spanish speaker? We'll also be running a series for Spanish speakers! Register here.

In addition to the live streams, you can also join the Microsoft Foundry Discord to ask follow-up questions after each stream.

If you are new to generative AI with Python, start with our 9-part Python + AI series, which covers topics such as LLMs, embeddings, RAG, tool calling, MCP, and agents. If you are new to Microsoft Agent Framework, watch our 6-part Python + Agent series, which dives deep into agents and workflows.

To learn more about each live stream or register for individual sessions, scroll down:

Host your agents on Foundry: Microsoft Agent Framework

27 April, 2026 | 5:00 PM - 6:00 PM (UTC)

Register for the stream on Reactor

In our first session, we'll deploy agents built with Microsoft Agent Framework (the successor to AutoGen and Semantic Kernel). Starting with a simple agent, we'll add Foundry tools like Code Interpreter, ground the agent in enterprise data with Foundry IQ, and finally deploy multi-agent workflows. Along the way, we'll use the Foundry UI to interact with the hosted agent, testing it out in the playground and observing the traces from the reasoning and tool calls.

Host your agents on Foundry: LangChain + LangGraph

29 April, 2026 | 5:00 PM - 6:00 PM (UTC)

Register for the stream on Reactor

In our second session, we'll deploy agents built with the popular open-source libraries LangChain and LangGraph. Starting with a simple agent, we'll add Foundry tools like Bing Web Search, ground the agent in Foundry IQ, then deploy more complex agents using the LangGraph orchestration framework. Along the way, we'll use the Foundry UI to interact with the hosted agent, testing it out in the playground and observing the traces from the reasoning and tool calls.

Host your agents on Foundry: Quality & safety evaluations

30 April, 2026 | 5:00 PM - 6:00 PM (UTC)

Register for the stream on Reactor

In our third session, we'll ensure that our AI agents are producing high-quality outputs and operating safely and responsibly. First we'll explore what it means for agent outputs to be high quality, using built-in evaluators to check overall task adherence and then building custom evaluators for domain-specific checks. With Foundry hosted agents, we can run bulk evaluations on demand, set up scheduled evaluations, and even enable continuous evaluation on a subset of live agent traces. Next we'll discuss safety systems that can be layered on top of agents and audit agents for potential safety risks. To improve compliance with an organization's goals, we can configure custom policies and guardrails that can be shared across agents.
Finally, we can ensure that adversarial inputs can't produce unsafe outputs by running automated red-teaming scans on agents, and even schedule those scans to run regularly. With all of these evaluation and compliance features available in Foundry, you can have more confidence hosting your agents in production.