ai foundry
75 TopicsBuilding Knowledge-Grounded AI Agents with Foundry IQ
Foundry IQ now integrates with Foundry Agent Service via MCP (Model Context Protocol), enabling developers to build AI agents grounded in enterprise knowledge. This integration combines Foundry IQ’s intelligent retrieval capabilities with Foundry Agent Service’s orchestration, enabling agents to retrieve and reason over enterprise data. Key capabilities include: Auto-chunking of documents Vector embedding generation Permission-aware retrieval Semantic reranking Citation-backed responses Together, these capabilities allow AI agents to retrieve enterprise knowledge and generate responses that are accurate, traceable, and aligned with organizational permissions. Why Use Foundry IQ with Foundry Agent Service? Intelligent Retrieval Foundry IQ extends beyond traditional vector search by introducing: LLM-powered query decomposition Parallel retrieval across multiple sources Semantic reranking of results This enables agents to retrieve the most relevant enterprise knowledge even for complex queries. Permission-Aware Retrieval Agents only access content users are authorized to see. Access control lists from sources such as: SharePoint OneLake Azure Blob Storage are automatically synchronized and enforced at query time. Auto-Managed Indexing Foundry IQ automatically manages: Document chunking Vector embedding generation Indexing This eliminates the need to manually build and maintain complex ingestion pipelines. The Three Pillars of Foundry IQ 1. Knowledge Sources Foundry IQ connects to enterprise data wherever it lives — SharePoint, Azure Blob Storage, OneLake, and more. When you add a knowledge source: Auto-chunking — Documents are automatically split into optimal segments Auto-embedding — Vector embeddings are generated without manual pipelines Auto-ACL sync — Access permissions are synchronized from supported sources (SharePoint, OneLake) Auto-Purview integration — Sensitivity labels are respected from supported sources2. Knowledge Bases 2. Knowledge Bases A Knowledge Base unifies multiple sources into a single queryable index. Multiple agents can share the same knowledge base, ensuring consistent answers across your organization 3. Agentic Retrieval Agentic retrieval is an LLM-assisted retrieval pipeline that: Decomposes complex questions into subqueries Executes searches in parallel across sources Applies semantic reranking Returns a unified response with citations Agent → MCP Tool Call → Knowledge Base → Grounded Response with Citations The retrievalReasoningEffort parameter controls LLM processing: minimal — Fast queries low — Balanced reasoning medium — Complex multi-part questions Project Architecture ┌─────────────────────────────────────────────────────────────────────┐ │ FOUNDRY AGENT SERVICE │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │ │ Agent │───▶│ MCP Tool │───▶│ Project Connection │ │ │ │ (gpt-4.1) │ │ (knowledge_ │ │ (RemoteTool + MI Auth) │ │ │ └─────────────┘ │ base_retrieve) └─────────────────────────┘ │ └─────────────────────────────│───────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────┐ │ FOUNDRY IQ (Azure AI Search) │ │ ┌─────────────────────────────────────────────────────────────┐ │ │ │ MCP Endpoint: │ │ │ │ /knowledgebases/{kb-name}/mcp?api-version=2025-11-01-preview│ │ │ └─────────────────────────────────────────────────────────────┘ │ │ │ │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │ │ │ Knowledge │ │ Knowledge │ │ Indexed Documents │ │ │ │ Sources │──│ Base │──│ (auto-chunked, │ │ │ │ (Blob, SP, etc) │ │ (unified index) │ │ auto-embedded) │ │ │ └─────────────────┘ └─────────────────┘ └─────────────────────┘ │ └─────────────────────────────────────────────────────────────────────┘ Prerequisites Enable RBAC on Azure AI Search az search service update --name your-search --resource-group your-rg \ --auth-options aadOrApiKey Assign Role to Project's Managed Identity az role assignment create --assignee $PROJECT_MI \ --role "Search Index Data Reader" \ --scope "/subscriptions/.../Microsoft.Search/searchServices/{search}" Install Dependencies pip install azure-ai-projects>=2.0.0b4 azure-identity python-dotenv requests Connecting a Knowledge Base to an Agent The integration requires three steps. Connect Knowledge Base to Agent via MCP The integration requires three steps: Create a project connection — Links your AI Foundry project to the knowledge base using ProjectManagedIdentity authentication Create an agent with MCPTool — The agent uses knowledge_base_retrieve to query the knowledge base Chat with the agent — Use the OpenAI client to have grounded conversations Step 1: Create Project Connection import requests from azure.identity import DefaultAzureCredential, get_bearer_token_provider credential = DefaultAzureCredential() PROJECT_RESOURCE_ID = "/subscriptions/.../providers/Microsoft.CognitiveServices/accounts/.../projects/..." MCP_ENDPOINT = "https://{search}.search.windows.net/knowledgebases/{kb}/mcp?api-version=2025-11-01-preview" def create_project_connection(): """Create MCP connection to knowledge base.""" bearer = get_bearer_token_provider(credential, "https://management.azure.com/.default") response = requests.put( f"https://management.azure.com{PROJECT_RESOURCE_ID}/connections/kb-connection?api-version=2025-10-01-preview", headers={"Authorization": f"Bearer {bearer()}"}, json={ "name": "kb-connection", "properties": { "authType": "ProjectManagedIdentity", "category": "RemoteTool", "target": MCP_ENDPOINT, "isSharedToAll": True, "audience": "https://search.azure.com/", "metadata": {"ApiType": "Azure"} } } ) response.raise_for_status() Step 2: Create Agent with MCP Tool from azure.ai.projects import AIProjectClient from azure.ai.projects.models import PromptAgentDefinition, MCPTool def create_agent(): client = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential) # MCP tool connects agent to knowledge base mcp_kb_tool = MCPTool( server_label="knowledge-base", server_url=MCP_ENDPOINT, require_approval="never", allowed_tools=["knowledge_base_retrieve"], project_connection_id="kb-connection" ) # Create agent with knowledge base tool agent = client.agents.create_version( agent_name="enterprise-assistant", definition=PromptAgentDefinition( model="gpt-4.1", instructions="""You MUST use the knowledge_base_retrieve tool for every question. Include citations from sources.""", tools=[mcp_kb_tool] ) ) return agent, client Step 3: Chat with the Agent def chat(agent, client): openai_client = client.get_openai_client() conversation = openai_client.conversations.create() while True: question = input("You: ").strip() if question.lower() == "quit": break response = openai_client.responses.create( conversation=conversation.id, input=question, extra_body={ "agent_reference": { "name": agent.name, "type": "agent_reference" } } ) print(f"Assistant: {response.output_text}") More Information Azure AI Search Knowledge Stores Foundry Agent Service Model Context Protocol (MCP) Azure AI Projects SDK Summary The integration of Foundry IQ with Foundry Agent Service enables developers to build knowledge-grounded AI agents for enterprise scenarios. By combining: MCP-based tool calling Permission-aware retrieval Automatic document processing Semantic reranking organizations can build secure, enterprise-ready AI agents that deliver accurate, traceable responses backed by source data.Title: Synthetic Dataset Format from AI Foundry Not Compatible with Evaluation Schema
Current Situation The synthetic dataset created from AI Foundry Data Synthetic Data is generated in the following messages format { "messages": [ { "role": "system", "content": "You are a helpful assistant" }, { "role": "user", "content": "What is the primary purpose?" }, { "role": "assistant", "content": "The primary purpose is..." } ] } Challenge When attempting evaluation, especially RAG evaluation, the documentation indicates that the dataset must contain structured fields such as question - The query being asked ground_truth - The expected answer Recommended additional fields reference_context metadata Example required format { "question": "", "ground_truth": "", "reference_context": "", "metadata": { "document": "" } } Because the synthetic dataset is in messages format, I am unable to directly map it to the required evaluation schema. Question Is there a recommended or supported way to convert the synthetic dataset generated in AI Foundry messages format into the structured format required for evaluation? Can the user role be mapped to question? Can the assistant role be mapped to ground_truth? Is there any built in transformation option within AI Foundry?102Views1like2CommentsMicrosoft Foundry Labs: A Practical Fast Lane from Research to Real Developer Work
Why developers need a fast lane from research → prototypes AI engineering has a speed problem, but it is not a shortage of announcements. The hard part is turning research into a useful prototype before the next wave of models, tools, or agent patterns shows up. That gap matters. AI engineers want to compare quality, latency, and cost before they wire a model into a product. Full-stack teams want to test whether an agent workflow is real or just demo. Platform and operations teams want to know when an experiment can graduate into something observable and supportable. Microsoft makes that case directly in introducing Microsoft Foundry Labs: breakthroughs are arriving faster, and time from research to product has compressed from years to months. If you build real systems, the question is not "What is the coolest demo?" It is "Which experiments are worth my next hour, and how do I evaluate them without creating demo-ware?" That is where Microsoft Foundry Labs becomes interesting. What is Microsoft Foundry Labs? Microsoft Foundry Labs is a place to explore early-stage experiments and prototypes from Microsoft, with an explicit focus on research-driven innovation. The homepage describes it as a way to get a glimpse of potential future directions for AI through experimental technologies from Microsoft Research and more. The announcement adds the operating idea: Labs is a single access point for developers to experiment with new models from Microsoft, explore frameworks, and share feedback. That framing matters. Labs is not just a gallery of flashy ideas. It is a developer-facing exploration surface for projects that are still close to research: models, agent systems, UX ideas, and tool experiments. Here's some things you can do on Labs: Play with tomorrow’s AI, today: 30+ experimental projects—from models to agents—are openly available to fork and build upon, alongside direct access to breakthrough research from Microsoft. Go from prototype to production, fast: Seamless integration with Microsoft Foundry gives you access to 11,000+ models with built-in compute, safety, observability, and governance—so you can move from local experimentation to full-scale production without complex containerization or switching platforms. Build with the people shaping the future of AI: Join a thriving community of 25,000+ developers across Discord and GitHub with direct access to Microsoft researchers and engineers to share feedback and help shape the most promising technologies. What Labs is not: it is not a promise that every project has a production deployment path today, a long-term support commitment, or a hardened enterprise operating model. Spotlight: a few Labs experiments worth a developer's attention Phi-4-Reasoning-Vision-15B: A compact open-weight multimodal reasoning model that is interesting if you care about the quality-versus-efficiency tradeoff in smaller reasoning systems. BitNet: A native 1-bit large language model that is compelling for engineers who care about memory, compute, and energy efficiency. Fara-7B: An ultra-compact agentic small language model designed for computer use, which makes it relevant for builders exploring UI automation and on-device agents. OmniParser V2: A screen parsing module that turns interfaces into actionable elements, directly relevant to computer-use and UI-interaction agents. If you want to inspect actual code, the Labs project pages also expose official repository links for some of these experiments, including OmniParser, Magentic-UI, and BitNet. Labs vs. Foundry: how to think about the boundary The simplest mental model is this: Labs is the exploration edge; Foundry is the platform layer. The Microsoft Foundry documentation describes the broader platform as "the AI app and agent factory" to build, optimize, and govern AI apps and agents at scale. That is a different promise from Labs. Foundry is where you move from curiosity to implementation: model access, agent services, SDKs, observability, evaluation, monitoring, and governance. Labs helps you explore what might matter next. Foundry helps you build, optimize, and govern what matters now. Labs is where you test a research-shaped idea. Foundry is where you decide whether that idea can survive integration, evaluation, tracing, cost controls, and production scrutiny. That also means Labs is not a replacement for the broader Foundry workflow. If an experiment catches your attention, the next question is not "Can I ship this tomorrow?" It is "What is the integration path, and how will I measure whether it deserves promotion?" What's real today vs. what's experimental Real today: Labs is live as an official exploration hub, and Foundry is the broader platform for building, evaluating, monitoring, and governing AI apps and agents. Experimental by design: Labs projects are presented as experiments and prototypes, so they still need validation for your use case. A developer's lens: Models, Agents, Observability What makes Labs useful is not that it shows new things. It is that it gives developers a way to inspect those things through the same three concerns that matter in every serious AI system: model choice, agent design, and observability. Diagram description: imagine a loop with three boxes in a row: Models, Agents, and Observability. A forward arrow runs across the row, and a feedback arrow loops from Observability back to Models. The point is that evaluation data should change both model choices and agent design, instead of arriving too late. Models: what to look for in Labs experiments If you are model-curious, Labs should trigger an evaluation mindset, not a fandom mindset. When you see something like Phi-4-Reasoning-Vision-15B or BitNet on the Labs homepage, ask three things: what capability is being demonstrated, what constraints are obvious, and what the integration path would look like. This is where the Microsoft Foundry Playgrounds mindset is useful even if you started in Labs. The documentation emphasizes model comparison, prompt iteration, parameter tuning, tools, safety guardrails, and code export. It also pushes the right pre-production questions: price-to-performance, latency, tool integration, and code readiness. That is how I would use Labs for models: not to choose winners, but to generate hypotheses worth testing. If a Labs experiment looks promising, move quickly into a small evaluation matrix around capability, latency, cost, and integration friction. Agents: what Labs unlocks for agent builders Labs is especially interesting for agent builders because many of the projects point toward orchestration and tool-use patterns that matter in practice. The official announcement highlights projects across models and agentic frameworks, including Magentic-One and OmniParser v2. On the homepage, projects such as Fara-7B, OmniParser V2, TypeAgent, and Magentic-UI point in a similar direction: agents get more useful when they can reason over tools, interfaces, plans, and human feedback loops. For working developers, that means Labs can act as a scouting surface for agent patterns rather than just agent demos. Look for UI or computer-use style agents when your system needs to act through an interface rather than an API. Look for planning or tool-selection patterns when orchestration matters more than raw model quality. My suggestion: when a Labs project looks relevant to agent work, do not ask "Can I copy this architecture?" Ask "Which agent pattern is being explored here, and under what constraints would it be useful in my system?" Observability: how to experiment responsibly and measure what matters Observability is where prototypes usually go to die, because teams postpone it until after they have something flashy. That is backwards. If you care about real systems, tracing, evaluation, monitoring, and governance should start during prototyping. The Microsoft Foundry documentation already puts that operating model in plain view through guidance for tracing applications, evaluating agentic workflows, and monitoring generative AI apps. The Microsoft Foundry Playgrounds page is also explicit that the agents playground supports tracing and evaluation through AgentOps. At the governance layer, the AI gateway in Azure API Management documentation reinforces why this matters beyond demos. It covers monitoring and logging AI interactions, tracking token metrics, logging prompts and completions, managing quotas, applying safety policies, and governing models, agents, and tools. You do not need every one of those controls on day one, but you do need the habit: if a prototype cannot tell you what it did, why it failed, and what it cost, it is not ready to influence a roadmap. "Pick one and try it": a 20-minute hands-on path Keep this lightweight and tool-agnostic. The point is not to memorize a product UI. The point is to run a disciplined experiment. Browse Labs and pick an experiment aligned to your work. Start at Microsoft Foundry Labs and choose one project that is adjacent to a real problem you have: model efficiency, multimodal reasoning, UI agents, debugging workflows, or human-in-the-loop design. Read the project page and jump to the repo or paper if available. Use the Labs entry to understand the claim being made. Then read the supporting material, not just the summary sentence. Define one small test task and explicit success criteria. Keep it concrete: latency budget, accuracy target, cost ceiling, acceptable safety behavior, or failure rate under a narrow scenario. Capture telemetry from the start. At minimum, keep prompts or inputs, outputs, intermediate decisions, and failures. If the experiment involves tools or agents, include tool choices and obvious reasons for failure or recovery. Make a hard call. Decide whether to keep exploring or wait for a stronger production-grade path. "Interesting" is not the same as "ready for integration." Minimal experiment logger (my suggestion): if you want a lightweight way to avoid demo-ware, even a local JSONL log is enough to capture prompts, outputs, decisions, failures, and latency while you compare ideas from Labs. import json import time from pathlib import Path LOG_PATH = Path("experiment-log.jsonl") def record_event(name, payload): # Append one event per line so runs are easy to diff and analyze later. with LOG_PATH.open("a", encoding="utf-8") as handle: handle.write(json.dumps({"event": name, **payload}) + "\n") def run_experiment(user_input): started = time.time() try: # Replace this stub with your real model or agent call. output = user_input.upper() decision = "keep exploring" if len(output) < 80 else "wait" record_event( "experiment_result", { "input": user_input, "output": output, "decision": decision, "latency_ms": round((time.time() - started) * 1000, 2), "failure": None, }, ) except Exception as error: record_event( "experiment_result", { "input": user_input, "output": None, "decision": "failed", "latency_ms": round((time.time() - started) * 1000, 2), "failure": str(error), }, ) raise if __name__ == "__main__": run_experiment("Summarize the constraints of this Labs project.") That script is intentionally boring. That is the point. It gives you a repeatable, runnable starting point for comparing experiments without pretending you already have a full observability stack. Practical tips: how I evaluate Labs experiments before betting a roadmap on them Separate the idea from the implementation path. A strong research direction can still have a weak near-term integration story. Test one workload, not ten. Pick a narrow task that resembles your production reality and see whether the experiment moves the needle. Track cost and latency as first-class metrics. A novel capability that breaks your budget or response-time envelope is still a failed fit. Treat agent demos skeptically unless you can inspect behavior. Tool calls, traces, failure cases, and recovery paths matter more than polished output. Common pitfalls are predictable here. Do not confuse a research win with a deployment path. Labs is for exploration, so you still need to validate integration, safety, and operations. Do not evaluate with vague prompts. Use a narrow task and explicit success criteria, or you will end up comparing vibes instead of outcomes. Do not skip telemetry because the prototype is small. If you cannot inspect failures early, the prototype will teach you very little. Do not ignore known limitations. For example, the Fara-7B project page explicitly notes challenges on more complex tasks, instruction-following mistakes, and hallucinations, which is exactly the kind of constraint you should carry into evaluation. What to explore next Azure AI Foundry Labs matters because it gives developers a practical way to explore research-shaped ideas before they harden into mainstream patterns. The smart move is to use Labs as an input into better platform decisions: explore in Labs, validate with the discipline encouraged by Foundry playgrounds, and then bring the learnings back into the broader Foundry workflow. Takeaway 1: Labs is an exploration surface for early-stage, research-driven experiments and prototypes, not a blanket promise of production readiness. Takeaway 2: The right workflow is Labs for discovery, then Microsoft Foundry for implementation, optimization, evaluation, monitoring, and governance. Takeaway 3: Tracing, evaluations, and telemetry should start during prototyping, because that is how you avoid confusing a compelling demo with a viable system. If you are curious, start with Microsoft Foundry Labs, read the official context in Introducing Microsoft Foundry Labs, and then map what you learn into the platform guidance in Microsoft Foundry documentation. Try this next Open Microsoft Foundry Labs and choose one experiment that matches a real workload you care about. Use the mindset from Microsoft Foundry Playgrounds to define a small validation task around quality, latency, cost, and safety. Write down the minimum telemetry you need before continuing: inputs, outputs, decisions, failures, and token or cost signals. Read the relevant operating guidance in AI gateway in Azure API Management if your experiment may eventually need monitoring, quotas, safety policies, or governance. Promote only the experiments that can explain their value clearly in a Foundry-shaped build, evaluation, and observability workflow.Vectorless Reasoning-Based RAG: A New Approach to Retrieval-Augmented Generation
Introduction Retrieval-Augmented Generation (RAG) has become a widely adopted architecture for building AI applications that combine Large Language Models (LLMs) with external knowledge sources. Traditional RAG pipelines rely heavily on vector embeddings and similarity search to retrieve relevant documents. While this works well for many scenarios, it introduces challenges such as: Requires chunking documents into small segments Important context can be split across chunks Embedding generation and vector databases add infrastructure complexity A new paradigm called Vectorless Reasoning-Based RAG is emerging to address these challenges. One framework enabling this approach is PageIndex, an open-source document indexing system that organizes documents into a hierarchical tree structure and allows Large Language Models (LLMs) to perform reasoning-based retrieval over that structure. Vectorless Reasoning-Based RAG Instead of vectors, this approach uses structured document navigation. User Query ->Document Tree Structure ->LLM Reasoning ->Relevant Nodes Retrieved ->LLM Generates Answer This mimics how humans read documents: Look at the table of contents Identify relevant sections Read the relevant content Answer the question Core features No Vector Database: It relies on document structure and LLM reasoning for retrieval. It does not depend on vector similarity search. No Chunking: Documents are not split into artificial chunks. Instead, they are organized using their natural structure, such as pages and sections. Human-like Retrieval: The system mimics how human experts read documents. It navigates through sections and extracts information from relevant parts. Better Explainability and Traceability: Retrieval is based on reasoning. The results can be traced back to specific pages and sections. This makes the process easier to interpret. It avoids opaque and approximate vector search, often called “vibe retrieval.” When to Use Vectorless RAG Vectorless RAG works best when: Data is structured or semi-structured Documents have clear metadata Knowledge sources are well organized Queries require reasoning rather than semantic similarity Examples: enterprise knowledge bases internal documentation systems compliance and policy search healthcare documentation financial reporting Implementing Vectorless RAG with Azure AI Foundry Step 1 : Install Pageindex using pip command, from pageindex import PageIndexClient import pageindex.utils as utils # Get your PageIndex API key from https://dash.pageindex.ai/api-keys PAGEINDEX_API_KEY = "YOUR_PAGEINDEX_API_KEY" pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY) Step 2 : Set up your LLM Example using Azure OpenAI: from openai import AsyncAzureOpenAI client = AsyncAzureOpenAI( api_key=AZURE_OPENAI_API_KEY, azure_endpoint=AZURE_OPENAI_ENDPOINT, api_version=AZURE_OPENAI_API_VERSION ) async def call_llm(prompt, temperature=0): response = await client.chat.completions.create( model=AZURE_DEPLOYMENT_NAME, messages=[{"role": "user", "content": prompt}], temperature=temperature ) return response.choices[0].message.content.strip() Step 3: Page Tree Generation import os, requests pdf_url = "https://arxiv.org/pdf/2501.12948.pdf" //give the pdf url for tree generation, here given one for example pdf_path = os.path.join("../data", pdf_url.split('/')[-1]) os.makedirs(os.path.dirname(pdf_path), exist_ok=True) response = requests.get(pdf_url) with open(pdf_path, "wb") as f: f.write(response.content) print(f"Downloaded {pdf_url}") doc_id = pi_client.submit_document(pdf_path)["doc_id"] print('Document Submitted:', doc_id) Step 4 : Print the generated pageindex tree structure if pi_client.is_retrieval_ready(doc_id): tree = pi_client.get_tree(doc_id, node_summary=True)['result'] print('Simplified Tree Structure of the Document:') utils.print_tree(tree) else: print("Processing document, please try again later...") Step 5 : Use LLM for tree search and identify nodes that might contain relevant context import json query = "What are the conclusions in this document?" tree_without_text = utils.remove_fields(tree.copy(), fields=['text']) search_prompt = f""" You are given a question and a tree structure of a document. Each node contains a node id, node title, and a corresponding summary. Your task is to find all nodes that are likely to contain the answer to the question. Question: {query} Document tree structure: {json.dumps(tree_without_text, indent=2)} Please reply in the following JSON format: {{ "thinking": "<Your thinking process on which nodes are relevant to the question>", "node_list": ["node_id_1", "node_id_2", ..., "node_id_n"] }} Directly return the final JSON structure. Do not output anything else. """ tree_search_result = await call_llm(search_prompt) Step 6 : Print retrieved nodes and reasoning process node_map = utils.create_node_mapping(tree) tree_search_result_json = json.loads(tree_search_result) print('Reasoning Process:') utils.print_wrapped(tree_search_result_json['thinking']) print('\nRetrieved Nodes:') for node_id in tree_search_result_json["node_list"]: node = node_map[node_id] print(f"Node ID: {node['node_id']}\t Page: {node['page_index']}\t Title: {node['title']}") Step 7: Answer generation node_list = json.loads(tree_search_result)["node_list"] relevant_content = "\n\n".join(node_map[node_id]["text"] for node_id in node_list) print('Retrieved Context:\n') utils.print_wrapped(relevant_content[:1000] + '...') answer_prompt = f""" Answer the question based on the context: Question: {query} Context: {relevant_content} Provide a clear, concise answer based only on the context provided. """ print('Generated Answer:\n') answer = await call_llm(answer_prompt) utils.print_wrapped(answer) When to Use Each Approach Both vector-based RAG and vectorless RAG have their strengths. Choosing the right approach depends on the nature of the documents and the type of retrieval required. When to Use Vector Database–Based RAG Vector-based retrieval works best when dealing with large collections of unrelated or loosely structured documents. In such cases, semantic similarity is often sufficient to identify relevant information quickly. Use vector RAG when: Searching across many independent documents Semantic similarity is sufficient to locate relevant content Real-time retrieval is required over very large datasets Common use cases include: Customer support knowledge bases Conversational chatbots Product and content search systems When to Use Vectorless RAG Vectorless approaches such as PageIndex are better suited for long, structured documents where understanding the logical organization of the content is important. Use vectorless RAG when: Documents contain clear hierarchical structure Logical reasoning across sections is required High retrieval accuracy is critical Typical examples include: Financial filings and regulatory reports Legal documents and contracts Technical manuals and documentation Academic and research papers In these scenarios, navigating the document structure allows the system to identify the exact section that logically contains the answer, rather than relying only on semantic similarity. Conclusion Vector databases significantly advanced RAG architectures by enabling scalable semantic search across large datasets. However, they are not the optimal solution for every type of document. Vectorless approaches such as PageIndex introduce a different philosophy: instead of retrieving text that is merely semantically similar, they retrieve text that is logically relevant by reasoning over the structure of the document. As RAG architectures continue to evolve, the future will likely combine the strengths of both approaches. Hybrid systems that integrate vector search for broad retrieval and reasoning-based navigation for precision may offer the best balance of scalability and accuracy for enterprise AI applications.1.2KViews2likes0CommentsHosted Containers and AI Agent Solutions
If you have built a proof-of-concept AI agent on your laptop and wondered how to turn it into something other people can actually use, you are not alone. The gap between a working prototype and a production-ready service is where most agent projects stall. Hosted containers close that gap faster than any other approach available today. This post walks through why containers and managed hosting platforms like Azure Container Apps are an ideal fit for multi-agent AI systems, what practical benefits they unlock, and how you can get started with minimal friction. The problem with "it works on my machine" Most AI agent projects begin the same way: a Python script, an API key, and a local terminal. That workflow is perfect for experimentation, but it creates a handful of problems the moment you try to share your work. First, your colleagues need the same Python version, the same dependencies, and the same environment variables. Second, long-running agent pipelines tie up your machine and compete with everything else you are doing. Third, there is no reliable URL anyone can visit to use the system, which means every demo involves a screen share or a recorded video. Containers solve all three problems in one step. A single Dockerfile captures the runtime, the dependencies, and the startup command. Once the image builds, it runs identically on any machine, any cloud, or any colleague's laptop. Why containers suit AI agents particularly well AI agents have characteristics that make them a better fit for containers than many traditional web applications. Long, unpredictable execution times A typical web request completes in milliseconds. An agent pipeline that retrieves context from a database, imports a codebase, runs four verification agents in sequence, and generates a report can take two to five minutes. Managed container platforms handle long-running requests gracefully, with configurable timeouts and automatic keep-alive, whereas many serverless platforms impose strict execution limits that agent workloads quickly exceed. Heavy, specialised dependencies Agent applications often depend on large packages: machine learning libraries, language model SDKs, database drivers, and Git tooling. A container image bundles all of these once at build time. There is no cold-start dependency resolution and no version conflict with other projects on the same server. Stateless by design Most agent pipelines are stateless. They receive a request, execute a sequence of steps, and return a result. This maps perfectly to the container model, where each instance handles requests independently and the platform can scale the number of instances up or down based on demand. Reproducible environments When an agent misbehaves in production, you need to reproduce the issue locally. With containers, the production environment and the local environment are the same image. There is no "works on my machine" ambiguity. A real example: multi-agent code verification To make this concrete, consider a system called Opustest, an open-source project that uses the Microsoft Agent Framework with Azure OpenAI to analyse Python codebases automatically. The system runs AI agents in a pipeline: A Code Example Retrieval Agent queries Azure Cosmos DB for curated examples of good and bad Python code, providing the quality standards for the review. A Codebase Import Agent reads all Python files from a Git repository cloned on the server. Four Verification Agents each score a different dimension of code quality (coding standards, functional correctness, known error handling, and unknown error handling) on a scale of 0 to 5. A Report Generation Agent compiles all scores and errors into an HTML report with fix prompts that can be exported and fed directly into a coding assistant. The entire pipeline is orchestrated by a FastAPI backend that streams progress updates to the browser via Server-Sent Events. Users paste a Git URL, watch each stage light up in real time, and receive a detailed report at the end. The app in action Landing page: the default Git URL mode, ready for a repository link. Local Path mode: toggling to analyse a codebase from a local directory. Repository URL entered: a GitHub repository ready for verification. Stage 1: the Code Example Retrieval Agent fetching standards from Cosmos DB. Stage 3: the four Verification Agents scoring the codebase. Stage 4: the Report Generation Agent compiling the final report. Verification complete: all stages finished with a success banner. Report detail: scores and the errors table with fix prompts. The Dockerfile The container definition for this system is remarkably simple: FROM python:3.12-slim RUN apt-get update && apt-get install -y --no-install-recommends git \ && rm -rf /var/lib/apt/lists/* WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY backend/ backend/ COPY frontend/ frontend/ RUN adduser --disabled-password --gecos "" appuser USER appuser EXPOSE 8000 CMD ["uvicorn", "backend.app:app", "--host", "0.0.0.0", "--port", "8000"] Twenty lines. That is all it takes to package a six-agent AI system with a web frontend, a FastAPI backend, Git support, and all Python dependencies into a portable, production-ready image. Notice the security detail: the container runs as a non-root user. This is a best practice that many tutorials skip, but it matters when you are deploying to a shared platform. From image to production in one command With the Azure Developer CLI ( azd ), deploying this container to Azure Container Apps takes a single command: azd up Behind the scenes, azd reads an azure.yaml file that declares the project structure, provisions the infrastructure defined in Bicep templates (a Container Apps environment, an Azure Container Registry, and a Cosmos DB account), builds the Docker image, pushes it to the registry, deploys it to the container app, and even seeds the database with sample data via a post-provision hook. The result is a publicly accessible URL serving the full agent system, with automatic HTTPS, built-in scaling, and zero infrastructure to manage manually. Microsoft Hosted Agents vs Azure Container Apps: choosing the right home Microsoft offers two distinct approaches for running AI agent workloads in the cloud. Understanding the difference is important when deciding how to host your solution. Microsoft Foundry Hosted Agent Service (Microsoft Foundry) Microsoft Foundry provides a fully managed agent hosting service. You define your agent's behaviour declaratively, upload it to the platform, and Foundry handles execution, scaling, and lifecycle management. This is an excellent choice when your agents fit within the platform's conventions: single-purpose agents that respond to prompts, use built-in tool integrations, and do not require custom server-side logic or a bespoke frontend. Key characteristics of hosted agents in Foundry: Fully managed execution. You do not provision or maintain any infrastructure. The platform runs your agent and handles scaling automatically. Declarative configuration. Agents are defined through configuration and prompt templates rather than custom application code. Built-in tool ecosystem. Foundry provides pre-built connections to Azure services, knowledge stores, and evaluation tooling. Opinionated runtime. The platform controls the execution environment, request handling, and networking. Azure Container Apps Azure Container Apps is a managed container hosting platform. You package your entire application (agents, backend, frontend, and all dependencies) into a Docker image and deploy it. The platform handles scaling, HTTPS, and infrastructure, but you retain full control over what runs inside the container. Key characteristics of Container Apps: Full application control. You own the runtime, the web framework, the agent orchestration logic, and the frontend. Custom networking. You can serve a web UI, expose REST APIs, stream Server-Sent Events, or run WebSocket connections. Arbitrary dependencies. Your container can include any system package, any Python library, and any tooling (like Git for cloning repositories). Portable. The same Docker image runs locally, in CI, and in production without modification. Why Opustest uses Container Apps Opustest requires capabilities that go beyond what a managed agent hosting platform provides: Requirement Hosted Agents (Foundry) Container Apps Custom web UI with real-time progress Not supported natively Full control via FastAPI and SSE Multi-agent orchestration pipeline Platform-managed, limited customisation Custom orchestrator with arbitrary logic Git repository cloning on the server Not available Install Git in the container image Server-Sent Events streaming Not supported Full HTTP control Custom HTML report generation Limited to platform outputs Generate and serve any content Export button for Copilot prompts Not available Custom frontend with JavaScript RAG retrieval from Cosmos DB Possible via built-in connectors Direct SDK access with full query control The core reason is straightforward: Opustest is not just a set of agents. It is a complete web application that happens to use agents as its processing engine. It needs a custom frontend, real-time streaming, server-side Git operations, and full control over how the agent pipeline executes. Container Apps provides all of this while still offering managed infrastructure, automatic scaling, and zero server maintenance. When to choose which Choose Microsoft Hosted Agents when your use case is primarily conversational or prompt-driven, when you want the fastest path to a working agent with minimal code, and when the built-in tool ecosystem covers your integration needs. Choose Azure Container Apps when you need a custom frontend, custom orchestration logic, real-time streaming, server-side processing beyond prompt-response patterns, or when your agent system is part of a larger application with its own web server and API surface. Both approaches use the same underlying AI models via Azure OpenAI. The difference is in how much control you need over the surrounding application. Five practical benefits of hosted containers for agents 1. Consistent deployments across environments Whether you are running the container locally with docker run , in a CI pipeline, or on Azure Container Apps, the behaviour is identical. Configuration differences are handled through environment variables, not code changes. This eliminates an entire category of "it works locally but breaks in production" bugs. 2. Scaling without re-architecture Azure Container Apps can scale from zero instances (paying nothing when idle) to multiple instances under load. Because agent pipelines are stateless, each request is routed to whichever instance is available. You do not need to redesign your application to handle concurrency; the platform does it for you. 3. Isolation between services If your agent system grows to include multiple services (perhaps a separate service for document processing or a background worker for batch analysis), each service gets its own container. They can be deployed, scaled, and updated independently. A bug in one service does not bring down the others. 4. Built-in observability Managed container platforms provide logging, metrics, and health checks out of the box. When an agent pipeline fails after three minutes of execution, you can inspect the container logs to see exactly which stage failed and why, without adding custom logging infrastructure. 5. Infrastructure as code The entire deployment can be defined in code. Bicep templates, Terraform configurations, or Pulumi programmes describe every resource. This means deployments are repeatable, reviewable, and version-controlled alongside your application code. No clicking through portals, no undocumented manual steps. Common concerns addressed "Containers add complexity" For a single-file script, this is a fair point. But the moment your agent system has more than one dependency, a Dockerfile is simpler to maintain than a set of installation instructions. It is also self-documenting: anyone reading the Dockerfile knows exactly what the system needs to run. "Serverless is simpler" Serverless functions are excellent for short, event-driven tasks. But agent pipelines that run for minutes, require persistent connections (like SSE streaming), and depend on large packages are a poor fit for most serverless platforms. Containers give you the operational simplicity of managed hosting without the execution constraints. "I do not want to learn Docker" A basic Dockerfile for a Python application is fewer than ten lines. The core concepts are straightforward: start from a base image, install dependencies, copy your code, and specify the startup command. The learning investment is small relative to the deployment problems it solves. "What about cost?" Azure Container Apps supports scale-to-zero, meaning you pay nothing when the application is idle. For development and demonstration purposes, this makes hosted containers extremely cost-effective. You only pay for the compute time your agents actually use. Getting started: a practical checklist If you are ready to containerise your own agent solution, here is a step-by-step approach. Step 1: Write a Dockerfile. Start from an official Python base image. Install system-level dependencies (like Git, if your agents clone repositories), then your Python packages, then your application code. Run as a non-root user. Step 2: Test locally. Build and run the image on your machine: docker build -t my-agent-app . docker run -p 8000:8000 --env-file .env my-agent-app If it works locally, it will work in the cloud. Step 3: Define your infrastructure. Use Bicep, Terraform, or the Azure Developer CLI to declare the resources you need: a container app, a container registry, and any backing services (databases, key vaults, AI endpoints). Step 4: Deploy. Push your image to the registry and deploy to the container platform. With azd , this is a single command. With CI/CD, it is a pipeline that runs on every push to your main branch. Step 5: Iterate. Change your agent code, rebuild the image, and redeploy. The cycle is fast because Docker layer caching means only changed layers are rebuilt. The broader picture The AI agent ecosystem is maturing rapidly. Frameworks like Microsoft Agent Framework, LangChain, Semantic Kernel, and AutoGen make it straightforward to build sophisticated multi-agent systems. But building is only half the challenge. The other half is running these systems reliably, securely, and at scale. Hosted containers offer the best balance of flexibility and operational simplicity for agent workloads. They do not impose the execution limits of serverless platforms. They do not require the operational overhead of managing virtual machines. They give you a portable, reproducible unit of deployment that works the same everywhere. If you have an agent prototype sitting on your laptop, the path to making it available to your team, your organisation, or the world is shorter than you think. Write a Dockerfile, define your infrastructure, run azd up , and share the URL. Your agents deserve a proper home. Hosted containers are that home. Resources Azure Container Apps documentation Microsoft Foundry Hosted Agents Azure Developer CLI (azd) Microsoft Agent Framework Docker getting started guide Opustest: AI-powered code verification (source code)Announcing the IQ Series: Foundry IQ
AI agents are rapidly becoming a new way to build applications. But for agents to be truly useful, they need access to the knowledge and context that helps them reason about the world they operate in. That’s where Foundry IQ comes in. Today we’re announcing the IQ Series: Foundry IQ, a new set of developer-focused episodes exploring how to build knowledge-centric AI systems using Foundry IQ. The series focuses on the core ideas behind how modern AI systems work with knowledge, how they retrieve information, reason across sources, synthesize answers, and orchestrate multi-step interactions. Instead of treating retrieval as a single step in a pipeline, Foundry IQ approaches knowledge as something that AI systems actively work with throughout the reasoning process. The IQ Series breaks down these concepts and shows how they come together when building real AI applications. You can explore the series and all the accompanying samples here: 👉 https://aka.ms/iq-series What is Foundry IQ? Foundry IQ helps AI systems work with knowledge in a more structured and intentional way. Rather than wiring retrieval logic directly into every application, developers can define knowledge bases that connect to documents, data sources, and other information systems. AI agents can then query these knowledge bases to gather the context they need to generate responses, make decisions, or complete tasks. This model allows knowledge to be organized, reused, and combined across applications, instead of being rebuilt for each new scenario. What's covered in the IQ Series? The Foundry IQ episodes in the IQ Series explore the key building blocks behind knowledge-driven AI systems from how knowledge enters the system to how agents ultimately query and use it. The series is released as three weekly episodes: Foundry IQ: Unlocking Knowledge for Your Agents — March 18, 2026: Introduces Foundry IQ and the core ideas behind it. The episode explains how AI agents work with knowledge and walks through the main components of the Foundry IQ that support knowledge-driven applications. Foundry IQ: Building the Data Pipeline with Knowledge Sources — March 25, 2026: Focuses on Knowledge Sources and how different types of content flow into Foundry IQ. It explores how systems such as SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web contribute information that AI systems can later retrieve and use. Foundry IQ: Querying the Multi-Source AI Knowledge Bases — April 1, 2026: Dives into the Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions. Each episode includes a short executive introduction, a tech talk exploring the topic in depth, and a visual recap with doodle summaries of the key ideas. Alongside the episodes, the GitHub repository provides cookbooks with sample code, summary of the episodes, and additinal learning resources, so developers can explore the concepts and apply them in their own projects. Explore the Repo All episodes and supporting materials live in the IQ Series repository: 👉 https://aka.ms/iq-series Inside the repository you’ll find: The Foundry IQ episode links Cookbooks for each episode Links to documentation and additional resources If you're building AI agents or exploring how AI systems can work with knowledge, the IQ Series is a great place to start. Watch the episodes and explore the cookbooks! We’re excited to see what you build and welcome your feedback & ideas as the series evolves.Building real-world AI automation with Foundry Local and the Microsoft Agent Framework
A hands-on guide to building real-world AI automation with Foundry Local, the Microsoft Agent Framework, and PyBullet. No cloud subscription, no API keys, no internet required. Why Developers Should Care About Offline AI Imagine telling a robot arm to "pick up the cube" and watching it execute the command in a physics simulator, all powered by a language model running on your laptop. No API calls leave your machine. No token costs accumulate. No internet connection is needed. That is what this project delivers, and every piece of it is open source and ready for you to fork, extend, and experiment with. Most AI demos today lean on cloud endpoints. That works for prototypes, but it introduces latency, ongoing costs, and data privacy concerns. For robotics and industrial automation, those trade-offs are unacceptable. You need inference that runs where the hardware is: on the factory floor, in the lab, or on your development machine. Foundry Local gives you an OpenAI-compatible endpoint running entirely on-device. Pair it with a multi-agent orchestration framework and a physics engine, and you have a complete pipeline that translates natural language into validated, safe robot actions. This post walks through how we built it, why the architecture works, and how you can start experimenting with your own offline AI simulators today. Architecture The system uses four specialised agents orchestrated by the Microsoft Agent Framework: Agent What It Does Speed PlannerAgent Sends user command to Foundry Local LLM → JSON action plan 4–45 s SafetyAgent Validates against workspace bounds + schema < 1 ms ExecutorAgent Dispatches actions to PyBullet (IK, gripper) < 2 s NarratorAgent Template summary (LLM opt-in via env var) < 1 ms User (text / voice) │ ▼ ┌──────────────┐ │ Orchestrator │ └──────┬───────┘ │ ┌────┴────┐ ▼ ▼ Planner Narrator │ ▼ Safety │ ▼ Executor │ ▼ PyBullet Setting Up Foundry Local from foundry_local import FoundryLocalManager import openai manager = FoundryLocalManager("qwen2.5-coder-0.5b") client = openai.OpenAI( base_url=manager.endpoint, api_key=manager.api_key, ) resp = client.chat.completions.create( model=manager.get_model_info("qwen2.5-coder-0.5b").id, messages=[{"role": "user", "content": "pick up the cube"}], max_tokens=128, stream=True, ) from foundry_local import FoundryLocalManager import openai manager = FoundryLocalManager("qwen2.5-coder-0.5b") client = openai.OpenAI( base_url=manager.endpoint, api_key=manager.api_key, ) resp = client.chat.completions.create( model=manager.get_model_info("qwen2.5-coder-0.5b").id, messages=[{"role": "user", "content": "pick up the cube"}], max_tokens=128, stream=True, ) The SDK auto-selects the best hardware backend (CUDA GPU → QNN NPU → CPU). No configuration needed. How the LLM Drives the Simulator Understanding the interaction between the language model and the physics simulator is central to the project. The two never communicate directly. Instead, a structured JSON contract forms the bridge between natural language and physical motion. From Words to JSON When a user says “pick up the cube”, the PlannerAgent sends the command to the Foundry Local LLM alongside a compact system prompt. The prompt lists every permitted tool and shows the expected JSON format. The LLM responds with a structured plan: { "type": "plan", "actions": [ {"tool": "describe_scene", "args": {}}, {"tool": "pick", "args": {"object": "cube_1"}} ] } The planner parses this response, validates it against the action schema, and retries once if the JSON is malformed. This constrained output format is what makes small models (0.5B parameters) viable: the response space is narrow enough that even a compact model can produce correct JSON reliably. From JSON to Motion Once the SafetyAgent approves the plan, the ExecutorAgent maps each action to concrete PyBullet calls: move_ee(target_xyz) : The target position in Cartesian coordinates is passed to PyBullet's inverse kinematics solver, which computes the seven joint angles needed to place the end-effector at that position. The robot then interpolates smoothly from its current joint state to the target, stepping the physics simulation at each increment. pick(object) : This triggers a multi-step grasp sequence. The controller looks up the object's position in the scene, moves the end-effector above the object, descends to grasp height, closes the gripper fingers with a configurable force, and lifts. At every step, PyBullet resolves contact forces and friction so that the object behaves realistically. place(target_xyz) : The reverse of a pick. The robot carries the grasped object to the target coordinates and opens the gripper, allowing the physics engine to drop the object naturally. describe_scene() : Rather than moving the robot, this action queries the simulation state and returns the position, orientation, and name of every object on the table, along with the current end-effector pose. The Abstraction Boundary The critical design choice is that the LLM knows nothing about joint angles, inverse kinematics, or physics. It operates purely at the level of high-level tool calls ( pick , move_ee ). The ActionExecutor translates those tool calls into the low-level API that PyBullet provides. This separation means the LLM prompt stays simple, the safety layer can validate plans without understanding kinematics, and the executor can be swapped out without retraining or re-prompting the model. Voice Input Pipeline Voice commands follow three stages: Browser capture: MediaRecorder captures audio, client-side resamples to 16 kHz mono WAV Server transcription: Foundry Local Whisper (ONNX, cached after first load) with automatic 30 s chunking Command execution: transcribed text goes through the same Planner → Safety → Executor pipeline The mic button (🎤) only appears when a Whisper model is cached or loaded. Whisper models are filtered out of the LLM dropdown. Web UI in Action Pick command Describe command Move command Reset command Performance: Model Choice Matters Model Params Inference Pipeline Total qwen2.5-coder-0.5b 0.5 B ~4 s ~5 s phi-4-mini 3.6 B ~35 s ~36 s qwen2.5-coder-7b 7 B ~45 s ~46 s For interactive robot control, qwen2.5-coder-0.5b is the clear winner: valid JSON for a 7-tool schema in under 5 seconds. The Simulator in Action Here is the Panda robot arm performing a pick-and-place sequence in PyBullet. Each frame is rendered by the simulator's built-in camera and streamed to the web UI in real time. Overview Reaching Above the cube Gripper detail Front interaction Side layout Get Running in Five Minutes You do not need a GPU, a cloud account, or any prior robotics experience. The entire stack runs on a standard development machine. # 1. Install Foundry Local winget install Microsoft.FoundryLocal # Windows brew install foundrylocal # macOS # 2. Download models (one-time, cached locally) foundry model run qwen2.5-coder-0.5b # Chat brain (~4 s inference) foundry model run whisper-base # Voice input (194 MB) # 3. Clone and set up the project git clone https://github.com/leestott/robot-simulator-foundrylocal cd robot-simulator-foundrylocal .\setup.ps1 # or ./setup.sh on macOS/Linux # 4. Launch the web UI python -m src.app --web --no-gui # → http://localhost:8080 Once the server starts, open your browser and try these commands in the chat box: "pick up the cube": the robot grasps the blue cube and lifts it "describe the scene": returns every object's name and position "move to 0.3 0.2 0.5": sends the end-effector to specific coordinates "reset": returns the arm to its neutral pose If you have a microphone connected, hold the mic button and speak your command instead of typing. Voice input uses a local Whisper model, so your audio never leaves the machine. Experiment and Build Your Own The project is deliberately simple so that you can modify it quickly. Here are some ideas to get started. Add a new robot action The robot currently understands seven tools. Adding an eighth takes four steps: Define the schema in TOOL_SCHEMAS ( src/brain/action_schema.py ). Write a _do_<tool> handler in src/executor/action_executor.py . Register it in ActionExecutor._dispatch . Add a test in tests/test_executor.py . For example, you could add a rotate_ee tool that spins the end-effector to a given roll/pitch/yaw without changing position. Add a new agent Every agent follows the same pattern: an async run(context) method that reads from and writes to a shared dictionary. Create a new file in src/agents/ , register it in orchestrator.py , and the pipeline will call it in sequence. Ideas for new agents: VisionAgent: analyse a camera frame to detect objects and update the scene state before planning. CostEstimatorAgent: predict how many simulation steps an action plan will take and warn the user if it is expensive. ExplanationAgent: generate a step-by-step natural language walkthrough of the plan before execution, allowing the user to approve or reject it. Swap the LLM python -m src.app --web --model phi-4-mini Or use the model dropdown in the web UI; no restart is needed. Try different models and compare accuracy against inference speed. Smaller models are faster but may produce malformed JSON more often. Larger models are more accurate but slower. The retry logic in the planner compensates for occasional failures, so even a small model works well in practice. Swap the simulator PyBullet is one option, but the architecture does not depend on it. You could replace the simulation layer with: MuJoCo: a high-fidelity physics engine popular in reinforcement learning research. Isaac Sim: NVIDIA's GPU-accelerated robotics simulator with photorealistic rendering. Gazebo: the standard ROS simulator, useful if you plan to move to real hardware through ROS 2. The only requirement is that your replacement implements the same interface as PandaRobot and GraspController . Build something completely different The pattern at the heart of this project (LLM produces structured JSON, safety layer validates, executor dispatches to a domain-specific engine) is not limited to robotics. You could apply the same architecture to: Home automation: "turn off the kitchen lights and set the thermostat to 19 degrees" translated into MQTT or Zigbee commands. Game AI: natural language control of characters in a game engine, with the safety agent preventing invalid moves. CAD automation: voice-driven 3D modelling where the LLM generates geometry commands for OpenSCAD or FreeCAD. Lab instrumentation: controlling scientific equipment (pumps, stages, spectrometers) via natural language, with the safety agent enforcing hardware limits. From Simulator to Real Robot One of the most common questions about projects like this is whether it could control a real robot. The answer is yes, and the architecture is designed to make that transition straightforward. What Stays the Same The entire upper half of the pipeline is hardware-agnostic: The LLM planner generates the same JSON action plans regardless of whether the target is simulated or physical. It has no knowledge of the underlying hardware. The safety agent validates workspace bounds and tool schemas. For a real robot, you would tighten the bounds to match the physical workspace and add checks for obstacle clearance using sensor data. The orchestrator coordinates agents in the same sequence. No changes are needed. The narrator reports what happened. It works with any result data the executor returns. What Changes The only component that must be replaced is the executor layer, specifically the PandaRobot class and the GraspController . In simulation, these call PyBullet's inverse kinematics solver and step the physics engine. On a real robot, they would instead call the hardware driver. For a Franka Emika Panda (the same robot modelled in the simulation), the replacement options include: libfranka: Franka's C++ real-time control library, which accepts joint position or torque commands at 1 kHz. ROS 2 with MoveIt: A robotics middleware stack that provides motion planning, collision avoidance, and hardware abstraction. The move_ee action would become a MoveIt goal, and the framework would handle trajectory planning and execution. Franka ROS 2 driver: Combines libfranka with ROS 2 for a drop-in replacement of the simulation controller. The ActionExecutor._dispatch method maps tool names to handler functions. Replacing _do_move_ee , _do_pick , and _do_place with calls to a real robot driver is the only code change required. Key Considerations for Real Hardware Safety: A simulated robot cannot cause physical harm; a real robot can. The safety agent would need to incorporate real-time collision checking against sensor data (point clouds from depth cameras, for example) rather than relying solely on static workspace bounds. Perception: In simulation, object positions are known exactly. On a real robot, you would need a perception system (cameras with object detection or fiducial markers) to locate objects before grasping. Calibration: The simulated robot's coordinate frame matches the URDF model perfectly. A real robot requires hand-eye calibration to align camera coordinates with the robot's base frame. Latency: Real actuators have physical response times. The executor would need to wait for motion completion signals from the hardware rather than stepping a simulation loop. Gripper feedback: In PyBullet, grasp success is determined by contact forces. A real gripper would provide force or torque feedback to confirm whether an object has been securely grasped. The Simulation as a Development Tool This is precisely why simulation-first development is valuable. You can iterate on the LLM prompts, agent logic, and command pipeline without risk to hardware. Once the pipeline reliably produces correct action plans in simulation, moving to a real robot is a matter of swapping the lowest layer of the stack. Key Takeaways for Developers On-device AI is production-ready. Foundry Local serves models through a standard OpenAI-compatible API. If your code already uses the OpenAI SDK, switching to local inference is a one-line change to base_url . Small models are surprisingly capable. A 0.5B parameter model produces valid JSON action plans in under 5 seconds. For constrained output schemas, you do not need a 70B model. Multi-agent pipelines are more reliable than monolithic prompts. Splitting planning, validation, execution, and narration across four agents makes each one simpler to test, debug, and replace. Simulation is the safest way to iterate. You can refine LLM prompts, agent logic, and tool schemas without risking real hardware. When the pipeline is reliable, swapping the executor for a real robot driver is the only change needed. The pattern generalises beyond robotics. Structured JSON output from an LLM, validated by a safety layer, dispatched to a domain-specific engine: that pattern works for home automation, game AI, CAD, lab equipment, and any other domain where you need safe, structured control. You can start building today. The entire project runs on a standard laptop with no GPU, no cloud account, and no API keys. Clone the repository, run the setup script, and you will have a working voice-controlled robot simulator in under five minutes. Ready to start building? Clone the repository, try the commands, and then start experimenting. Fork it, add your own agents, swap in a different simulator, or apply the pattern to an entirely different domain. The best way to learn how local AI can solve real-world problems is to build something yourself. Source code: github.com/leestott/robot-simulator-foundrylocal Built with Foundry Local, Microsoft Agent Framework, PyBullet, and FastAPI.🚀 AI Toolkit for VS Code — March 2026 Update
March brings another milestone for AI Toolkit! Version 0.32.0 is packed with new capabilities designed to help you ship production ready AI agents. This release brings a unified tree view experience, Agent Builder enhancements, and streamlined GitHub Copilot integration for agent development. 🗂️ Streamlined User Experience across AI Toolkit and Foundry Extension We've heard your feedback loud and clear: navigating between the AI Toolkit and Microsoft Foundry extensions could sometimes feel confusing. To simplify your workflow, we are consolidating the user experience. We've merged the Foundry sidebar directly into AI Toolkit, allowing you to access the power of both extensions. Unified My Resources View: The AI Toolkit and Foundry extension sidebar panels have been unified into a single My Resources view. Local resources (models, agents, tools) are grouped under a Local Resources node, with Foundry remote resources appearing right alongside them. All nodes are collapsed by default and preserve their state across sessions. Developer Tools View Mode: The Developer Tools panel now supports two switchable layouts, accessible from the panel title menu: Group by Lifecycle (organizes entries into Discover, Build, and Monitor stages) or Group by Resource (groups them into Agent Dev Tools and Model Tools). Foundry Visual Cues: Foundry agents and models now have dedicated icons to distinguish them from local resources at a glance. Additionally, Foundry Model Licenses are now displayed directly in the model catalog. Inline Setup: If Foundry is installed but not yet configured, a setup prompt is shown inline instead of an empty panel. 🛠️ Unified Create Agent Experience We've introduced a Create Agent View, serving as a new unified entry point for creating AI agents. It offers two distinct paths side by side: Create in Code with Full Control: Scaffold a project from a template, or generate a single agent or multi-agent workflow using GitHub Copilot. Design an Agent Without Code: Launch Agent Builder directly to configure a prompt agent through the UI. 🤖 GitHub Copilot Agent Development: Build with Skills Agent code generation, evaluation, and deployment now uses the open-source Microsoft foundry skill from —the same source used by GitHub Copilot for Azure. AI Toolkit automatically installs and keeps this skill up to date, requiring no manual setup. It is more important than ever to build with skills. We've optimized the toolkit so you can seamlessly use skills to help you build agents directly within your workflow, taking full advantage of the deep Copilot integration. 🧱 Agent Builder Enhancements Agent Builder continues to receive major usability and functional upgrades: Conversations View for Foundry Agents: A dedicated Conversations tab is now available, making it easier to review and manage conversation history when working with Foundry agents. Auto-Save: Draft agents are now automatically saved before running in the playground, preventing any accidental loss of unsaved changes. Seamless Foundry Integration: Open Foundry prompt agents directly in Agent Builder from the Foundry extension. You can also generate and refine agent instructions using the Foundry Prompt Optimizer. MCP Tool Approval: Configure auto or manual approval for MCP tool calls in Agent Builder, giving you complete control over how tool invocations are handled during agent runs. View Code for Workspace Scaffolding: Added View Code support to scaffold a workspace for Foundry agents, letting you quickly generate the project structure needed to get started. 🧪 Evaluation Framework Updates Updated the VS Code Skill to use the pytest-agent-evals SDK for running agent evaluations, perfectly aligned with the latest evaluation framework. ✨ Wrapping up Version 0.32.0 is a big step forward for AI Toolkit. You can get started today with AI Toolkit: 📥 Download: Install the AI Toolkit from the Visual Studio Code Marketplace 📖 Learn: Explore our comprehensive AI Toolkit Documentation As always, we'd love your feedback—keep it coming, and happy agent building! 🚀 Whether it's a feature request, bug report, or feedback on your experience, join the conversation and contribute directly on our GitHub repository.1.6KViews0likes0CommentsFrom Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry
From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry Agentic AI is no longer a future concept — it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt. In this blog post - and the accompanying video tutorial - we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that’s scalable, testable, and production ready. The scenario: a retail agent for sales and inventory insights To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava. The objective is to build an AI agent that can assist the internal team in: Analyzing sales data (e.g. reason over a product catalog, identify top‑selling categories, etc.) Managing inventory (e.g. Detect products running low on stock, trigger restock actions, etc.) Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit. Instead of picking a model arbitrarily, we: Describe the business scenario in natural language Ask Copilot to perform a comparative analysis between two candidate models Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics) Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other — turning model selection into a transparent, repeatable decision. To go deeper, we explore the AI Toolkit Model Catalog, which lets you: Browse hundreds of models Filter by hosting platform (GitHub, Microsoft Foundry, local) Filter by publisher (open‑source and proprietary) Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts. Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI With the model ready, it’s time to build the agent. Using the Agent Builder UI, we configure: The agent’s identity (name, role, responsibilities) Instructions that define tone, behavior, and scope The model the agent runs on The tools and data sources it can access For this scenario, we add: File search, grounded on uploaded sales logs and a product catalog Code interpreter, enabling the agent to compute metrics, generate charts, and write reports We can then test the agent in the right-side playground by asking business questions like: “What were the top three selling categories in 2025?” The response is not generic — it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer. The Agent Builder also provides local evaluation and tracing functionalities. Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code UI-based prototyping is powerful, but real solutions often require custom logic. This is where we transition from prototype to production by using a built-in workflow to migrate from UI to a hosted agent template The result is a production-ready scaffold that includes: Agent code (built with Microsoft Agent Framework; you can choose between Python or C#) A YAML-based agent definition Container configuration files From here, we extend the agent with custom functions — for example, to create and manage restock orders. GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario. Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment Before deploying, we test the agent locally: Ask it to identify products running out of stock Trigger a restock action using the custom function Debug the full tool‑calling flow end to end Once validated, we deploy the agent to Microsoft Foundry. By deploying the agent to the Cloud, we don’t just get compute power, but a whole set of built-in features to operationalize our solution and maintain it in production. Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry Production readiness doesn’t stop at deployment. In the Foundry portal, we explore: Evaluation runs, using both real and synthetic datasets LLM‑based judges that score responses across multiple metrics, with explanations Red teaming, where an adversarial agent probes for unsafe or undesired behavior Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment. Why this workflow matters This end-to-end flow demonstrates a key idea: Agentic AI isn’t just about building agents — it’s about operating them responsibly at scale. By combining AI Toolkit in VS Code with Microsoft Foundry, you get: A smooth developer experience Clear separation between experimentation and production Built‑in evaluation, safety, and observability Resources Demo Sample: GitHub Repo Foundry tutorials: Inside Microsoft Foundry - YouTubeMicrosoft Agent Framework, Microsoft Foundry, MCP, Aspire를 활용한 실전 예제 만들기
AI 에이전트를 개발하는 것은 점점 쉬워지고 있습니다. 하지만 여러 서비스, 상태 관리, 프로덕션 인프라를 갖춘 실제 애플리케이션의 일부로 배포하는 것은 여전히 복잡합니다. 실제로 .NET 개발자 커뮤니티에서는 로컬 머신과 클라우드 네이티브 방식의 클라우드 환경 모두에서 실제로 동작하는 실전 예제에 대한 요구가 많았습니다. 그래서 준비했습니다! Microsoft Agent Framework과 Microsoft Foundry, MCP(Model Context Protocol), Aspire등을 어떻게 프로덕션 상황에서 조합할 수 있는지를 보여주는 오픈소스 Interview Coach 샘플입니다. AI 코치가 인성 면접 질문과 기술 면접 질문을 안내한 후, 요약을 제공하는 효율적인 면접 시뮬레이터입니다. 이 포스트에서는 어떤 패턴을 사용했고 해당 패턴이 해결할 수 있는 문제를 다룹니다. Interview Coach 데모 앱을 방문해 보세요. 왜 Microsoft Agent Framework을 써야 하나요? .NET으로 AI 에이전트를 구축해 본 적이 있다면, Semantic Kernel이나 AutoGen, 또는 두 가지 모두를 사용해 본 적이 있을 겁니다. Microsoft Agent Framework는 그 다음 단계로서, 각각의 프로젝트에서 효과적이었던 부분을 하나의 프레임워크로 통합했습니다. AutoGen의 에이전트 추상화와 Semantic Kernel의 엔터프라이즈 기능(상태 관리, 타입 안전성, 미들웨어, 텔레메트리 등)을 하나로 통합했습니다. 또한 멀티 에이전트 오케스트레이션을 위한 그래프 기반 워크플로우도 추가했습니다. 그렇다면 .NET 개발자에게 이것이 어떤 의미로 다가올까요? 하나의 프레임워크. Semantic Kernel과 AutoGen 사이에서 더 이상 고민할 필요가 없습니다. 익숙한 패턴. 에이전트는 의존성 주입, IChatClient , 그리고 ASP.NET 앱과 동일한 호스팅 모델을 사용합니다. 프로덕션을 위한 설계. OpenTelemetry, 미들웨어 파이프라인, Aspire 통합이 포함되어 있습니다. 멀티 에이전트 오케스트레이션. 순차 실행, 동시 실행, 핸드오프 패턴, 그룹 채팅 등 다양한 멀티 에이전트 오케스트레이션 패턴을 지원합니다. Interview Coach는 이 모든 것을 Hello World가 아닌 실제 애플리케이션에 적용합니다. 왜 Microsoft Foundry를 써야 하나요? AI 에이전트에는 모델 말고도 더 많은 무언가가 필요합니다. 우선 인프라가 필요하겠죠. Microsoft Foundry는 AI 애플리케이션을 구축하고 관리하기 위한 Azure 플랫폼이며, Microsoft Agent Framework의 권장 백엔드입니다. Foundry는 자체 포털에서 아래와 같은 내용을 제공합니다: 모델 액세스. OpenAI, Meta, Mistral 등의 모델 카탈로그를 하나의 엔드포인트로 제공합니다. 콘텐츠 세이프티. 에이전트가 벗어나지 않도록 기본으로 제공하는 콘텐츠 조정 및 PII 감지 기능이 있습니다. 비용 최적화 라우팅. 에이전트의 요청을 자동으로 최적의 모델로 라우팅합니다. 평가 및 파인튜닝. 에이전트 품질을 측정하고 시간이 지남에 따라 개선할 수 있습니다. 엔터프라이즈 거버넌스. Entra ID와 Microsoft Defender를 통한 ID, 액세스 제어, 규정 준수를 지원합니다. Interview Coach에서 Foundry는 에이전트를 구동하는 모델 엔드포인트를 제공합니다. 에이전트 코드가 IChatClient 인터페이스를 사용하기 때문에, Foundry는 LLM 선택을 위한 설정에 불과할 수도 있겠지만, 에이전트가 필요로 하는 가장 많은 도구를 기본적으로 제공하는 선택지입니다. Interview Coach는 무엇을 하나요? Interview Coach는 모의 면접을 진행하는 대화형 AI입니다. 이력서와 채용 공고를 제공하면, 에이전트가 나머지를 처리합니다: 접수. 이력서와 목표 직무 설명을 수집합니다. 행동 면접. 경험에 맞춘 STAR 기법 질문을 합니다. 기술 면접. 직무별 기술 질문을 합니다. 요약. 구체적인 피드백과 함께 성과 리뷰를 생성합니다. Blazor 웹 UI를 통해 실시간으로 응답 스트리밍을 제공하며 사용자와 에이전트간 상호작용합니다. 아키텍처 개요 애플리케이션은 Aspire를 통해 다양한 서비스를 오케스트레이션합니다: LLM 제공자. 다양한 모델 액세스를 위한 Microsoft Foundry (권장). WebUI. 면접 대화를 위한 Blazor 채팅 인터페이스. 에이전트. Microsoft Agent Framework로 구축된 면접 로직. MarkItDown MCP 서버. Microsoft의 MarkItDown을 통해 이력서(PDF, DOCX)를 마크다운으로 변환합니다. InterviewData MCP 서버. SQLite에 세션을 저장하는 .NET MCP 서버. Aspire가 서비스 디스커버리, 상태 확인, 텔레메트리를 처리합니다. 각 컴포넌트는 별도의 프로세스로 실행시키며, 하나의 커맨드 만으로 전체를 시작할 수 있습니다. 패턴 1: 멀티 에이전트 핸드오프 이 샘플에서 가장 흥미로운 부분이기도 한 핸드오프 패턴으로 멀티 에이전트 시나리오를 구성했습니다. 하나의 에이전트가 모든 것을 처리하는 대신, 면접은 다섯 개의 전문 에이전트로 나뉩니다: 에이전트 역할 도구 Triage 메시지를 적절한 전문가에게 라우팅 없음 (순수 라우팅) Receptionist 세션 생성, 이력서 및 채용 공고 수집 MarkItDown + InterviewData Behavioral Interviewer STAR 기법을 활용한 행동 면접 질문 진행 InterviewData Technical Interviewer 직무별 기술 질문 진행 InterviewData Summarizer 최종 면접 요약 생성 InterviewData 핸드오프 패턴에서는 하나의 에이전트가 대화의 전체 제어권을 다음 에이전트에게 넘깁니다. 그러면 넘겨 받는 에이전트가 모든 제어권을 인수합니다. 이는 주 에이전트가 다른 에이전트를 도우미로 호출하면서도 제어권을 유지하는 "agent-as-tools(도구로서의 에이전트)" 방식과는 다릅니다. 핸드오프 워크플로우를 어떻게 구성하는지 살펴보시죠: var workflow = AgentWorkflowBuilder .CreateHandoffBuilderWith(triageAgent) .WithHandoffs(triageAgent, [receptionistAgent, behaviouralAgent, technicalAgent, summariserAgent]) .WithHandoffs(receptionistAgent, [behaviouralAgent, triageAgent]) .WithHandoffs(behaviouralAgent, [technicalAgent, triageAgent]) .WithHandoffs(technicalAgent, [summariserAgent, triageAgent]) .WithHandoff(summariserAgent, triageAgent) .Build(); 면접 상황을 상상해 본다면 기본적으로 순차적인 방식으로 진행합니다: Receptionist → Behavioral → Technical → Summarizer. 각 전문가가 직접 다음으로 핸드오프합니다. 예상치 못한 상황이 발생하면, 에이전트는 재라우팅을 위해 Triage로 돌아갑니다. 이 샘플에는 더 간단한 배포를 위한 단일 에이전트 모드도 포함하고 있어, 두 가지 접근 방식을 나란히 비교할 수 있습니다. 패턴 2: 도구 통합을 위한 MCP 이 프로젝트에서 도구는 에이전트 내부에 구현하는 대신 MCP(Model Context Protocol) 서버를 통해 통합합니다. 동일한 MarkItDown 서버가 완전히 다른 에이전트 프로젝트에서도 쓰일 수 있으며, 도구 개발팀은 에이전트 개발팀과 독립적으로 배포할 수 있습니다. MCP는 또한 언어에 구애받지 않으므로, 이 샘플 앱에서 쓰인 MarkItDown은 Python 기반의 서버이고, 에이전트는 .NET 기반으로 동작합니다. 에이전트는 시작 시 MCP 클라이언트를 통해 도구를 발견하고, 적절한 에이전트에게 전달합니다: var receptionistAgent = new ChatClientAgent( chatClient: chatClient, name: "receptionist", instructions: "You are the Receptionist. Set up sessions and collect documents...", tools: [.. markitdownTools, .. interviewDataTools]); 각 에이전트는 필요한 도구만 받습니다. Triage는 도구를 받지 않고(라우팅만 수행), 면접관은 세션 액세스를, Receptionist는 문서 파싱과 세션 액세스를 받습니다. 이는 최소 권한 원칙을 따릅니다. 패턴 3: Aspire 오케스트레이션 Aspire가 모든 것을 하나로 연결합니다. 앱 호스트는 서비스 토폴로지를 정의합니다: 어떤 서비스가 존재하는지, 서로 어떻게 의존하는지, 어떤 구성을 받는지. 다음을 제공합니다: 서비스 디스커버리. 서비스가 하드코딩된 URL이 아닌 이름으로 서로를 찾습니다. 상태 확인. Aspire 대시보드에서 모든 컴포넌트의 상태를 보여줍니다. 분산 추적. 공유 서비스 기본값을 통해 OpenTelemetry가 연결됩니다. 단일 커맨드 시작. aspire run --file ./apphost.cs 로 모든 것을 시작합니다. 배포 시, azd up 으로 전체 애플리케이션을 Azure Container Apps에 푸시합니다. 시작하기 사전 요구 사항 .NET 10 SDK 이상 Azure 구독 Microsoft Foundry 프로젝트 Docker Desktop 또는 기타 컨테이너 런타임 로컬에서 실행하기 git clone https://github.com/Azure-Samples/interview-coach-agent-framework.git cd interview-coach-agent-framework # 자격 증명 구성 dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:Endpoint "<your-endpoint>" dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:ApiKey "<your-key>" # 모든 서비스 시작 aspire run --file ./apphost.cs Aspire 대시보드를 열고, 모든 서비스가 Running으로 표시될 때까지 기다린 후, WebUI 엔드포인트를 클릭하여 모의 면접을 시작하세요. 핸드오프 패턴이 어떻게 동작하는지 DevUI에서 시각화한 모습입니다. 이 채팅 UI를 사용하여 면접 후보자로서 에이전트와 상호작용할 수 있습니다. Azure에 배포하기 azd auth login azd up 배포를 위해서는 이게 사실상 전부입니다! Aspire와 azd 가 나머지를 처리합니다. 배포와 테스트를 완료한 후, 다음 명령어를 실행하여 모든 리소스를 안전하게 삭제할 수 있습니다: azd down --force --purge 이 샘플에서 배울 수 있는 것 Interview Coach를 통해 다음을 경험하게 됩니다: Microsoft Foundry를 모델 백엔드로 사용하기 Microsoft Agent Framework로 단일 에이전트 및 멀티 에이전트 시스템 구축하기 핸드오프 오케스트레이션으로 전문 에이전트 간 워크플로우 분할하기 에이전트 코드와 독립적으로 MCP 도구 서버 생성 및 사용하기 Aspire로 멀티 서비스 애플리케이션 오케스트레이션하기 일관되고 구조화된 동작을 생성하는 프롬프트 작성하기 azd up 으로 모든 것 배포하기 사용해 보세요 전체 소스 코드는 GitHub에 있습니다: Azure-Samples/interview-coach-agent-framework Microsoft Agent Framework가 처음이라면, 프레임워크 문서와 Hello World 샘플부터 시작하세요. 그런 다음 여기로 돌아와서 더 큰 프로젝트에서 각 부분이 어떻게 결합되는지 확인하세요. 이러한 패턴으로 무언가를 만들었다면, 이슈를 열어 알려주세요. 다음 계획 다음과 같은 통합 시나리오를 현재 작업 중입니다. 작업이 끝나는 대로 이 샘플 앱을 업데이트 하도록 하겠습니다. Microsoft Foundry Agent Service GitHub Copilot A2A 참고 자료 Microsoft Agent Framework 문서 Microsoft Agent Framework 프리뷰 소개 Microsoft Agent Framework, 릴리스 후보 도달 Microsoft Foundry 문서 Microsoft Foundry Agent Service Microsoft Foundry 포털 Microsoft.Extensions.AI Model Context Protocol 사양 Aspire 문서 ASP.NET Blazor