Azure AI Foundry
BYO Thread Storage in Azure AI Foundry using Python
Build scalable, secure, and persistent multi-agent memory with your own storage backend.

As AI agents evolve beyond one-off interactions, persistent context becomes a critical architectural requirement. Azure AI Foundry's latest update introduces a powerful capability, Bring Your Own (BYO) Thread Storage, which lets developers plug custom storage solutions into agent threads. This feature gives enterprises control over how agent memory is stored, retrieved, and governed, aligning with compliance, scalability, and observability goals.

What Is "BYO Thread Storage"?

In Azure AI Foundry, a thread represents a conversation or task execution context for an AI agent. By default, thread state (messages, actions, results, metadata) is stored in Foundry's managed storage. With BYO Thread Storage, you can now:

- Store threads in your own database: Azure Cosmos DB, SQL, Blob, or even a vector database.
- Apply custom retention, encryption, and access policies.
- Integrate with your existing data and governance frameworks.
- Enable cross-region disaster recovery (DR) setups seamlessly.

This gives enterprises full control of data lifecycle management, a big step toward AI-first operational excellence.

Architecture Overview

A typical setup involves:

- Azure AI Foundry Agent Service: hosts your multi-agent setup.
- Custom thread storage backend: for example, Azure Cosmos DB, Azure Table storage, or PostgreSQL.
- Thread adapter: a Python class implementing the Foundry storage interface.
- Disaster recovery (DR) replication: optional replication of threads to a secondary region.

Implementing BYO Thread Storage using Python

Prerequisites

First, install the necessary Python packages:

pip install azure-ai-projects azure-cosmos azure-identity

Setting Up the Storage Layer

from azure.cosmos import CosmosClient, PartitionKey
from azure.identity import DefaultAzureCredential
import json
from datetime import datetime


class ThreadStorageManager:
    def __init__(self, cosmos_endpoint, database_name, container_name):
        credential = DefaultAzureCredential()
        self.client = CosmosClient(cosmos_endpoint, credential=credential)
        self.database = self.client.get_database_client(database_name)
        self.container = self.database.get_container_client(container_name)

    def create_thread(self, user_id, metadata=None):
        """Create a new conversation thread"""
        thread_id = f"thread_{user_id}_{datetime.utcnow().timestamp()}"
        thread_data = {
            'id': thread_id,
            'user_id': user_id,
            'messages': [],
            'created_at': datetime.utcnow().isoformat(),
            'updated_at': datetime.utcnow().isoformat(),
            'metadata': metadata or {}
        }
        self.container.create_item(body=thread_data)
        return thread_id

    def add_message(self, thread_id, role, content):
        """Add a message to an existing thread"""
        thread = self.container.read_item(item=thread_id, partition_key=thread_id)
        message = {
            'role': role,
            'content': content,
            'timestamp': datetime.utcnow().isoformat()
        }
        thread['messages'].append(message)
        thread['updated_at'] = datetime.utcnow().isoformat()
        self.container.replace_item(item=thread_id, body=thread)
        return message

    def get_thread(self, thread_id):
        """Retrieve a complete thread"""
        try:
            return self.container.read_item(item=thread_id, partition_key=thread_id)
        except Exception as e:
            print(f"Thread not found: {e}")
            return None

    def get_thread_messages(self, thread_id):
        """Get all messages from a thread"""
        thread = self.get_thread(thread_id)
        return thread['messages'] if thread else []

    def delete_thread(self, thread_id):
        """Delete a thread"""
        self.container.delete_item(item=thread_id, partition_key=thread_id)
Integrating with Azure AI Foundry

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential


class ConversationManager:
    def __init__(self, project_endpoint, storage_manager):
        self.ai_client = AIProjectClient.from_connection_string(
            credential=DefaultAzureCredential(),
            conn_str=project_endpoint
        )
        self.storage = storage_manager

    def start_conversation(self, user_id, system_prompt):
        """Initialize a new conversation"""
        thread_id = self.storage.create_thread(
            user_id=user_id,
            metadata={'system_prompt': system_prompt}
        )
        # Add system message
        self.storage.add_message(thread_id, 'system', system_prompt)
        return thread_id

    def send_message(self, thread_id, user_message, model_deployment):
        """Send a message and get AI response"""
        # Store user message
        self.storage.add_message(thread_id, 'user', user_message)

        # Retrieve conversation history
        messages = self.storage.get_thread_messages(thread_id)

        # Call Azure AI with conversation history
        response = self.ai_client.inference.get_chat_completions(
            model=model_deployment,
            messages=[
                {"role": msg['role'], "content": msg['content']}
                for msg in messages
            ]
        )
        assistant_message = response.choices[0].message.content

        # Store assistant response
        self.storage.add_message(thread_id, 'assistant', assistant_message)
        return assistant_message

Usage Example

# Initialize storage and conversation manager
storage = ThreadStorageManager(
    cosmos_endpoint="https://your-cosmos-account.documents.azure.com:443/",
    database_name="conversational-ai",
    container_name="threads"
)

conversation_mgr = ConversationManager(
    project_endpoint="your-project-connection-string",
    storage_manager=storage
)

# Start a new conversation
thread_id = conversation_mgr.start_conversation(
    user_id="user123",
    system_prompt="You are a helpful AI assistant."
)

# Send messages
response1 = conversation_mgr.send_message(
    thread_id=thread_id,
    user_message="What is machine learning?",
    model_deployment="gpt-4"
)
print(f"AI: {response1}")

response2 = conversation_mgr.send_message(
    thread_id=thread_id,
    user_message="Can you give me an example?",
    model_deployment="gpt-4"
)
print(f"AI: {response2}")

# Retrieve full conversation history
history = storage.get_thread_messages(thread_id)
for msg in history:
    print(f"{msg['role']}: {msg['content']}")

Key Highlights

- Threads are stored in Cosmos DB under your control.
- You can attach metadata such as region, owner, or compliance tags.
- Integrates natively with existing Azure identity and Key Vault.

Disaster Recovery & Resilience

When coupled with geo-replicated Cosmos DB or Azure Storage RA-GRS, your BYO thread storage becomes resilient by design:

- Primary writes in East US replicate to Central US.
- Foundry auto-detects failover and reconnects to the secondary region.
- Threads remain available during outages, ensuring operational continuity.

This aligns with the AI-First Operational Excellence architecture theme, where reliability and observability drive intelligent automation.

Best Practices

- Security: use Azure Key Vault for credentials and encryption keys.
- Compliance: configure data residency and retention in your own database.
- Observability: log thread CRUD operations to Azure Monitor or Application Insights.
- Performance: use async I/O and partition keys for large workloads; see the sketch below.
- DR: enable geo-redundant storage and run failover tests regularly.
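To illustrate the Performance and DR recommendations together, here is a minimal sketch (not part of the original article) of a read path that uses the async Cosmos DB client together with a preferred-location list for a geo-replicated account. The region names, database, and container below are assumptions; adjust them to your own replica configuration.

# Minimal sketch (assumption): async, geo-replication-aware read path for the storage layer.
# Region names, database, and container below are placeholders for your own account.
import asyncio
from azure.cosmos.aio import CosmosClient
from azure.identity.aio import DefaultAzureCredential


async def read_thread(cosmos_endpoint: str, thread_id: str):
    async with DefaultAzureCredential() as credential:
        # preferred_locations steers reads to the nearest replica and lets the SDK
        # fall back to the secondary region during a regional outage.
        async with CosmosClient(
            cosmos_endpoint,
            credential=credential,
            preferred_locations=["East US", "Central US"],
        ) as client:
            container = (
                client.get_database_client("conversational-ai")
                      .get_container_client("threads")
            )
            return await container.read_item(item=thread_id, partition_key=thread_id)


# Example usage (the thread id is a placeholder):
# thread = asyncio.run(read_thread(
#     "https://your-cosmos-account.documents.azure.com:443/", "thread_user123_123456.789"))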
When to Use BYO Thread Storage

- Regulated industries (BFSI, healthcare, etc.): maintain data control and audit trails.
- Multi-region agent deployments: support DR and data sovereignty.
- Advanced analytics on conversation data: query threads directly from your own database.
- Enterprise observability: unified monitoring across Foundry and operations.

The Future

BYO Thread Storage opens the door to advanced use cases: federated agent memory, semantic retrieval over past conversations, and dynamic workload failover across regions. For architects, this feature is a key enabler for secure, scalable, and compliant AI system design. For developers, it means more flexibility, transparency, and integration power.

Summary

- Custom thread storage: full control over data.
- Python adapter support: easy extensibility.
- Multi-region DR ready: business continuity.
- Azure-native security: enterprise-grade safety.

Conclusion

Implementing BYO thread storage in Azure AI Foundry gives you the flexibility to build AI applications that meet your specific requirements for data governance, performance, and scalability. By taking control of your storage, you can create more robust, compliant, and maintainable AI solutions.

Building Adaptive Multilingual Apps Using TypeScript and Azure AI Translator API
Learn how to integrate Azure AI Translator (2025-05-01-preview) into TypeScript apps using hybrid Neural Machine Translation (NMT) + Large Language Model (LLM) translation, tone and gender controls, adaptive reference pairs, and a migration path from v3.

Real-Time Speech Intelligence for Global Scale: gpt-4o-transcribe-diarize in Azure AI Foundry
Voice is a natural interface for communication. Now, with the general availability of gpt-4o-transcribe-diarize, the new automatic speech recognition (ASR) model in Azure AI Foundry, transforming speech into actionable text is faster, smarter, and more accurate than ever. This launch marks a significant milestone in our mission to empower organizations with AI that delivers speed, accuracy, and enterprise-grade reliability. With gpt-4o-transcribe-diarize seamlessly integrated, businesses can unlock critical insights from conversations, instantly converting audio into text with ultra-low latency and strong accuracy across 100+ languages. Whether you're enhancing live event accessibility, analyzing customer interactions, or enabling intelligent voice-driven applications, gpt-4o-transcribe-diarize captures the spoken word and makes it available for real-time decision-making. Azure AI's innovation in speech technology is helping to redefine productivity and global reach, setting a new standard for audio intelligence in the enterprise.

Why gpt-4o-transcribe-diarize Matters

Businesses operate in a world where conversations drive decisions. From customer support calls to virtual meetings, audio data holds critical insights. gpt-4o-transcribe-diarize unlocks these insights, converting speech to text with ultra-low latency and high accuracy across 100+ languages. Whether you're captioning live events, analyzing call center interactions, or building voice-driven applications, it brings real-time intelligence to your workflows.

Key Features

- Lightning-fast transcription: convert 10 minutes of audio in about 15 seconds with the new Fast Transcription API.
- Global language coverage: support for 100+ languages and dialects for inclusive, global experiences.
- Seamless integration: available in Azure AI Foundry with managed endpoints for easy deployment and scale.

Real-World Impact

Imagine a reporter summarizing interviews in real time, a financial institution transcribing calls instantly, or a global retailer powering multilingual voice assistants, all with the speed and security of Azure AI Foundry. gpt-4o-transcribe-diarize makes these scenarios possible today.

Pricing and regional availability for gpt-4o-transcribe-diarize

- Model: gpt-4o-transcribe-diarize
- Deployment: Global Standard (PayGo)
- Regions: East US 2, Sweden Central
- Price per 1M tokens: text input $2.50, audio input $6.00, output $10.00
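As a rough sketch (not from the original post), a transcription call against a gpt-4o-transcribe-diarize deployment can go through the Azure OpenAI-compatible audio transcription API using the openai Python SDK. The endpoint, API version, deployment name, and audio file below are placeholder assumptions, and diarization-specific options may differ, so check the model card for the exact parameters.

# Hedged sketch: transcribing a local audio file with a gpt-4o-transcribe-diarize deployment.
# Endpoint, api_version, deployment name, and file path are placeholders to replace.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="YOUR_API_KEY",
    api_version="2025-03-01-preview",  # assumption: use the API version listed for your deployment
)

with open("meeting.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="gpt-4o-transcribe-diarize",  # your deployment name
        file=audio_file,
    )

print(result.text)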
gpt-4o-transcribe-diarize in the audio AI innovation context

gpt-4o-transcribe-diarize is part of a broader wave of audio AI innovation on Azure, joining new models like OpenAI gpt-realtime and gpt-audio that are purpose-built for expressive, low-latency voice experiences. While gpt-4o-transcribe-diarize delivers ultra-fast transcription with enterprise-grade accuracy, gpt-realtime enables natural, emotionally rich voice interactions with millisecond responsiveness, ideal for live conversations, voice agents, and multimodal applications. Meanwhile, audio models like gpt-4o-mini-transcribe and gpt-4o-mini-tts extend the platform's capabilities with customizable speech synthesis and real-time captioning, making Azure AI a comprehensive solution for building intelligent, production-ready voice systems.

gpt-realtime Features

OpenAI describes gpt-realtime as a new standard for voice-first applications, combining expressive audio generation with ultra-low latency and natural conversational flow. It's designed to power real-time interactions that feel like natural, responsive speech. Key features:

- Millisecond latency: enables live responsiveness suitable for real-time conversations, kiosks, and voice agents.
- Emotionally expressive voices: supports nuanced speech delivery with voices like Marin and Cedar, capable of conveying tone, emotion, and intent.
- Natural turn-taking: built-in mechanisms for detecting pauses and transitions allow fluid back-and-forth dialogue.
- Function calling support: integrates with backend systems to trigger actions based on voice input.
- Multimodal readiness: designed to work with text, audio, and visual inputs for rich, interactive experiences.
- Stable APIs for production: enterprise-grade reliability with consistent behavior across sessions and deployments.

These features make gpt-realtime a foundational model for building intelligent voice interfaces that go beyond transcription, delivering conversational intelligence in real time.

gpt-realtime Use Cases

With its expressive audio capabilities and real-time responsiveness, gpt-realtime unlocks new possibilities across industries. Whether enhancing customer engagement or streamlining operations, it brings voice AI into the heart of enterprise workflows. Examples include:

- Customer service agents: power virtual agents that respond instantly in natural, expressive tones, improving customer satisfaction and reducing wait times.
- Retail kiosks and smart devices: enable voice-driven product discovery, troubleshooting, and checkout experiences with real-time feedback.
- Multilingual voice assistants: deliver localized, expressive voice experiences across global markets with support for multiple languages and dialects.
- Live captioning and accessibility: combine gpt-4o-transcribe-diarize and gpt-realtime to provide real-time captions and voice synthesis for inclusive experiences.

These use cases demonstrate how gpt-realtime turns voice into a strategic interface, bridging human communication and intelligent systems with speed and accuracy.

Ready to transform voice into value? Learn more and start building with gpt-4o-transcribe-diarize.

Connect AI Agent via Postman
I'm having the hardest time trying to connect to my custom agent (agent_id: asst_g8DVMGAOLiXXk7WmiTCMQBgj) via Postman. I'm able to authenticate fine and receive the secure token, and I can run my deployment POST with no issues (https://aiagentoverview.cognitiveservices.azure.com/openai/deployments/gpt-4.1/chat/completions?api-version=2025-01-01-preview). But how do I run a POST against my agent_id asst_g8DVMGAOLiXXk7WmiTCMQBgj? I can't find any instructions anywhere.

Deployment Guide: Copilot Studio agent with MCP Server exposed by API Management using OAuth 2.0
Introduction

In today's enterprise landscape, enabling AI agents to interact with backend systems securely and at scale is critical. By exposing MCP servers through Azure API Management (APIM), organizations can provide controlled access to these services. Combined with the OAuth 2.0 authorization code flow, this setup delivers robust, enterprise-grade security for AI agents built in Copilot Studio, empowering intelligent automation while maintaining strict access governance.

Disclaimer & Caveats

This article explores how to configure an MCP tool, exposed as an MCP server via APIM, for secure consumption by AI agents built in Copilot Studio. By leveraging the OAuth 2.0 authorization code flow, this setup enables delegated access without exposing user credentials. With Azure API Management now supporting MCP server capabilities in public preview, developers can expose REST APIs as MCP tools using a standardized JSON-RPC interface. This allows AI agents to invoke backend services securely and at scale, without rebuilding existing APIs. Copilot Studio, also in preview for MCP integration, lets organizations orchestrate intelligent agents that interact with these tools in real time. While this guide provides a foundational approach, every environment is unique. You can enhance security further by implementing app roles and conditional access policies, and by extending your integration logic with custom Python code for advanced scenarios.

⚠️ Note: Both MCP server support in APIM and MCP tool integration in Copilot Studio are currently in public preview. As these platforms evolve rapidly, expect changes and improvements over time. Always refer to https://learn.microsoft.com/en-us/azure/api-management/export-rest-mcp-server for the latest updates. This article is about consuming remote MCP servers; in Azure, managed identity can also be leveraged for APIM integration.

What is Authorization Code Flow?

The authorization code flow is designed for applications that can securely store a client secret (such as server-side apps). It allows the app to obtain an access token on behalf of the user without exposing their credentials. The flow uses an intermediate authorization code that is exchanged for tokens, adding an extra layer of security.

Steps in the Flow

1. User Authentication: the user is redirected to the authorization server (in this case, Azure AD) to log in and grant consent.
2. Authorization Code Issued: after successful login, the authorization server sends an authorization code to the app via the redirect URI.
3. Token Exchange: the app sends the authorization code (plus its client credentials) to the token endpoint to obtain an access token (for API calls) and a refresh token (to renew access without user interaction).
4. API Access: the app uses the access token to call protected resources.

The diagram below shows the authorization code flow in detail.

Reference: Microsoft identity platform and OAuth 2.0 authorization code flow | Microsoft Learn
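To make the Token Exchange step concrete, here is a minimal sketch (not part of the original guide) of redeeming an authorization code against the Microsoft identity platform v2.0 token endpoint. The tenant ID, client ID, client secret, redirect URI, and scope are placeholders for your own app registration values.

# Minimal sketch of step 3 (Token Exchange) against the Microsoft identity platform.
# Every identifier below is a placeholder for your own app registration values.
import requests

tenant_id = "<your-tenant-id>"
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

payload = {
    "grant_type": "authorization_code",
    "client_id": "<client-app-id>",                 # e.g. the Copilot Studio client app registration
    "client_secret": "<client-app-secret>",
    "code": "<authorization-code-from-redirect>",   # returned to your redirect URI in step 2
    "redirect_uri": "<your-redirect-uri>",
    "scope": "api://<apim-backend-app-id>/user_impersonation offline_access",
}

response = requests.post(token_url, data=payload, timeout=30)
response.raise_for_status()
tokens = response.json()

access_token = tokens["access_token"]        # used to call the APIM-protected MCP server
refresh_token = tokens.get("refresh_token")  # present when offline_access is granted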
High Level Architecture

This architecture can also be implemented with the APIM backend app registration only. However, take care to configure the redirect URIs appropriately.

Remote MCP Servers using APIM Architecture

APIM exposes remote MCP servers, enabling AI agents, such as those built in Copilot Studio, to securely access backend services using standardized JSON-RPC interfaces. This integration offers a robust, scalable, and secure way to connect AI tools with enterprise APIs.

Key Capabilities:

- Secure gateway: APIM acts as an intelligent gateway, handling the OAuth 2.0 authorization code flow, authentication, and request routing.
- Monitoring and observability: integration with Azure Log Analytics and Application Insights enables deep visibility into API usage, performance, and errors.
- Policy enforcement: APIM's policy engine allows custom rules, including token validation, header manipulation, and response transformation.
- Rate limiting and throttling: built-in support for rate limits, quotas, and IP filtering helps protect backend services from abuse and ensures fair usage.
- Managed identity and Entra ID: secure service-to-service communication is enabled via system-assigned and user-assigned managed identities, with Entra ID handling identity and access management.
- Flexible deployment: MCP servers can be hosted in Azure Functions, App Services, or Container Apps, and exposed via APIM with minimal changes to existing APIs.

To learn more, visit https://learn.microsoft.com/en-us/samples/azure-samples/remote-mcp-apim-functions-python/remote-mcp-apim-functions-python/

Develop MCP server in VS Code

This deployment guide provides a sample MCP server written in Python, available in the GitHub repository below; you can also use your own MCP server. Clone the repository and open it in VS Code:

git clone https://github.com/mafzal786/mcp-server.git

Run the following to execute it locally:

cd mcp-server
uv venv
uv sync
uv run mcpserver.py

Deploy MCP Server as Azure Container App

In this guide, the MCP server is deployed as an Azure Container App (it could also run as an Azure App Service). There are many other ways to deploy it, such as from VS Code or a CI/CD pipeline; the Azure CLI is used here for simplicity:

az containerapp up \
  --resource-group <RESOURCE_GROUP_NAME> \
  --name streamable-mcp-server2 \
  --environment mcp \
  --location <REGION> \
  --source .

Configure Authentication for Azure Container App

1. Sign in to the Azure portal, open the Container App, and click "Authentication" as shown below. For more details, see Enable authentication and authorization in Azure Container Apps with Microsoft Entra ID | Microsoft Learn. Click "Add identity provider".
2. Select Microsoft from the drop-down and leave everything as is, as shown below.
3. This creates a new app registration for the Container App. Once set up, it will look like the screenshot below. As soon as authentication is configured, the Container App becomes inaccessible except through OAuth.

Note: If you already have an app registration for the Azure Container App, use it by selecting the "Pick an existing app registration in this directory" option.

Review App Registration of Container App (Backend)

Visit App registrations and select the Container App's registration (streamable-mcp-server2 in this case). On the Authentication tab, verify the redirect URIs; you should see a redirect URI for the Container App ending with /.auth/login/aad/callback, as shown in the green box in the screenshot below. Now click "Expose an API" and confirm that the Application ID URI is configured (its format is api://<client id>) and that the scope "user_impersonation" is created. Then verify the API permissions and make sure you grant admin consent for your tenant as shown below.
More scopes can be created depending on your data-access requirements.

Note: Make sure to grant admin consent before proceeding to the next step.

Create App Registration representing the APIM API

1. Launch the Azure portal, visit App registrations, and click "New registration". Create a new app registration, for example "apim-mcp-backend-api" in this case.
2. Click "Expose an API", configure the Application ID URI, and add a scope such as user_impersonation, as shown in the diagram below.
3. Click "App roles" and create the role shown below. More roles can be created case by case depending on your requirements; here an app role is created to illustrate the concept and how it is used in the APIM inbound policies in the coming sections.

Create App Registration for the Client App (Copilot Studio)

In these steps, we configure the app registration for the client app, Copilot Studio in this case, which acts as the client shown in the high-level architecture diagram earlier in this article.

1. Launch the Azure portal, visit App registrations, and click "New registration".
2. Create a new app registration. Leave the redirect URL blank for now; we will configure it later, because it is provided by Copilot Studio when configuring the custom MCP connector.
3. Click "API permissions" and click "Add a permission". Choose Microsoft Graph, then "Delegated permissions", and select email, openid, and profile as shown below.
4. Make sure to grant admin consent; the result should look like the screenshot below.
5. Create a secret: click "Certificates & secrets", then "New client secret". Store the value, as it will be masked after some time; if that happens, you can always delete it and create a new secret.
6. Capture the following, as you will need them when configuring the MCP tool in Copilot Studio: the client ID from the Overview tab of the app registration, and the client secret from the "Certificates & secrets" tab.
7. Configure API permissions for the APIM API ("apim-mcp-backend-api" in this case): click the "API permissions" tab, click "Add a permission", open the "My APIs" tab as shown below, and select "apim-mcp-backend-api".
   Note: If you don't see the app registration under "My APIs", go to the app registration, click "Owners", and add your AD account as an owner.
8. Select "Delegated permissions", then select the permission shown below.
9. Select the Application permissions and choose the app role created in the apim-mcp-backend-api registration, mcp.read in this case. You MUST grant admin consent as the final step. This is very important; without it, nothing will work.
10. The end result of this client app registration should look like the figure below.

Configure permissions for the Container App registration

Launch the Azure portal, visit App registrations, and select the registration of the Azure Container App (streamable-mcp-server2 in this case). Select API permissions and add the delegated and application permissions shown in the diagram below.

Note: Don't forget to grant admin consent.

Configure allowed token audiences for the Container App

This setting defines which audience values (the aud claim) in a token are considered valid for your app. When a client app requests an access token from Microsoft Entra ID (Azure AD), the token includes an aud claim that identifies the intended recipient. Your Container App will only accept tokens whose aud claim matches one of the values in the Allowed Token Audiences list.
This is important because it ensures that only tokens issued for your API or app are accepted, preventing misuse of tokens intended for other resources and adding an extra layer of security.

In the Azure portal, visit the Azure Container App (streamable-mcp-server2 in this case), click "Authentication", click "Edit" under the identity provider, and under "Allowed token audiences" add the Application ID URI of "apim-mcp-backend-api", since this will be included as the audience in the access token.

Best Practices

- Only include trusted client app IDs.
- Avoid overly broad values like "allow all" (not recommended).
- Validate tokens using Microsoft libraries (MSAL) or built-in auth features.

Configure MCP server in API Management

Note: Provisioning an API Management resource is outside the scope of this document. If you do not already have an API Management instance, follow this quickstart: https://learn.microsoft.com/en-us/azure/api-management/get-started-create-service-instance

The following service tiers are available for the preview: classic Basic, Standard, and Premium, plus Basic v2, Standard v2, and Premium v2. For the classic Basic, Standard, or Premium tiers, you must join the AI Gateway Early Update group to enable MCP server features; allow up to 2 hours for the update to take effect.

Expose an existing MCP server

Follow these steps to expose an existing MCP server in API Management:

1. In the Azure portal, navigate to your API Management instance.
2. In the left-hand menu, under APIs, select MCP servers > + Create MCP server.
3. Select "Expose an existing MCP server".
4. In Backend MCP server, enter the existing MCP server base URL. Example: https://streamable-mcp-serverv2.kdhg489457dslkjgn.eastus2.azurecontainerapps.io/mcp for the Azure Container App hosting the MCP server. In Transport type, Streamable HTTP is selected by default.
5. In New MCP server, enter a name for the MCP server in API Management. In Base path, enter a route prefix for the tools, for example mcptools. Optionally, enter a description for the MCP server.
6. Select Create.

The diagram below shows the MCP servers configured in APIM for reference.

Configure policies for MCP server

Configure one or more API Management policies to help manage the MCP server. The policies are applied to all API operations exposed as tools in the MCP server and can be used to control access, authentication, and other aspects of the tools.

To configure policies for the MCP server:

1. In the Azure portal, navigate to your API Management instance.
2. In the left-hand menu, under APIs, select MCP Servers.
3. Select an MCP server from the list.
4. In the left menu, under MCP, select Policies.
5. In the policy editor, add or edit the policies you want to apply to the MCP server's tools. The policies are defined in XML format.
<!--
    - Policies are applied in the order they appear.
    - Position <base/> inside a section to inherit policies from the outer scope.
    - Comments within policies are not preserved.
-->
<!-- Add policies as children to the <inbound>, <outbound>, <backend>, and <on-error> elements -->
<policies>
    <!-- Throttle, authorize, validate, cache, or transform the requests -->
    <inbound>
        <base />
        <set-variable name="accessToken" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "").Replace("Bearer ", ""))" />
        <!-- Log the captured access token to the trace logs -->
        <trace source="Access Token Debug" severity="information">
            <message>@("Access Token: " + (string)context.Variables["accessToken"])</message>
        </trace>
        <set-variable name="userId" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["oid"].FirstOrDefault())" />
        <set-variable name="userName" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["name"].FirstOrDefault())" />
        <trace source="User Name Debug" severity="information">
            <message>@("username: " + (string)context.Variables["userName"])</message>
        </trace>
        <set-variable name="scp" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["scp"].FirstOrDefault())" />
        <trace source="Scope Debug" severity="information">
            <message>@("scope: " + (string)context.Variables["scp"])</message>
        </trace>
        <set-variable name="roles" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["roles"].FirstOrDefault())" />
        <trace source="Role Debug" severity="information">
            <message>@("Roles: " + (string)context.Variables["roles"])</message>
        </trace>
        <!--
        <set-variable name="requestBody" value="@{ return context.Request.Body.As<string>(preserveContent:true); }" />
        <trace source="Request Body information" severity="information">
            <message>@("Request body: " + (string)context.Variables["requestBody"])</message>
        </trace>
        -->
        <validate-azure-ad-token tenant-id="{{tenant-id}}" header-name="Authorization" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized. Access token is missing or invalid.">
            <client-application-ids>
                <application-id>{{client-application-id}}</application-id>
            </client-application-ids>
            <audiences>
                <audience>{{audience}}</audience>
            </audiences>
            <required-claims>
                <claim name="roles" match="any">
                    <value>mcp.read</value>
                </claim>
            </required-claims>
        </validate-azure-ad-token>
    </inbound>
    <!-- Control if and how the requests are forwarded to services -->
    <backend>
        <base />
    </backend>
    <!-- Customize the responses -->
    <outbound>
        <base />
    </outbound>
    <!-- Handle exceptions and customize error responses -->
    <on-error>
        <base />
        <trace source="Role Debug" severity="error">
            <message>@("username: " + (string)context.Variables["userName"] + " has error in accessing the MCP server, could be auth or role related...")</message>
        </trace>
        <return-response>
            <set-status code="403" reason="Forbidden" />
            <set-body>
                {"error":"Missing required scope or role"}
            </set-body>
        </return-response>
    </on-error>
</policies>

Note: Update the inbound policy above with the tenant ID, client application ID, and audience for your environment. It is recommended to use APIM named values instead of hard-coding them inside the policy. To learn more, visit Use named values in Azure API Management policies.

Configure Diagnostics for APIM

In this solution, APIM diagnostics are configured to forward log data to Log Analytics. Testing and validation will be carried out using insights from Log Analytics.

Note: Setting up diagnostics is outside the scope of this article.
However, you can visit the following link for more information: https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-azure-monitor

The diagram below shows which logs are being sent to the Log Analytics workspace.

MCP Tool configuration in Copilot Studio

Launch Copilot Studio at https://copilotstudio.microsoft.com/. Configuring the environment and agent is beyond the scope of this article; it is assumed you already have an environment set up and an agent created. The following link explains how to create an agent in Copilot Studio: Quickstart: Create and deploy an agent - Microsoft Copilot Studio | Microsoft Learn

1. Inside the agent configuration, click "Add tool".
2. Click "New tool".
3. Select "Model Context Protocol".
4. Provide all relevant information for the MCP server. Make sure your server URL ends with your MCP path; in this case it is the APIM MCP server URL with the base path configured in APIM at the end. Provide the server name and server description, and select the OAuth 2.0 radio button.
5. Provide the following in the OAuth 2.0 section:
   - Client ID of the client app registration (copilot-studio-client, as configured earlier).
   - Client secret of the copilot-studio-client app registration.
   - Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize
   - Token URL template and Refresh URL: https://login.microsoftonline.com/common/oauth2/v2.0/token
   - Scopes: openid, profile, email (the Microsoft Graph permissions selected earlier).
6. Click "Create". This provides a redirect URL, which you need to configure in the client app registration (copilot-studio-client in this case).

Configure Redirect URI in Client App Registration

Visit the client app registration (copilot-studio-client), click the Authentication tab, and add the web redirect URI as shown below.

Note: The redirect URI MUST be configured in the app registration. Otherwise, authorization will not complete and sign-in will fail.

Configure redirect URI in APIM API app registration

Also configure the apim-mcp-backend-api app registration with the same redirect URI as shown below.

Modify MCP connector in Power Apps

Now visit https://make.powerapps.com and open the newly created connector as shown below. Select the Security tab and replace the Resource URL with the Application ID URI of apim-mcp-backend-api configured earlier under "Expose an API". Add .default to the scope. Provide the secret of the client app registration, since the connector cannot be updated without it; this is an extra security measure for updating connectors in Power Apps. Click "Update connector".

CORS Configuration

CORS configuration is a MUST, because the Azure Container App is a remote MCP server on a completely different domain (origin).

Power Apps and CORS for External Domains: Brief Overview

When embedding or integrating Power Apps with external web applications or APIs, Cross-Origin Resource Sharing (CORS) becomes a critical consideration. CORS is a browser security feature that restricts web pages from making requests to a different domain than the one that served the page, unless explicitly allowed.

Key points:

- Power Apps hosted on *.powerapps.com or within Microsoft 365 domains will block calls to external APIs unless those APIs include the proper CORS headers.
The external API must return:

- Access-Control-Allow-Origin: https://apps.powerapps.com (or * for all origins, though not recommended for production)
- Access-Control-Allow-Methods: GET, POST, OPTIONS (or as needed)
- Access-Control-Allow-Headers: Content-Type, Authorization (and any custom headers)

If the API requires authentication (e.g., OAuth 2.0), ensure preflight OPTIONS requests are handled correctly. For scenarios where you cannot modify the external API, consider using Power Automate flows as a proxy, or Azure API Management or Azure Functions to inject CORS headers. Always validate the security implications before enabling wide-open CORS.

If CORS is not set up, you will see the following error in Copilot Studio after pressing F12 (browser developer tools): a CORS policy error blocking the container app.

Azure Container Apps provides an efficient way to configure CORS in the Azure portal:

1. Launch the Azure portal and visit the Azure Container App (streamable-mcp-server2 in this case).
2. Click CORS under the Networking section.
3. Configure the Allowed Origins section as shown below. localhost is added to make it work from a local laptop, although it is not required for Copilot Studio.
4. Click the "Allowed Methods" tab and provide the values shown below.
5. Provide the wildcard "*" on the "Allowed Headers" tab. This is not recommended for production systems; it is done here for simplicity, so tighten it for added security.
6. Click "Apply". This configures CORS for the remote application.

Test the MCP custom connector

We are in the final stages of configuring the connector; it is time to test whether everything is configured correctly and works.

1. Launch https://make.powerapps.com, click "Custom connectors", select your configured connector, and click the "5. Test" tab as shown below. The selected connection will be blank if you are running it for the first time. Click "+ New connection".
2. The new connection launches the authorization flow, and a browser dialog pops up to request the authorization code.
3. Click "Create".
4. Complete the login process. This creates a successful connection.
5. Click "Test operation". A 406 response means everything is configured correctly, as shown below.

Solution validation

Add user in Enterprise Application for App roles

Roles have been defined under the required claims in the APIM inbound policy and also configured in the apim-mcp-backend-api app registration. As a result, any request from Copilot Studio will be denied if this role is not properly assigned. The role is included in the JWT access token, which we validate in the following sections. To assign the role, perform the following steps:

1. Visit the Azure portal and open Enterprise applications.
2. Select the APIM backend app registration (apim-mcp-backend-api in this example).
3. Click "Users and groups".
4. Select "Add user/group".
5. Select the user or group who should have access to the role.
6. Click "Assign". The result will look like the screenshot below.

Note: Role assignment for users or groups is an important step. If it is not configured, MCP server tests will fail in Copilot Studio.

Test MCP server in Copilot Studio

Launch Copilot Studio, click the agent you created in earlier steps, and open the "Tools" tab. Select your MCP tool as shown in the following figure and make sure it is "Enabled"; if you have other tools attached to the same agent, disable them for now for testing. Make sure the connection created during the custom connector test earlier is available.
You can also initiate a fresh connection by clicking the drop-down under "Connection", as shown below. Refreshing the tools will show all the tools available in this MCP server.

Provide a sample prompt such as "Give me the stock price of Tesla". This triggers the MCP server and calls the respective method to return the stock price of Tesla. Now try a weather-related question to see more; this invokes the weather forecast tool in the MCP server.

APIM Monitoring with Log Analytics

We previously configured APIM diagnostic settings to forward log data to Log Analytics. In this section, we'll review that data, as the inbound policy in APIM sends valuable information to Log Analytics.

Run the Kusto query to retrieve data from the last 30 minutes. As shown, the logs capture the APIM API endpoint URL and the backend URL, which corresponds to the Azure Container App endpoint. Scrolling further, we find the TraceRecords section. This contains the information captured by the APIM inbound policy and sent to Log Analytics; the figure below illustrates the TraceRecords data. In the inbound policy, we configured it to extract details from the access token, such as the token itself, username, scope, and roles, and forward them to Log Analytics.

Now copy the access token to the clipboard, launch https://jwt.io (a JSON Web Token debugger), and paste the access token into the ENCODED VALUE box as shown below. Note the following information:

- aud: the Application ID URI of apim-mcp-backend-api, showing that the access token was requested for that audience.
- appid: the client ID of the copilot-studio-client app registration.
- You can also see the roles and scope; these roles are the ones required by the APIM inbound policy.

Note: Roles are included in the access token, and if the role is not assigned in the enterprise application for "apim-mcp-backend-api", all requests will be denied by the APIM inbound policy configured earlier.

Perform a test using another Azure AD account that does not have the app role assigned

Now let's try the Copilot Studio agent while logged in with another account that is not assigned the "mcp.read" role. Review the diagram below: logged in as demo, we tried to access the MCP tool in the Copilot Studio agent, and the request failed with the error "Missing required scope or role". This error comes from the <on-error> section of the APIM policy configured earlier.

Let's review Log Analytics. As you can see, the request failed at the APIM inbound policy with a 403 error, and there is no backend URL. The error is also reported under TraceRecords, as configured in the APIM policy. Now copy the access token from Log Analytics and paste it into jwt.io. As the diagram below shows, there are no "roles" in the access token, which results in access being denied by the APIM inbound policy before the request reaches the APIM backend (the Azure Container App).

Assign the app role to the demo account

Let's assign the "mcp.read" role to the demo account and test whether it can access the tool.

1. Visit the Azure portal, open Enterprise applications, and select "apim-mcp-backend-api" (in this example).
2. Click "Users and groups".
3. Click "+ Add user/group".
4. Select demo, click "Select", and then click "Assign". The end result will look like the screenshot below.

Now log in again as demo and make sure a new access token is generated (access tokens refresh after one hour). As you can see in the image below, this time the request succeeds after assigning the "mcp.read" app role.
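As an aside (not part of the original walkthrough), if you prefer to inspect token claims in a script rather than pasting tokens into jwt.io, a minimal sketch using the PyJWT package (an assumed extra dependency) looks like the following. It only decodes the payload for inspection and deliberately skips signature verification, so use it for debugging only.

# Debug-only sketch: decode an access token locally instead of pasting it into jwt.io.
# Requires the PyJWT package (pip install pyjwt). Signature verification is skipped,
# so never use this pattern to authorize requests.
import jwt

access_token = "<paste-access-token-here>"

claims = jwt.decode(access_token, options={"verify_signature": False})
print("aud:  ", claims.get("aud"))    # should be the Application ID URI of apim-mcp-backend-api
print("appid:", claims.get("appid"))  # the copilot-studio-client application (client) ID
print("scp:  ", claims.get("scp"))    # delegated scopes
print("roles:", claims.get("roles"))  # app roles, e.g. ['mcp.read']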
Now let's review the Log Analytics entries, and then the access token in jwt.io. As you can see, the roles are now included in the access token.

Conclusion

Exposing the MCP server through Azure API Management (APIM) and integrating it with Copilot Studio agents provides a secure and scalable way to extend enterprise capabilities. By implementing OAuth 2.0, you ensure robust authentication and authorization, protecting sensitive data and maintaining compliance with industry standards.

Beyond security, APIM adds significant operational value. With APIM policies, you can monitor traffic, enforce rate limits, and apply fine-grained controls to manage access and performance effectively. This combination of security and governance empowers organizations to innovate confidently while maintaining control and visibility over API usage.

In today's enterprise landscape, leveraging APIM with OAuth 2.0 for MCP integration is not just best practice; it's a strategic move toward building resilient, secure, and well-governed solutions.

On-Device AI with Windows AI Foundry
From "waiting" to "instant", without sending data away

AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private, and available even when the network isn't, empowering apps to deliver seamless experiences. Imagine an intelligent assistant that answers in seconds without sending text to the cloud. This approach brings speed and data control to the places that need them most, while still letting you tap into cloud power when it makes sense.

Windows AI Foundry: A Local Home for Models

Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud.

Foundry Local

Foundry Local is the engine that powers this experience. Think of it as a local AI runtime: fast, private, and easy to integrate into an app.

Why Adopt On-Device AI?

- Faster, more responsive apps: local inference often reduces perceived latency and improves user experience.
- Privacy-first by design: keep sensitive data on the device; avoid cloud round trips unless the user opts in.
- Offline capability: an app can provide AI features even without a network connection.
- Cost control: reduce cloud compute and data costs for common, high-volume tasks.

This approach is especially useful in regulated industries, field-work tools, and any app where users expect quick, on-device responses.

Hybrid Pattern for Real Apps

On-device AI doesn't replace the cloud; it complements it. Here's how:

- Standalone on-device: quick, private actions like document summarization, local search, and offline assistants.
- Cloud-enhanced (optional): large-context models, up-to-date knowledge, or heavy multimodal workloads.

Design an app to keep data local by default and surface cloud options transparently, with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows:

- Use Foundry Local for real-time inference.
- Sync with Azure AI services for model updates, telemetry, and advanced analytics.
- Implement fallback strategies for resource-intensive scenarios.

Application Workflow Code Example

1. On-device only: try Foundry Local first, fall back to ONNX

# Option 1: on-device only. foundry_runtime, onnx_model, bert_model, and logger
# are helper objects defined elsewhere in the project.
def get_answer(question, context):
    if foundry_runtime.check_foundry_available():
        # Use on-device Foundry Local models
        try:
            answer = foundry_runtime.run_inference(question, context)
            return answer, "Foundry Local (On-Device)"
        except Exception as e:
            logger.warning(f"Foundry failed: {e}, trying ONNX...")

    if onnx_model.is_loaded():
        # Fall back to the local BERT ONNX model
        try:
            answer = bert_model.get_answer(question, context)
            return answer, "BERT ONNX (On-Device)"
        except Exception as e:
            logger.warning(f"ONNX failed: {e}")

    return "Error: No local AI available", "Failed"
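Both the snippet above and the hybrid variant that follows rely on helper objects (foundry_runtime, onnx_model, bert_model, logger) that the article assumes exist elsewhere in the project. Purely as an illustration, a minimal sketch of the "BERT ONNX (On-Device)" fallback helper might look like the following. It assumes onnxruntime and transformers are installed, that a BERT-style extractive QA model has been exported to bert-qa.onnx, and that the model's outputs are start and end logits; all names are placeholders.

# Illustrative sketch only: one possible implementation of the local ONNX QA fallback.
# Model path, tokenizer name, and output ordering are assumptions to adapt to your export.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer


class BertOnnxQA:
    def __init__(self, model_path="bert-qa.onnx", tokenizer_name="deepset/bert-base-cased-squad2"):
        self.session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    def is_loaded(self):
        return self.session is not None

    def get_answer(self, question, context):
        # Tokenize the question/context pair and keep only inputs the ONNX graph expects.
        inputs = self.tokenizer(question, context, return_tensors="np",
                                truncation=True, max_length=384)
        expected = {i.name for i in self.session.get_inputs()}
        ort_inputs = {k: v.astype(np.int64) for k, v in inputs.items() if k in expected}

        # Assumption: the exported model returns start and end logits, in that order.
        start_logits, end_logits = self.session.run(None, ort_inputs)
        start = int(np.argmax(start_logits[0]))
        end = int(np.argmax(end_logits[0])) + 1

        tokens = inputs["input_ids"][0][start:end]
        return self.tokenizer.decode(tokens, skip_special_tokens=True)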
2. Hybrid approach: on-device first, cloud as last resort

def get_answer(question, context):
    """
    Priority order:
      1. Foundry Local (best: advanced + private)
      2. ONNX Runtime (good: fast + private)
      3. Cloud API (fallback: requires internet, less private)
    Used for the hybrid approach, chosen based on the real-time scenario.
    """
    if foundry_runtime.check_foundry_available():
        # Use on-device Foundry Local models
        try:
            answer = foundry_runtime.run_inference(question, context)
            return answer, "Foundry Local (On-Device)"
        except Exception as e:
            logger.warning(f"Foundry failed: {e}, trying ONNX...")

    if onnx_model.is_loaded():
        # Fall back to the local BERT ONNX model
        try:
            answer = bert_model.get_answer(question, context)
            return answer, "BERT ONNX (On-Device)"
        except Exception as e:
            logger.warning(f"ONNX failed: {e}, trying cloud...")

    # Last resort: cloud API (requires internet)
    if network_available():
        try:
            import requests
            response = requests.post(
                '{BASE_URL_AI_CHAT_COMPLETION}',
                headers={'Authorization': f'Bearer {API_KEY}'},
                json={
                    'model': '{MODEL_NAME}',
                    'messages': [{
                        'role': 'user',
                        'content': f'Context: {context}\n\nQuestion: {question}'
                    }]
                },
                timeout=10
            )
            answer = response.json()['choices'][0]['message']['content']
            return answer, "Cloud API (Online)"
        except Exception as e:
            return "Error: No AI runtime available", "Failed"
    else:
        return "Error: No internet and no local AI available", "Offline"

Demo Project Output

Foundry Local answering context-based questions offline:

- Screenshot: the Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data.
- Screenshot: the Foundry Local engine ran the Phi-4-mini model offline and reported that there is no answer.

Practical Use Cases

- Privacy-first reading assistant: summarize documents locally without sending text to the cloud.
- Healthcare apps: analyze medical data on-device for compliance.
- Financial tools: risk scoring without exposing sensitive financial data.
- IoT and edge devices: real-time anomaly detection without network dependency.

Conclusion

On-device AI isn't just a trend; it's a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you're creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric.

References

- Get started with Foundry Local - Foundry Local | Microsoft Learn
- What is Windows AI Foundry? | Microsoft Learn
- https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/

NVIDIA NIM for NVIDIA Nemotron, Cosmos, & Microsoft Trellis: Now Available in Azure AI Foundry
We're excited to announce 7 new powerful NVIDIA NIM™ additions to Azure AI Foundry Models, now on Managed Compute. The latest wave of models, NVIDIA Nemotron Nano 9B v2, Llama 3.1 Nemotron Nano VL 8B, Llama 3.3 Nemotron Super 49B v1.5 (coming soon), Cosmos Reason1-7B, Cosmos Predict 2.5 (coming soon), Cosmos Transfer 2.5 (coming soon), and Microsoft Trellis, marks a significant leap forward in intelligent application development. Collectively, these models redefine what's possible in advanced instruction-following, vision-language understanding, and efficient language modeling, empowering developers to build multimodal, visually rich, and context-aware solutions. By combining robust reasoning, flexible input handling, and enterprise-grade deployment options, these additions accelerate innovation across industries, from robotics and autonomous vehicles to immersive retail and digital twins, enabling smarter, safer, and more adaptive experiences at scale.

Meet the Models

- NVIDIA Nemotron Nano 9B v2 (available now), 9B parameters. Primary use cases: multilingual and code-based reasoning; enterprise AI and productivity agents; scientific reasoning and advanced math; software engineering and tool calling.
- Llama 3.3 Nemotron Super 49B v1.5 (coming soon), 49B parameters. Primary use cases: enterprise AI and productivity agents; scientific reasoning and advanced math; software engineering and tool calling.
- Llama 3.1 Nemotron Nano VL 8B (available now), 8B parameters. Primary use cases: multimodal vision-language tasks, document intelligence and understanding; mobile and edge AI agents.
- Cosmos Reason1-7B (available now), 7B parameters. Primary use cases: robotics (planning and executing tasks with physical constraints); autonomous vehicles (understanding environments and making decisions); video analytics agents (extracting insights and performing root-cause analysis from video data).
- Cosmos Predict 2.5 (coming soon), 2B parameters. Primary use case: generalist model for world state generation and prediction.
- Cosmos Transfer 2.5 (coming soon), 2B parameters. Primary use case: structural conditioning for physical AI.
- Microsoft Trellis by Microsoft Research (available now). Primary use cases: digital twins (generate accurate 3D assets from simple prompts); immersive retail experiences (photorealistic product models for AR and virtual try-ons); game and simulation development (turn creative ideas into production-ready 3D content).

Meet the NVIDIA Nemotron Family

NVIDIA Nemotron Nano 9B v2: Compact power for high-performance reasoning and agentic tasks

NVIDIA Nemotron Nano 9B v2 is a high-efficiency large language model built with a hybrid Mamba-Transformer architecture, designed to excel in both reasoning and non-reasoning tasks.

- Efficient architecture for high-performance reasoning: combines Mamba-2 and Transformer components to deliver strong reasoning capabilities with higher throughput.
- Extensive multilingual and code capabilities: trained on diverse language and programming data, it performs well across natural-language tasks (English, German, French, Italian, Spanish, and Japanese), code generation, and complex problem solving.
- Reasoning budget control: supports a runtime "thinking" budget. During inference, the user can specify how many tokens the model is allowed to "think" for, helping balance speed, cost, and accuracy. For example, a user can tell the model to think for 1K or 3K tokens for different use cases, giving far better cost predictability.
Figure 1 (provided by NVIDIA).

Nemotron Nano 9B v2 is built from the ground up with training data spanning 15 languages and 43 programming languages, giving it broad multilingual and coding fluency. Its capabilities were sharpened through advanced post-training techniques like GRPO and DPO, enabling it to reason deeply, follow instructions precisely, and adapt dynamically to different tasks.

-> Explore the model card on Azure AI Foundry

Llama 3.3 Nemotron Super 49B v1.5: High-throughput reasoning at scale

Llama 3.3 Nemotron Super 49B v1.5 (coming soon) is a significantly upgraded version of Llama-3.3-Nemotron-Super-49B-v1. It is a large language model derived from Meta Llama-3.3-70B-Instruct (the reference model) and optimized for advanced reasoning, instruction following, and tool use across a wide range of tasks.

- Excels in applications such as chatbots, AI agents, and retrieval-augmented generation (RAG) systems.
- Balances accuracy and compute efficiency for enterprise-scale workloads.
- Designed to run efficiently on a single NVIDIA H100 GPU, making it practical for real-world applications.

Llama-3.3-Nemotron-Super-49B-v1.5 was trained through a multi-phase process combining human expertise, synthetic data, and advanced reinforcement learning techniques to refine its reasoning and instruction-following abilities. Its performance across benchmarks like MATH500 (97.4%) and AIME 2024 (87.5%) highlights its strength in tackling complex tasks with precision and depth.

Llama 3.1 Nemotron Nano VL 8B: Multimodal intelligence for edge deployments

Llama 3.1 Nemotron Nano VL 8B is a compact vision-language model that excels in tasks such as report generation, Q&A, visual understanding, and document intelligence. The model delivers low latency and high efficiency, reducing TCO. It was trained on a diverse mix of human-annotated and synthetic data, enabling robust performance across multimodal tasks such as document understanding and visual question answering, and it achieved strong results on evaluation benchmarks including DocVQA (91.2%), ChartQA (86.3%), AI2D (84.8%), and OCRBenchV2 English (60.1%).

-> Explore the model card on Azure AI Foundry

What Sets Nemotron Apart

NVIDIA Nemotron is a family of open models, datasets, recipes, and tools.

1. Open-source AI technologies: open models, data, and recipes offer transparency, allowing developers to create trustworthy custom AI for their specific needs, from creating new agents to refining existing applications.
   - Open weights: the NVIDIA Open Model License offers enterprises data control and flexible deployment.
   - Open data: models are trained with transparent, permissively licensed NVIDIA data, available on Hugging Face, ensuring confidence in use. It also allows developers to train their own high-accuracy custom models with these open datasets.
   - Open recipe: NVIDIA shares development techniques, such as NAS, hybrid architecture, and Minitron, as well as NeMo tools enabling customization or creation of custom models.
2. Highest accuracy and efficiency: engineered for efficiency, Nemotron delivers industry-leading accuracy in the least amount of time for reasoning, vision, and agentic tasks.
3. Run anywhere on cloud: packaged as NVIDIA NIM for secure and reliable deployment of high-performance AI model inferencing across Azure platforms.

Meet the Cosmos Family

NVIDIA Cosmos™ is a world foundation model (WFM) development platform to advance physical AI.
At its core are Cosmos WFMs, openly available pretrained multimodal models that developers can use out of the box for generating world states as videos and for physical AI reasoning, or post-train to develop specialized physical AI models.

Cosmos Reason1-7B: Physical AI

Cosmos Reason1-7B combines chain-of-thought reasoning, flexible input handling for images and video, a compact 7B-parameter architecture, and advanced physical-world understanding, making it ideal for real-time robotics, video analytics, and AI agents that require contextual, step-by-step decision-making in complex environments. The model transforms how AI and robotics interact with the real world, giving your systems the power not just to see and describe, but to truly understand, reason, and make decisions in complex environments like factories, cities, and autonomous vehicles. With its ability to analyze video, plan robot actions, and verify safety protocols, Cosmos Reason1-7B helps developers build smarter, safer, and more adaptive solutions for real-world challenges. Cosmos Reason1-7B is physical AI for four embodiments.

Figure 2: Physical AI embodiments.

Model Strengths

- Physical world reasoning: leverages prior knowledge, physics laws, and common sense to understand complex scenarios.
- Chain-of-thought (CoT) reasoning: delivers contextual, step-by-step analysis for robust decision-making.
- Flexible input: handles images, video (up to 30 seconds, 1080p), and text with a 16k context window.
- Compact and deployable: at 7B parameters, it runs efficiently from edge devices to the cloud.
- Production-ready: available via Hugging Face, GitHub, and NVIDIA NIM; integrates with industry-standard APIs.

Enterprise Use Cases

Cosmos Reason1-7B is more than a model; it's a catalyst for building intelligent, adaptive solutions that help enterprises shape a safer, more efficient, and truly connected physical world.

Figure 3: Use cases.

- Reimagine safety and efficiency by empowering AI agents to analyze millions of live streams and recorded videos, instantly verifying protocols and detecting risks in factories, cities, and industrial sites.
- Accelerate robotics innovation with advanced reasoning and planning, enabling robots to understand their environment, make methodical decisions, and perform complex tasks, from autonomous vehicles navigating busy streets to household robots assisting with daily chores.
- Transform data curation and annotation by automating the selection, labeling, and critiquing of massive, diverse datasets, fueling the next generation of AI with high-quality training data.
- Unlock smarter video analytics with chain-of-thought reasoning, allowing systems to summarize events, verify actions, and deliver actionable insights for security, compliance, and operational excellence.

-> Explore the model card on Azure AI Foundry

Also coming soon to Azure AI Foundry are two models of the Cosmos WFM designed for world generation and data augmentation.

Cosmos Predict 2.5 (2B)

Cosmos Predict 2.5 is a next-generation world foundation model that generates realistic, controllable video worlds from text, images, or videos, all through a unified architecture. Trained on 200M+ high-quality clips and enhanced with reinforcement learning, it delivers stronger physics and prompt alignment while cutting compute cost and post-training time for faster physical AI workflows.
Also coming soon to Azure AI Foundry are two models of the Cosmos WFM, designed for world generation and data augmentation.
Cosmos Predict 2.5 2B
Cosmos Predict 2.5 is a next-generation world foundation model that generates realistic, controllable video worlds from text, images, or videos, all through a unified architecture. Trained on 200M+ high-quality clips and enhanced with reinforcement learning, it delivers stronger physics and prompt alignment while cutting compute cost and post-training time for faster Physical AI workflows.
Cosmos Transfer 2.5 2B
While Predict 2.5 generates worlds, Transfer 2.5 transforms structured simulation inputs, such as segmentation, depth, or LiDAR maps, into photorealistic synthetic data for Physical AI training and development.
What Sets Cosmos Apart
Built for Physical AI: Purpose-built for robotics, autonomous systems, and embodied agents that understand physics, motion, and spatial environments.
Multimodal World Modeling: Combines images, video, depth, segmentation, LiDAR, and trajectories to create physics-aware, controllable world simulations.
Scalable Synthetic Data Generation: Generates diverse, photorealistic data at scale using structured simulation inputs for faster Sim2Real training and adaptation.
Microsoft Trellis by Microsoft Research: Enterprise-ready 3D Generation
Microsoft Trellis is a cutting-edge 3D asset generation model developed by Microsoft Research, designed to create high-quality, versatile 3D assets, complete with shapes and textures, from text or image prompts. Seamlessly integrated within the NVIDIA NIM microservice, Trellis accelerates asset generation and empowers creators with flexible, production-ready outputs.
Quickly generate high-fidelity 3D models from simple text or image prompts, perfect for industries such as manufacturing, energy, and smart infrastructure looking to accelerate digital twin creation, predictive maintenance, and immersive training environments. From virtual try-ons in retail to production-ready assets in media, Trellis empowers teams to create stunning 3D content at scale, cutting production time and unlocking new levels of interactivity and personalization.
-> Explore the model card on Azure AI Foundry
Pricing
The pricing breakdown consists of the Azure compute charges plus a flat fee per GPU for the NVIDIA AI Enterprise license that is required to use the NIM software.
Pay-as-you-go (per GPU hour)
NIM surcharge: $1 per GPU hour
Azure compute charges also apply, based on the deployment configuration
For example, a single-GPU deployment that runs for 10 hours accrues $10 in NIM surcharges plus 10 hours of the selected VM's compute rate.
Why use Managed Compute?
Managed Compute is a deployment option within Azure AI Foundry Models that lets you run large language models (LLMs), small language models (SLMs), Hugging Face models, and custom models fully hosted on Azure infrastructure. Azure Managed Compute is a powerful deployment option for models not available via standard (pay-as-you-go) endpoints. It gives you:
Custom model support: Deploy open-source or third-party models
Infrastructure flexibility: Choose your own GPU SKUs (NVIDIA A10, A100, H100)
Detailed control: Configure inference servers, protocols, and advanced settings
Full integration: Works with the Azure ML SDK, CLI, Prompt Flow, and REST APIs (a minimal SDK sketch follows below)
Enterprise-ready: Supports VNet, private endpoints, quotas, and scaling policies
NVIDIA NIM Microservices on Azure
These models are available as NVIDIA NIM™ microservices on Azure AI Foundry. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing. NIM microservices are pre-built, containerized AI endpoints that simplify deployment and scale across environments, allowing developers to run models securely and efficiently in the cloud. If you're ready to build smarter, more capable AI agents, start exploring Azure AI Foundry.
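To make the "Works with the Azure ML SDK" point above concrete, here is a minimal sketch of the generic managed online endpoint pattern using the azure-ai-ml package. The workspace names, model asset ID, and GPU SKU below are placeholder assumptions rather than values from this article, and not every NIM necessarily supports this SDK path, so confirm the details on the model card; the portal flow described in the next section is the simplest route for most users.

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Connect to the workspace behind your Foundry hub project (placeholder names).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<hub-project-workspace>",
)

# Create an endpoint with key-based authentication.
endpoint = ManagedOnlineEndpoint(name="nemotron-nano-nim", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a catalog model onto a GPU SKU you have quota for.
deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name="nemotron-nano-nim",
    # Hypothetical asset ID - copy the real one from the model card.
    model="azureml://registries/<registry>/models/<nim-model-name>/versions/<version>",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment.
endpoint.traffic = {"default": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()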
Build Trustworthy AI Solutions
Azure AI Foundry delivers managed compute designed for enterprise-grade security, privacy, and governance. Every deployment of NIM microservices through Azure AI Foundry is backed by Microsoft's Responsible AI principles and the Secure Future Initiative, ensuring fairness, reliability, and transparency so organizations can confidently build and scale agentic AI workflows.
How to Get Started in Azure AI Foundry
Explore Azure AI Foundry: Begin by accessing the Azure AI Foundry portal and then follow the steps below.
Navigate to ai.azure.com.
In the top left, select an existing project that is a (Hub) resource. If you do not have a hub project, create a new one using the “+ Create New” link.
Choose AI Hub Resource:
Deploy with NIM Microservices: Use NVIDIA's optimized containers for secure, scalable deployment.
Select Model Catalog from the left sidebar menu: In the "Collections" filter, select NVIDIA to see all the NIM microservices that are available on Azure AI Foundry.
Select the NIM you want to use.
Click Deploy.
Choose the deployment name and virtual machine (VM) type that you would like to use for your deployment. VM SKUs that are supported for the selected NIM and specified within the model card will be preselected. Note that this step requires sufficient quota in your Azure subscription for the selected VM type. If needed, follow the instructions to request a service quota increase.
Use the NVIDIA NeMo Agent Toolkit, designed to orchestrate, monitor, and optimize collaborative AI agents.
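Once a deployment created through the steps above reports a healthy state, a quick smoke test is the easiest way to verify it. NIM microservices expose an OpenAI-compatible API, so the sketch below points the standard openai Python client at the deployment; the base URL, key, and model name are placeholders rather than values from this article, and the exact path can vary by deployment, so copy the values shown on the deployment's details page.

from openai import OpenAI

# Placeholder values - use the URL and key from your deployment's details page.
client = OpenAI(
    base_url="https://<your-endpoint>.<region>.inference.ml.azure.com/v1",
    api_key="<your-endpoint-key>",
)

# Send a simple chat request and print the reply.
completion = client.chat.completions.create(
    model="nvidia-nemotron-nano-9b-v2",  # hypothetical; use the name your NIM reports
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what NVIDIA NIM microservices are in two sentences."},
    ],
    max_tokens=200,
)
print(completion.choices[0].message.content)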
Note about the License
Users are responsible for compliance with the terms of the NVIDIA AI Product Agreement.
Learn More
How to Deploy NVIDIA NIM Docs
Learn More about Accelerating agentic workflows with Azure AI Foundry, NVIDIA NIM, and NVIDIA NeMo Agent Toolkit
Register for Microsoft Ignite 2025
Understanding Small Language Models
Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models that require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They're ideal for edge applications like mobile, robotics, and IoT.
The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters, and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.