azure ai foundry
15 Topics

BYO Thread Storage in Azure AI Foundry using Python
Build scalable, secure, and persistent multi-agent memory with your own storage backend

As AI agents evolve beyond one-off interactions, persistent context becomes a critical architectural requirement. Azure AI Foundry’s latest update introduces Bring Your Own (BYO) Thread Storage, which lets developers integrate custom storage solutions for agent threads. This feature lets enterprises control how agent memory is stored, retrieved, and governed, in line with compliance, scalability, and observability goals.

What Is “BYO Thread Storage”?

In Azure AI Foundry, a thread represents a conversation or task execution context for an AI agent. By default, thread state (messages, actions, results, metadata) is stored in Foundry’s managed storage. With BYO Thread Storage, you can now:

- Store threads in your own database: Azure Cosmos DB, SQL, Blob, or even a vector DB.
- Apply custom retention, encryption, and access policies.
- Integrate with your existing data and governance frameworks.
- Enable cross-region disaster recovery (DR) setups.

This gives enterprises full control of data lifecycle management, a big step toward AI-first operational excellence.

Architecture Overview

A typical setup involves:

- Azure AI Foundry Agent Service: hosts your multi-agent setup.
- Custom Thread Storage Backend: e.g., Azure Cosmos DB, Azure Table, or PostgreSQL.
- Thread Adapter: a Python class implementing the Foundry storage interface.
- Disaster Recovery (DR) replication: optional replication of threads to a secondary region.

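The post does not show what the Foundry storage interface looks like, so the following is only a sketch of the Thread Adapter idea under stated assumptions. `ThreadStore` is a hypothetical contract (not a Foundry type) that mirrors the method names of the `ThreadStorageManager` class in this post, and `InMemoryThreadStore` is a dictionary-backed stand-in that could be swapped in for unit tests before a real database is wired up.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Protocol
from uuid import uuid4


class ThreadStore(Protocol):
    """Hypothetical adapter contract; NOT the actual Foundry storage interface."""

    def create_thread(self, user_id: str, metadata: dict[str, Any] | None = None) -> str: ...
    def add_message(self, thread_id: str, role: str, content: str) -> dict[str, Any]: ...
    def get_thread_messages(self, thread_id: str) -> list[dict[str, Any]]: ...
    def delete_thread(self, thread_id: str) -> None: ...


class InMemoryThreadStore:
    """Dictionary-backed stand-in that satisfies ThreadStore structurally."""

    def __init__(self) -> None:
        self._threads: dict[str, dict[str, Any]] = {}

    def create_thread(self, user_id, metadata=None):
        # A random suffix avoids ID collisions for threads created in the same instant.
        thread_id = f"thread_{user_id}_{uuid4().hex}"
        now = datetime.now(timezone.utc).isoformat()
        self._threads[thread_id] = {
            "id": thread_id,
            "user_id": user_id,
            "messages": [],
            "created_at": now,
            "updated_at": now,
            "metadata": metadata or {},
        }
        return thread_id

    def add_message(self, thread_id, role, content):
        thread = self._threads[thread_id]  # raises KeyError for unknown threads
        now = datetime.now(timezone.utc).isoformat()
        message = {"role": role, "content": content, "timestamp": now}
        thread["messages"].append(message)
        thread["updated_at"] = now
        return message

    def get_thread_messages(self, thread_id):
        thread = self._threads.get(thread_id)
        # Return a copy so callers cannot mutate stored history by accident.
        return list(thread["messages"]) if thread else []

    def delete_thread(self, thread_id):
        self._threads.pop(thread_id, None)
```

Coding the rest of the application against a small contract like this keeps the choice of backend (Cosmos DB, SQL, Blob) an implementation detail of one class.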
Implementing BYO Thread Storage using Python

Prerequisites

First, install the necessary Python packages:

    pip install azure-ai-projects azure-ai-inference azure-cosmos azure-identity

Setting Up the Storage Layer

    from datetime import datetime, timezone

    from azure.cosmos import CosmosClient
    from azure.cosmos.exceptions import CosmosResourceNotFoundError
    from azure.identity import DefaultAzureCredential


    def _utc_now():
        """Return the current UTC time as an ISO 8601 string."""
        return datetime.now(timezone.utc).isoformat()


    class ThreadStorageManager:
        # Assumes the container is partitioned on /id, because every call
        # below passes the thread ID as the partition key.
        def __init__(self, cosmos_endpoint, database_name, container_name):
            credential = DefaultAzureCredential()
            self.client = CosmosClient(cosmos_endpoint, credential=credential)
            self.database = self.client.get_database_client(database_name)
            self.container = self.database.get_container_client(container_name)

        def create_thread(self, user_id, metadata=None):
            """Create a new conversation thread"""
            thread_id = f"thread_{user_id}_{datetime.now(timezone.utc).timestamp()}"
            now = _utc_now()
            thread_data = {
                'id': thread_id,
                'user_id': user_id,
                'messages': [],
                'created_at': now,
                'updated_at': now,
                'metadata': metadata or {}
            }
            self.container.create_item(body=thread_data)
            return thread_id

        def add_message(self, thread_id, role, content):
            """Add a message to an existing thread"""
            # Read-modify-write: concurrent writers to the same thread can
            # overwrite each other's messages.
            thread = self.container.read_item(item=thread_id, partition_key=thread_id)
            message = {
                'role': role,
                'content': content,
                'timestamp': _utc_now()
            }
            thread['messages'].append(message)
            thread['updated_at'] = _utc_now()
            self.container.replace_item(item=thread_id, body=thread)
            return message

        def get_thread(self, thread_id):
            """Retrieve a complete thread, or None if it does not exist"""
            try:
                return self.container.read_item(item=thread_id, partition_key=thread_id)
            except CosmosResourceNotFoundError as e:
                print(f"Thread not found: {e}")
                return None

        def get_thread_messages(self, thread_id):
            """Get all messages from a thread"""
            thread = self.get_thread(thread_id)
            return thread['messages'] if thread else []

        def delete_thread(self, thread_id):
            """Delete a thread"""
            self.container.delete_item(item=thread_id, partition_key=thread_id)

Integrating with Azure AI Foundry

    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential


    class ConversationManager:
        def __init__(self, project_connection_string, storage_manager):
            # from_connection_string is available in the beta releases of
            # azure-ai-projects; newer releases construct the client from a
            # project endpoint instead.
            self.ai_client = AIProjectClient.from_connection_string(
                credential=DefaultAzureCredential(),
                conn_str=project_connection_string
            )
            self.storage = storage_manager

        def start_conversation(self, user_id, system_prompt):
            """Initialize a new conversation"""
            thread_id = self.storage.create_thread(
                user_id=user_id,
                metadata={'system_prompt': system_prompt}
            )
            # Add system message
            self.storage.add_message(thread_id, 'system', system_prompt)
            return thread_id

        def send_message(self, thread_id, user_message, model_deployment):
            """Send a message and get AI response"""
            # Store user message
            self.storage.add_message(thread_id, 'user', user_message)

            # Retrieve conversation history
            messages = self.storage.get_thread_messages(thread_id)

            # Call Azure AI with conversation history. The chat client comes
            # from the azure-ai-inference package (same beta-release assumption
            # as above).
            chat_client = self.ai_client.inference.get_chat_completions_client()
            response = chat_client.complete(
                model=model_deployment,
                messages=[
                    {"role": msg['role'], "content": msg['content']}
                    for msg in messages
                ]
            )
            assistant_message = response.choices[0].message.content

            # Store assistant response
            self.storage.add_message(thread_id, 'assistant', assistant_message)
            return assistant_message

Usage Example

    # Initialize storage and conversation manager
    storage = ThreadStorageManager(
        cosmos_endpoint="https://your-cosmos-account.documents.azure.com:443/",
        database_name="conversational-ai",
        container_name="threads"
    )
    conversation_mgr = ConversationManager(
        project_connection_string="your-project-connection-string",
        storage_manager=storage
    )

    # Start a new conversation
    thread_id = conversation_mgr.start_conversation(
        user_id="user123",
        system_prompt="You are a helpful AI assistant."
    )

    # Send messages
    response1 = conversation_mgr.send_message(
        thread_id=thread_id,
        user_message="What is machine learning?",
        model_deployment="gpt-4"
    )
    print(f"AI: {response1}")

    response2 = conversation_mgr.send_message(
        thread_id=thread_id,
        user_message="Can you give me an example?",
        model_deployment="gpt-4"
    )
    print(f"AI: {response2}")

    # Retrieve full conversation history
    history = storage.get_thread_messages(thread_id)
    for msg in history:
        print(f"{msg['role']}: {msg['content']}")

Key Highlights:

- Threads are stored in Cosmos DB under your control.
- You can attach metadata such as region, owner, or compliance tags.
- Integrates natively with existing Azure identity and Key Vault.

Disaster Recovery & Resilience

When coupled with geo-replicated Cosmos DB or Azure Storage RA-GRS, your BYO thread storage becomes resilient by design:

- Primary writes in East US replicate to Central US.
- Foundry auto-detects failover and reconnects to the secondary region.
- Threads remain available during outages, ensuring operational continuity.

This aligns with the AI-First Operational Excellence architecture theme, where reliability and observability drive intelligent automation.

Best Practices

| Area | Recommendation |
|---|---|
| Security | Use Azure Key Vault for credentials & encryption keys. |
| Compliance | Configure data residency & retention in your own DB. |
| Observability | Log thread CRUD operations to Azure Monitor or Application Insights. |
| Performance | Use async I/O and partition keys for large workloads. |
| DR | Enable geo-redundant storage and run failover tests regularly. |

When to Use BYO Thread Storage

| Scenario | Why it helps |
|---|---|
| Regulated industries (BFSI, Healthcare, etc.) | Maintain data control & audit trails |
| Multi-region agent deployments | Support DR and data sovereignty |
| Advanced analytics on conversation data | Query threads directly from your DB |
| Enterprise observability | Unified monitoring across Foundry + Ops |

The Future

BYO Thread Storage opens doors to advanced use cases: federated agent memory, semantic retrieval over past conversations, and dynamic workload failover across regions. For architects, this feature is a key enabler for secure, scalable, and compliant AI system design. For developers, it means more flexibility, transparency, and integration power.

Summary

| Feature | Benefit |
|---|---|
| Custom thread storage | Full control over data |
| Python adapter support | Easy extensibility |
| Multi-region DR ready | Business continuity |
| Azure-native security | Enterprise-grade safety |

Conclusion

Implementing BYO thread storage in Azure AI Foundry gives you the flexibility to build AI applications that meet your specific requirements for data governance, performance, and scalability. By taking control of your storage, you can create more robust, compliant, and maintainable AI solutions.

109 Views, 4 likes, 2 Comments

Azure OpenAI Model Upgrades: Prompt Safety Pitfalls with GPT-4o and Beyond
Upgrading to New Azure OpenAI Models? Beware Your Old Prompts Might Break.

I recently worked on upgrading our Azure OpenAI integration from gpt-35-turbo to gpt-4o-mini, expecting it to be a straightforward configuration change. Just update the Azure Foundry resource endpoint, change the model name, deploy the code, and voilà, everything should work as before. Right? Not quite.

The Unexpected Roadblock

As soon as I deployed the updated code, I started seeing 400 status errors from the OpenAI endpoint. The message was cryptic:

    The response was filtered due to the prompt triggering Azure OpenAI's content management policy.

At first, I assumed it was a bug in my SDK call or a malformed payload. But after digging deeper, I realized this wasn’t a technical failure. It was a content safety filter kicking in before the prompt even reached the model.

The Prompt That Broke It

Here’s the original system prompt that worked perfectly with gpt-35-turbo:

    YOU ARE A QNA EXTRACTOR IN TEXT FORMAT. YOU WILL GET A SET OF SURVEYJS QNA JSONS. YOU WILL CONVERT THAT INTO A TEXT DOCUMENT. FOR THE QUESTIONS WHERE NO ANSWER WAS GIVEN, MARK THOSE AS NO ANSWER. HERE IS THE QNA: BE CREATIVE AND PROFESSIONAL. I WANT TO GENERATE A DOCUMENT TO BE PUBLISHED.
    {{$style}}
    +++++
    {{$input}}
    +++++

This prompt had been reliable for months. But with gpt-4o-mini, it triggered Azure’s new input safety layer, introduced in mid-2024.

What Changed with GPT-4o-mini?

Unlike gpt-35-turbo, the gpt-4o family:

- Applies stricter content filtering, not just on the output but also on the input prompt.
- Treats system messages and user messages as role-based chat messages, passing them through moderation before the model sees them.
- Flags prompts that look like prompt injection attempts, such as aggressive instructions like “YOU ARE…”, “BE CREATIVE”, “GENERATE”, “PROFESSIONAL”.
- Flags unusual formatting (like `+++++`), artificial delimiters, or token markers, as they may look like encoded content.

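One way to tell a filter rejection apart from a malformed request is to look at the error body that comes back with the 400. The sketch below assumes the error shape as I understand it from the Azure OpenAI content filtering documentation (`error.code` equal to `content_filter`, with per-category results under `error.innererror.content_filter_result`). The function name `filtered_categories` is hypothetical, and the sample body is made up for illustration; it is not taken from the author's logs.

```python
import json
from typing import List, Optional


def filtered_categories(status_code: int, body: str) -> Optional[List[str]]:
    """Return the filter categories that blocked a prompt, or None if the
    response does not look like a content-filter rejection."""
    if status_code != 400:
        return None
    try:
        error = json.loads(body)["error"]
        if error.get("code") != "content_filter":
            return None
        # Assumed field names; verify against a real response from your resource.
        results = error.get("innererror", {}).get("content_filter_result", {})
        return sorted(
            name
            for name, result in results.items()
            if isinstance(result, dict) and result.get("filtered")
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        return None  # not JSON, or not shaped like an Azure OpenAI error


# Illustrative payload only (hand-written, not a captured response).
sample = json.dumps({
    "error": {
        "code": "content_filter",
        "message": "The response was filtered due to the prompt triggering "
                   "Azure OpenAI's content management policy.",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "jailbreak": {"filtered": True, "detected": True},
                "hate": {"filtered": False, "severity": "safe"},
            },
        },
    }
})

print(filtered_categories(400, sample))  # ['jailbreak']
```

Logging the category names alongside the deployment name makes it easier to see whether a model upgrade, rather than a code change, is what started triggering the filter.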
In short, the model didn’t even get a chance to process my prompt: it was blocked at the gate.

Fixing It: Softening the Prompt

The solution wasn’t to rewrite the entire logic, but to soften the system prompt and remove formatting that could be misinterpreted. Here’s what helped:

- Replacing “YOU ARE…” with a gentler instruction like “Please help convert the following Q&A data…”
- Removing creative directives like “BE CREATIVE” or “PROFESSIONAL” unless clearly contextualized.
- Avoiding raw JSON markers and template syntax (`{{ }}`, `+++++`) in the prompt.

Once I made these changes, the model responded smoothly, and the upgrade was finally complete.

Evolving the Prompt, Not Abandoning It

Interestingly, for some prompts I didn’t have to completely eliminate the “YOU ARE…” structure. Instead, I refined it to be more natural and less directive. Here’s a comparison:

❌ Old Prompt (Blocked)

    YOU ARE A SOURCING AND PROCUREMENT MANAGER. YOU WILL GET BUYER'S REQUIREMENTS IN QNA FORMAT. HERE IS THE QNA: {{$input}}
    +++++
    YOU WILL GENERATE TOP 10 {{$category}} RELATED QUESTIONS THAT CAN BE ASKED OF A SUPPLIER IN JSON FORMAT. THE JSON MUST HAVE QUESTION NUMBER AS THE KEY AND QUESTION TEXT AS THE QUESTION. DON'T ADD ANY DESCRIPTION TEXT OR FORMATTING IN THE OUTPUT. BE CREATIVE AND PROFESSIONAL. I WANT TO GENERATE AN RFX.

✅ New Prompt (Accepted)

    You are an AI assistant that helps clarify sourcing requirements. You will receive buyer's requirements in QnA format. Here is the QnA: {$input}
    Your task is to generate the top 10 {$category} related questions that can be asked of a supplier, in JSON format.
    - The JSON must use the question number as the key and the question text as the value.
    - Do not include any description text or formatting in the output.
    - Focus on creating clear, professional, and relevant questions that will help prepare an RFX.

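The patterns listed above can be turned into a small pre-deployment check that runs over stored prompt templates before a model upgrade. This is a minimal sketch, assuming the three signals the author describes (mostly upper-case text, repeated-symbol delimiters, double-brace template markers). `lint_prompt` and its threshold are hypothetical, and the heuristics are guesses about what tripped the filter, not Azure's actual moderation rules.

```python
import re
from typing import List

# Heuristics based on the patterns named in the post; NOT Azure's real rules.
_DELIMITER_RUN = re.compile(r"([^\w\s])\1{4,}")   # same symbol 5+ times, e.g. +++++
_TEMPLATE_MARKER = re.compile(r"\{\{.*?\}\}")      # e.g. {{$input}}


def lint_prompt(prompt: str, caps_threshold: float = 0.6) -> List[str]:
    """Return human-readable warnings for a system prompt template."""
    warnings = []
    letters = [c for c in prompt if c.isalpha()]
    # Share of alphabetic characters that are upper-case.
    if letters and sum(c.isupper() for c in letters) / len(letters) >= caps_threshold:
        warnings.append("mostly upper-case text")
    if _DELIMITER_RUN.search(prompt):
        warnings.append("repeated-symbol delimiter")
    if _TEMPLATE_MARKER.search(prompt):
        warnings.append("double-brace template marker")
    return warnings


old_prompt = "YOU ARE A QNA EXTRACTOR IN TEXT FORMAT. {{$style}} +++++ {{$input}} +++++"
new_prompt = (
    "You are an AI assistant that helps clarify sourcing requirements. "
    "Here is the QnA: {$input}"
)

for name, text in [("old", old_prompt), ("new", new_prompt)]:
    print(name, lint_prompt(text))
```

A check like this only narrows down which prompts to review by hand; the real test is still sending each prompt to the new deployment and watching for filter rejections.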
Key Takeaways

- Model upgrades aren’t just about configuration changes: they can introduce new moderation layers that affect prompt design.
- Prompt safety filtering is now a first-class citizen in Azure OpenAI, especially for newer models.
- System prompts need to be rewritten with moderation in mind, not just clarity or creativity.

This experience reminded me that even small upgrades can surface big learning moments. If you're planning to move to gpt-4o-mini or any newer Azure OpenAI model, take a moment to review your prompts. They might need a little more finesse than before.

69 Views, 3 likes, 1 Comment

Connect AI Agent via Postman
I'm having the hardest time trying to connect to my custom agent (Agent_id: asst_g8DVMGAOLiXXk7WmiTCMQBgj) via Postman. I'm able to authenticate fine and receive the secure token, which I can use to run my deployment POST with no issues (https://aiagentoverview.cognitiveservices.azure.com/openai/deployments/gpt-4.1/chat/completions?api-version=2025-01-01-preview). But how do I run a POST to my agent_id: asst_g8DVMGAOLiXXk7WmiTCMQBgj? I can't find any instructions anywhere.

33 Views, 0 likes, 2 Comments

Reasoning Effort for Foundry Agents
I am currently using the Azure AI Foundry Agents API and noticed that, unlike the base completions endpoint, there is no option to specify the "Reasoning Effort" parameter. Could you please confirm if this feature is supported in the Agents API? If not yet supported, are there any plans to introduce Reasoning Effort control for the Agents API in future releases?

Solved, 39 Views, 0 likes, 1 Comment

Establish an Oracle Database Connection hosted on Azure VM via AI Foundry Agent
I have come across a requirement to create an AI Foundry agent that will accept requests from users like the following:

a. "I want to connect to the abcprd database hosted on subscription sub1 and resource group rg1, and check the AWR report from xAM-yPM on a specific date (e.g. 21-Oct-2025)."
b. "Check locking sessions/RMAN backup failures/active sessions from the database abcprd hosted on subscription sub1 and resource group rg1."

The agent should be able to fetch the relevant query from the knowledge base, connect to the database, and run the report for the duration mentioned. It should then fetch the report and pass it to the LLM (GPT 4.1 in our case) for investigation. I am looking for an approach to connect to the Oracle database based on the user's request and execute the query obtained from the knowledge base.

52 Views, 0 likes, 0 Comments

Trigger can't read Fabric data agent
I made an agent in Azure AI Foundry. I use a Fabric data agent as knowledge. Everything runs well until I try to use a trigger to orchestrate my agent. I have added my trigger identity to the Fabric workspace where my Fabric data agent and my lakehouse are located. My trigger runs without errors, but my agent does not respond the way it does when I prompt it via the playground. Why?

I can't delete my Azure Key Vault Connection in Azure AI Foundry
I have deleted all projects under my Azure AI Foundry, but I still can't delete the Azure Key Vault Connection.

Error: Azure Key Vault connection [Azure Key Vault Name] cannot be deleted, all credentials will be lost.

Why is this happening?

Issue when connecting from SPFX to Entra-enabled Azure AI Foundry resource
We have been successfully connecting our chat bot from an SPFX to a chat completion model in Azure, using key authentication. We now have a requirement to disable key authentication. This is what we've done so far:

- Disabled API authentication in the resource.
- Gave the SharePoint Client Extensibility Web Application Principal the "Cognitive Services OpenAI User", "Cognitive Service User" and "Cognitive Data Reader" permissions in the resource.
- In the SPFX we have added the following in the package-solution.json (and we have approved it in the SharePoint admin site):

    "webApiPermissionRequests": [
      {
        "resource": "Azure Machine Learning Services",
        "scope": "user_impersonation"
      }
    ]

To connect to the chat completion API we're using fetchEventSource from '@microsoft/fetch-event-source', so we're getting a Bearer token using AadTokenProviderFactory from "@microsoft/sp-http", e.g.:

    // preceded by some code to get the tokenProvider from aadTokenProviderFactory
    const token = await tokenProvider.getToken('https://ai.azure.com');
    const url = "https://my-ai-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview";
    await fetchEventSource(url, {
      method: 'POST',
      headers: {
        Accept: 'text/event-stream',
        'Content-type': 'application/json',
        Authorization: `Bearer ${token}`
      },
      body: body,
      ...// truncated

We added the users (let's say, email address removed for privacy reasons) in the resource as an Azure AI User. When we try to get this to work, we get the following error:

    The principal `email address removed for privacy reasons` lacks the required data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` to perform `POST /openai/deployments/{deployment-id}/chat/completions` operation.

How can we make this work? Ideally we would prefer the SPFX principal to make the request to the chat completion API, without needing to add end users in the resource through IAC, but my understanding is that AadTokenProviderFactory only issues delegated access tokens.

15 Views, 0 likes, 0 Comments

Introducing Azure AI Models: The Practical, Hands-On Course for Real Azure AI Skills
Hello everyone,

Today, I’m excited to share something close to my heart. After watching so many developers, including myself, get lost in a maze of scattered docs and endless tutorials, I knew there had to be a better way to learn Azure AI. So I decided to build a guide from scratch, with the goal of breaking things down step by step and making it easy for beginners to get started with Azure. My aim was to remove the guesswork and create a resource where anyone could jump in, follow along, and actually see results without feeling overwhelmed.

Introducing Azure AI Models Guide. This is a brand new, solo-built, open-source repo aimed at making Azure AI accessible for everyone, whether you’re just getting started or want to build real, production-ready apps using Microsoft’s latest AI tools. The idea is simple: bring all the essentials into one place. You’ll find clear lessons, hands-on projects, and sample code in Python, JavaScript, C#, and REST, all structured so you can learn step by step, at your own pace. I wanted this to be the resource I wish I’d had when I started: straightforward, practical, and friendly to beginners and pros alike.

It’s early days for the project, but I’m excited to see it grow. If you’re curious, check out the repo at https://github.com/DrHazemAli/Azure-AI-Models

Your feedback, and maybe even your contributions, will help shape where it goes next!

Solved, 784 Views, 1 like, 5 Comments

Introducing AzureImageSDK: A Unified .NET SDK for Azure Image Generation And Captioning
Hello 👋

I'm excited to share something I've been working on: AzureImageSDK, a modern, open-source .NET SDK that brings together Azure AI Foundry's image models (like Stable Image Ultra, Stable Image Core), along with Azure Vision and content moderation APIs and Image Utilities, all in one clean, extensible library.

While working with Azure’s image services, I kept hitting the same wall: each model had its own input structure, parameters, and output format, and there was no unified, async-friendly SDK to handle image generation, visual analysis, and moderation under one roof. So... I built one.

AzureImageSDK wraps Azure's powerful image capabilities into a single, async-first C# interface that makes it dead simple to work with:

- 🎨 Inferencing Image Models
- 🧠 Analyze visual content (Image to text)
- 🚦 Image Utilities

All with just a few lines of code. It's fully open-source, designed for extensibility, and ready to support new models the moment they launch.

🔗 GitHub Repo: https://github.com/DrHazemAli/AzureImageSDK

Also, I've posted the release announcement at https://github.com/orgs/azure-ai-foundry/discussions/47 👉🏻 feel free to join the conversation there too. The SDK is available on NuGet too. Would love to hear your thoughts, use cases, or feedback!

123 Views, 1 like, 0 Comments