DisasterRecovery
2 TopicsBYO Thread Storage in Azure AI Foundry using Python
Build scalable, secure, and persistent multi-agent memory with your own storage backend As AI agents evolve beyond one-off interactions, persistent context becomes a critical architectural requirement. Azure AI Foundry’s latest update introduces a powerful capability — Bring Your Own (BYO) Thread Storage — enabling developers to integrate custom storage solutions for agent threads. This feature empowers enterprises to control how agent memory is stored, retrieved, and governed, aligning with compliance, scalability, and observability goals. What Is “BYO Thread Storage”? In Azure AI Foundry, a thread represents a conversation or task execution context for an AI agent. By default, thread state (messages, actions, results, metadata) is stored in Foundry’s managed storage. With BYO Thread Storage, you can now: Store threads in your own database — Azure Cosmos DB, SQL, Blob, or even a Vector DB. Apply custom retention, encryption, and access policies. Integrate with your existing data and governance frameworks. Enable cross-region disaster recovery (DR) setups seamlessly. This gives enterprises full control of data lifecycle management — a big step toward AI-first operational excellence. Architecture Overview A typical setup involves: Azure AI Foundry Agent Service — Hosts your multi-agent setup. Custom Thread Storage Backend — e.g., Azure Cosmos DB, Azure Table, or PostgreSQL. Thread Adapter — Python class implementing the Foundry storage interface. Disaster Recovery (DR) replication — Optional replication of threads to secondary region. Implementing BYO Thread Storage using Python Prerequisites First, install the necessary Python packages: pip install azure-ai-projects azure-cosmos azure-identity Setting Up the Storage Layer from azure.cosmos import CosmosClient, PartitionKey from azure.identity import DefaultAzureCredential import json from datetime import datetime class ThreadStorageManager: def __init__(self, cosmos_endpoint, database_name, container_name): credential = DefaultAzureCredential() self.client = CosmosClient(cosmos_endpoint, credential=credential) self.database = self.client.get_database_client(database_name) self.container = self.database.get_container_client(container_name) def create_thread(self, user_id, metadata=None): """Create a new conversation thread""" thread_id = f"thread_{user_id}_{datetime.utcnow().timestamp()}" thread_data = { 'id': thread_id, 'user_id': user_id, 'messages': [], 'created_at': datetime.utcnow().isoformat(), 'updated_at': datetime.utcnow().isoformat(), 'metadata': metadata or {} } self.container.create_item(body=thread_data) return thread_id def add_message(self, thread_id, role, content): """Add a message to an existing thread""" thread = self.container.read_item(item=thread_id, partition_key=thread_id) message = { 'role': role, 'content': content, 'timestamp': datetime.utcnow().isoformat() } thread['messages'].append(message) thread['updated_at'] = datetime.utcnow().isoformat() self.container.replace_item(item=thread_id, body=thread) return message def get_thread(self, thread_id): """Retrieve a complete thread""" try: return self.container.read_item(item=thread_id, partition_key=thread_id) except Exception as e: print(f"Thread not found: {e}") return None def get_thread_messages(self, thread_id): """Get all messages from a thread""" thread = self.get_thread(thread_id) return thread['messages'] if thread else [] def delete_thread(self, thread_id): """Delete a thread""" self.container.delete_item(item=thread_id, partition_key=thread_id) Integrating with Azure AI Foundry from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential class ConversationManager: def __init__(self, project_endpoint, storage_manager): self.ai_client = AIProjectClient.from_connection_string( credential=DefaultAzureCredential(), conn_str=project_endpoint ) self.storage = storage_manager def start_conversation(self, user_id, system_prompt): """Initialize a new conversation""" thread_id = self.storage.create_thread( user_id=user_id, metadata={'system_prompt': system_prompt} ) # Add system message self.storage.add_message(thread_id, 'system', system_prompt) return thread_id def send_message(self, thread_id, user_message, model_deployment): """Send a message and get AI response""" # Store user message self.storage.add_message(thread_id, 'user', user_message) # Retrieve conversation history messages = self.storage.get_thread_messages(thread_id) # Call Azure AI with conversation history response = self.ai_client.inference.get_chat_completions( model=model_deployment, messages=[ {"role": msg['role'], "content": msg['content']} for msg in messages ] ) assistant_message = response.choices[0].message.content # Store assistant response self.storage.add_message(thread_id, 'assistant', assistant_message) return assistant_message Usage Example # Initialize storage and conversation manager storage = ThreadStorageManager( cosmos_endpoint="https://your-cosmos-account.documents.azure.com:443/", database_name="conversational-ai", container_name="threads" ) conversation_mgr = ConversationManager( project_endpoint="your-project-connection-string", storage_manager=storage ) # Start a new conversation thread_id = conversation_mgr.start_conversation( user_id="user123", system_prompt="You are a helpful AI assistant." ) # Send messages response1 = conversation_mgr.send_message( thread_id=thread_id, user_message="What is machine learning?", model_deployment="gpt-4" ) print(f"AI: {response1}") response2 = conversation_mgr.send_message( thread_id=thread_id, user_message="Can you give me an example?", model_deployment="gpt-4" ) print(f"AI: {response2}") # Retrieve full conversation history history = storage.get_thread_messages(thread_id) for msg in history: print(f"{msg['role']}: {msg['content']}") Key Highlights: Threads are stored in Cosmos DB under your control. You can attach metadata such as region, owner, or compliance tags. Integrates natively with existing Azure identity and Key Vault. Disaster Recovery & Resilience When coupled with geo-replicated Cosmos DB or Azure Storage RA-GRS, your BYO thread storage becomes resilient by design: Primary writes in East US replicate to Central US. Foundry auto-detects failover and reconnects to secondary region. Threads remain available during outages — ensuring operational continuity. This aligns perfectly with the AI-First Operational Excellence architecture theme, where reliability and observability drive intelligent automation. Best Practices Area Recommendation Security Use Azure Key Vault for credentials & encryption keys. Compliance Configure data residency & retention in your own DB. Observability Log thread CRUD operations to Azure Monitor or Application Insights. Performance Use async I/O and partition keys for large workloads. DR Enable geo-redundant storage & failover tests regularly. When to Use BYO Thread Storage Scenario Why it helps Regulated industries (BFSI, Healthcare, etc.) Maintain data control & audit trails Multi-region agent deployments Support DR and data sovereignty Advanced analytics on conversation data Query threads directly from your DB Enterprise observability Unified monitoring across Foundry + Ops The Future BYO Thread Storage opens doors to advanced use cases — federated agent memory, semantic retrieval over past conversations, and dynamic workload failover across regions. For architects, this feature is a key enabler for secure, scalable, and compliant AI system design. For developers, it means more flexibility, transparency, and integration power. Summary Feature Benefit Custom thread storage Full control over data Python adapter support Easy extensibility Multi-region DR ready Business continuity Azure-native security Enterprise-grade safety Conclusion Implementing BYO thread storage in Azure AI Foundry gives you the flexibility to build AI applications that meet your specific requirements for data governance, performance, and scalability. By taking control of your storage, you can create more robust, compliant, and maintainable AI solutions.20Views0likes0CommentsImplementing Disaster Recovery for Azure App Service Web Applications
Starting March 31, 2025, Microsoft will no longer automatically place Azure App Service web applications in disaster recovery mode in the event of a regional disaster. This change emphasizes the importance of implementing robust disaster recovery (DR) strategies to ensure the continuity and resilience of your web applications. Here’s what you need to know and how you can prepare. Understanding the Change Azure App Service has been a reliable platform for hosting web applications, REST APIs, and mobile backends, offering features like load balancing, autoscaling, and automated management. However, beginning March 31, 2025, in the event of a regional disaster, Azure will not automatically place your web applications in disaster recovery mode. This means that you, as a developer or IT professional, need to proactively implement disaster recovery techniques to safeguard your applications and data. Why This Matters Disasters, whether natural or technical, can strike without warning, potentially causing significant downtime and data loss. By taking control of your disaster recovery strategy, you can minimize the impact of such events on your business operations. Implementing a robust DR plan ensures that your applications remain available and your data remains intact, even in the face of regional outages. Common Disaster Recovery Techniques To prepare for this change, consider the following commonly used disaster recovery techniques: Multi-Region Deployment: Deploy your web applications across multiple Azure regions. This approach ensures that if one region goes down, your application can continue to run in another region. You can use Azure Traffic Manager or Azure Front Door to route traffic to the healthy region. Multi-region load balancing with Traffic Manager and Application Gateway Highly available multi-region web app Regular Backups: Implement regular backups of your application data and configurations. Azure App Service provides built-in backup and restore capabilities that you can schedule to run automatically. Back up an app in App Service How to automatically backup App Service & Function App configurations Active-Active or Active-Passive Configuration: Set up your applications in an active-active or active-passive configuration. In an active-active setup, both regions handle traffic simultaneously, providing high availability. In an active-passive setup, the secondary region remains on standby and takes over only if the primary region fails. About active-active VPN gateways Design highly available gateway connectivity Automated Failover: Use automated failover mechanisms to switch traffic to a secondary region seamlessly. This can be achieved using Azure Site Recovery or custom scripts that detect failures and initiate failover processes. Add Azure Automation runbooks to Site Recovery recovery plans Create and customize recovery plans in Azure Site Recovery Monitoring and Alerts: Implement comprehensive monitoring and alerting to detect issues early and respond promptly. Azure Monitor and Application Insights can help you track the health and performance of your applications. Overview of Azure Monitor alerts Application Insights OpenTelemetry overview Steps to Implement a Disaster Recovery Plan Assess Your Current Setup: Identify all the resources your application depends on, including databases, storage accounts, and networking components. Choose a DR Strategy: Based on your business requirements, choose a suitable disaster recovery strategy (e.g., multi-region deployment, active-active configuration). Configure Backups: Set up regular backups for your application data and configurations. Test Your DR Plan: Regularly test your disaster recovery plan to ensure it works as expected. Simulate failover scenarios to validate that your applications can recover quickly. Document and Train: Document your disaster recovery procedures and train your team to execute them effectively. Conclusion While the upcoming change in Azure App Service’s disaster recovery policy may seem daunting, it also presents an opportunity to enhance the resilience of your web applications. By implementing robust disaster recovery techniques, you can ensure that your applications remain available and your data remains secure, no matter what challenges come your way. Start planning today to stay ahead of the curve and keep your applications running smoothly. Recover from region-wide failure - Azure App Service Reliability in Azure App Service Multi-Region App Service App Approaches for Disaster Recovery Feel free to share your thoughts or ask questions in the comments below. Let's build a resilient future together! 🚀