Azure Cosmos DB

Building an AI-Powered ESG Consultant Using Azure AI Services: A Case Study
In today's corporate landscape, Environmental, Social, and Governance (ESG) compliance has become increasingly important to stakeholders. To address the challenge of analyzing vast amounts of ESG data efficiently, a comprehensive AI-powered solution called ESGai was developed. This blog explores how Azure AI services were leveraged to create a sophisticated ESG consultant for publicly listed companies.

https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh

The Challenge: Making Sense of Complex ESG Data

Organizations face significant challenges when analyzing ESG compliance data. Manual analysis is time-consuming, error-prone, and difficult to scale. ESGai addresses these pain points with an AI-powered virtual consultant that provides detailed insights based on publicly available ESG data.

Solution Architecture: The Three-Agent System

ESGai implements a three-agent architecture, all powered by Azure's AI capabilities:

- Manager Agent: Breaks down complex user queries into manageable sub-questions containing specific keywords that facilitate vector search retrieval. Its system prompt includes generalized document headers from the vector database for context.
- Worker Agent: Processes the sub-questions generated by the Manager, connects to the vector database to retrieve relevant text chunks, and answers each sub-question. Results are stored in Cosmos DB for later use.
- Director Agent: Consolidates the Worker's answers into a comprehensive final response tailored to the user's original query.

Note that while there are conceptually three agents, the Worker is a single agent that is called multiple times - once for each sub-question generated by the Manager.

Current Implementation State

The current MVP has several limitations that are planned for expansion:

- Limited company coverage: The vector database currently stores data for only two companies, with three documents per company (Sustainability Report, XBRL, and BRSR).
- Single model deployment: Only one GPT-4o model is deployed to handle all agent functions.
- Basic storage structure: The Blob container has a simple structure with a single directory. While Azure Blob Storage doesn't natively support hierarchical folders, the team plans to implement virtual folders in the future.
- Free tier limitations: Due to funding constraints, the AI Search service uses the free tier, which limits vector data storage to 50 MB.
- Simplified vector database: The current index stores all six files (3 documents × 2 companies) in a single vector index without filtering capabilities or a schema definition.

Azure Services Powering ESGai

ESGai leverages multiple Azure services for a robust and scalable architecture:

- Azure AI Services: Provides pre-built APIs, SDKs, and services that add AI capabilities without requiring extensive machine learning expertise, including access to 62 pre-trained models for chat completions through the AI Foundry portal.
- Azure OpenAI: Hosts the GPT-4o model for generating responses and the Ada embedding model for vectorization. The service combines OpenAI's advanced language models with Azure's security and enterprise features.
- Azure AI Foundry: Serves as an integrated platform for developing, deploying, and governing generative AI applications. It offers a centralized management center that consolidates subscription information, connected resources, access privileges, and usage quotas.
- Azure AI Search (formerly Cognitive Search): Provides both full-text and vector search, using the OpenAI ada-002 embedding model for vectorization. It is configured with a hybrid search algorithm (BM25 with reciprocal rank fusion, RRF) for optimal chunk ranking.
- Azure Storage: Uses Blob Storage for PDFs, Business Responsibility and Sustainability Reports (BRSRs), and other essential documents. It integrates seamlessly with AI Search through indexers that track changes to the stored data.
- Azure Cosmos DB: Uses the MongoDB API in Cosmos DB as a NoSQL store for chat history between agents and users.
- Azure App Service: Hosts the web application on a B3-tier plan optimized for cost efficiency, with GitHub Actions integrated for continuous deployment.

Project Evolution: From Concept to Deployment

The development of ESGai followed a structured approach through several phases:

Phase 1: Data cleaning
- Extracted specific KPIs from XML/XBRL datasets and BRSR reports containing ESG data for 1,000 listed companies
- Cleaned and standardized the data to ensure consistency and accuracy

Phase 2: RAG framework development
- Implemented Retrieval-Augmented Generation (RAG) to enhance responses by dynamically fetching relevant information
- Created a workflow covering query processing, data retrieval, and response generation

Phase 3: Initial deployment
- Deployed models locally using Docker and n8n automation tools for testing
- Identified the need for more scalable web services

Phase 4: Transition to Azure services
- Migrated automation workflows from n8n to Azure AI Foundry
- Leveraged Azure's comprehensive suite of AI services, storage solutions, and app hosting capabilities

Technical Implementation Details

The GPT model is configured with:
- Model version: 2024-11-20
- Temperature: 0.7
- Max response tokens: 800
- Past messages: 10
- Top-p: 0.95
- Frequency/presence penalties: 0

The embedding model is OpenAI text-embedding-ada-002 with 1,536 dimensions, combined with hybrid semantic search (BM25 RRF).

Cost Analysis and Efficiency

A detailed cost breakdown per user query reveals:
- App server: $390-400
- AI Search: $5 per query
- RAG query processing: $4.76 per query

Agent-specific costs (which add up to the $4.76 per query above):
- Manager: $0.05 (30 input tokens, 210 output tokens)
- Worker: $3.71 (1,500 input tokens, 1,500 output tokens)
- Director: $1.00 (600 input tokens, 600 output tokens)

Challenges and Solutions

The team faced several challenges during implementation:

- Quota limitations: Initial deployments hit token quota restrictions, which were resolved through Azure support requests (typically granted within 24 hours).
- Cost optimization: The high cost of vectorization required careful monitoring. The team addressed this by shutting down unused services and deploying on services with free tiers.
- Integration issues: GitHub Actions raised errors during deployment, which were resolved using GitHub's App Service Build Service.
- Azure UI complexity: Azure AI service naming conventions were sometimes confusing, since the same name is used for both parent and child resources.
- Free tier constraints: The AI Search free tier's 50 MB limit for vector data restricts how much company information the current implementation can include.
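To make the retrieval and generation settings above more concrete, here is a minimal sketch of what a single Worker call could look like in Python, combining Azure AI Search hybrid retrieval with the GPT-4o configuration listed in the technical details. The index name, vector field, content field, deployment names, and prompts are assumptions for illustration only - they are not taken from the ESGai codebase.

```python
# Hypothetical sketch of the Worker agent's retrieve-then-answer step.
# "esg-index", "text_vector", "content", and the deployment names are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="esg-index",                      # assumed index name
    credential=AzureKeyCredential("<search-key>"),
)
aoai = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-06-01",
)

def answer_sub_question(sub_question: str) -> str:
    # Embed the sub-question with text-embedding-ada-002 (1,536 dimensions).
    embedding = aoai.embeddings.create(
        model="text-embedding-ada-002",          # assumed deployment name
        input=sub_question,
    ).data[0].embedding

    # Hybrid retrieval: BM25 full-text search plus vector similarity,
    # fused by Azure AI Search's reciprocal rank fusion (RRF).
    results = search_client.search(
        search_text=sub_question,
        vector_queries=[VectorizedQuery(
            vector=embedding, k_nearest_neighbors=5, fields="text_vector")],
        top=5,
    )
    context = "\n\n".join(doc["content"] for doc in results)   # assumed field name

    # Answer with the GPT-4o deployment, using the configuration from the post.
    response = aoai.chat.completions.create(
        model="gpt-4o",                           # assumed deployment name
        temperature=0.7,
        max_tokens=800,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        messages=[
            {"role": "system", "content": "Answer ESG questions using only the supplied context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {sub_question}"},
        ],
    )
    return response.choices[0].message.content
```

The Manager would call this function once per sub-question, which matches the note above that the Worker is a single agent invoked multiple times.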
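Since Worker results and chat history are kept in Cosmos DB through its MongoDB API, the persistence step could be sketched as follows. The database name, collection name, and document shape are assumptions, not the project's actual schema.

```python
# Minimal sketch of persisting intermediate agent answers in Cosmos DB
# via its MongoDB API. "esgai" and "chat_history" are assumed names.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("<cosmos-db-mongodb-connection-string>")
history = client["esgai"]["chat_history"]

def store_worker_answer(session_id: str, sub_question: str, answer: str) -> None:
    # Each Worker call is stored so the Director can consolidate them later.
    history.insert_one({
        "session_id": session_id,
        "role": "worker",
        "sub_question": sub_question,
        "answer": answer,
        "created_at": datetime.now(timezone.utc),
    })

def load_session(session_id: str) -> list:
    # The Director reads all Worker answers for the session, oldest first.
    return list(history.find({"session_id": session_id}).sort("created_at", 1))
```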
Future Roadmap

The current implementation is an MVP with several areas for expansion:

- Expand the database to include more publicly available sustainability reports beyond the current two companies
- Optimize token usage by refining query handling
- Research alternative embedding models to reduce costs while maintaining accuracy
- Implement a more structured storage layout with virtual folders in Blob Storage
- Upgrade from the AI Search free tier to support larger data volumes
- Develop a proper schema for the vector database to enable filtering and more targeted searches
- Scale to multiple GPT model deployments for improved performance and redundancy

Conclusion

ESGai demonstrates how advanced AI techniques like Retrieval-Augmented Generation can transform data-intensive domains such as ESG consulting. By combining Azure's comprehensive suite of AI services with a robust agent-based architecture, the solution provides users with actionable insights while maintaining scalability and cost efficiency.

https://youtu.be/5-oBdge6Q78?si=Vb9aHx79xk3VGYAh
Migration from Cosmos DB to Cosmos DB

A few weeks ago, I was looking at how to migrate data from one Cosmos DB NoSQL account to a second Cosmos DB NoSQL account. On paper it seems rather simple at first glance, but ultimately not so much. Some might ask why. For one of my critical projects, we initially decided to deploy a Cosmos DB account in Serverless mode, because our users were exclusively in Western Europe. A few months later, the scope of the project changed radically: the data must now be accessible worldwide. OK, no worries.

1. Potential solution: Geo-replication

The Azure Cosmos DB service (https://learn.microsoft.com/en-us/azure/cosmos-db/introduction?WT.mc_id=AZ-MVP-5005062) offers a geo-replication feature. The problem is that this feature is not available in Serverless mode, only in Provisioned Throughput mode, which ultimately seems consistent. So that path is closed to me.

2. Potential solution: Data restoration

After a few minutes of thinking, I tell myself it doesn't matter - I'll just restore the data via the Point In Time Restore (PiTR) option. But I meet a new disappointment: during a restore, the new Cosmos DB account that gets created is of the same type as the initial one - in my case, a Serverless account. OK, no luck so far.

3. Potential solution: Well, I have to look - why not a migration?

So I start my research like Sherlock Holmes (https://en.wikipedia.org/wiki/Sherlock_Holmes) with my pipe, my magnifying glass and my K-way (sorry, I didn't have a raincoat handy). After a few minutes, I come across the page (https://learn.microsoft.com/en-us/azure/cosmos-db/migration-choices?WT.mc_id=AZ-MVP-5005062) titled "Options to migrate your on-premises or cloud data to Azure Cosmos DB". Given the title, this might interest me, so I start taking off my K-way because it's getting really hot. The documentation is quite well done, as is often the case with Microsoft to be honest. It proposes different scenarios, and two types of migration are offered: online and/or offline.

4. Potential solution: Migration proposed by Microsoft

I find many migration use cases there, with different types of sources - Azure Cosmos DB of course, but also JSON or CSV files, not to mention Oracle and Apache Cassandra. After a few moments, I list what seems to work for my use case:

Offline mode:
- Using Azure Data Factory
- Using the Azure Cosmos DB Spark connector
- Using the Azure Cosmos DB Spark connector + Change Feed sample
- Using a custom tool with the Azure Cosmos DB bulk executor library

Online mode:
- Using the Azure Cosmos DB Spark connector + Change Feed
- Using Azure Cosmos DB Functions + the Change Feed API

With my magnifying glass, I look at the various proposed solutions... and the more I advance, the more I realize that they require a lot of effort, and some of them require deploying new services. Hmm, OK. Before going any further, I go back to my Cosmos DB account to see what it contains: 1 database with 3 containers, holding relatively little data. When I weigh the pros and cons of each solution, I quickly see that it's a bit of a gas plant for a relatively simple need. On the other hand, I have no choice - this migration is mandatory - and as Terence Hill and Bud Spencer said in their movie: Go for It! But there is no urgency, so I'll see if I can find something simpler; in the worst case, I can always fall back on the solutions above.
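To give a rough idea of what the "custom tool" flavour of offline migration could look like, here is a minimal Python sketch using the azure-cosmos SDK (a simplification - the option above refers to the bulk executor library, which is a .NET/Java component). The endpoints, keys, database, container, and partition key below are placeholders, not values from my environment.

```python
# Rough sketch of an offline copy between two Cosmos DB NoSQL accounts
# using the azure-cosmos Python SDK. All names and keys are placeholders.
from azure.cosmos import CosmosClient, PartitionKey

source = CosmosClient("https://<source-account>.documents.azure.com:443/", credential="<source-key>")
target = CosmosClient("https://<target-account>.documents.azure.com:443/", credential="<target-key>")

src_container = source.get_database_client("<database>").get_container_client("<container>")

# Create the destination database/container if they do not exist yet.
dst_db = target.create_database_if_not_exists("<database>")
dst_container = dst_db.create_container_if_not_exists(
    id="<container>", partition_key=PartitionKey(path="/id"))   # assumed partition key

# Read every document from the source and upsert it into the destination.
for item in src_container.read_all_items():
    for system_prop in ("_rid", "_self", "_etag", "_attachments", "_ts"):
        item.pop(system_prop, None)                              # drop system properties
    dst_container.upsert_item(item)
```

For a small dataset like mine this kind of script would probably do, but it is exactly the sort of extra plumbing I was hoping to avoid - so the search continues.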
5. Considered solution: Migration with the Azure Cosmos DB data migration tool

Continuing my research, I came across an announcement from Microsoft dating from April 2015 about the Azure Cosmos DB data migration tool (https://azure.microsoft.com/en-us/updates/documentdb-data-migration-tool/?WT.mc_id=AZ-MVP-5005062). I recognize that 2015 is a long time ago, but I'm going to dig a little, so I exchange my pipe for a small shovel. This open-source tool can import data into Azure Cosmos DB from different data sources:

- JSON files
- CSV files
- SQL Server
- MongoDB
- Azure Table storage
- Amazon DynamoDB
- HBase
- Azure Cosmos containers

You saw it like I did: Cosmos DB to Cosmos DB! My pupils dilated, my hair (well, what's left of it) fell out, and I found myself in my underwear saying: My precious! Once back to my normal appearance - well, my usual appearance - I browse the various links in the announcement and land on the GitHub repository of the tool (https://github.com/azure/azure-documentdb-datamigrationtool/?WT.mc_id=AZ-MVP-5005062). I have the impression that luck has finally changed sides, but then I read the first sentence: "The Azure Cosmos DB data migration tool is undergoing a full refactor to restructure the project..." Ahhh, this is driving me crazy - someone is playing with me, there's no other way! But as I'm tenacious, I still decide to visit the archive branch of the project (https://github.com/Azure/azure-documentdb-datamigrationtool/tree/archive/?WT.mc_id=AZ-MVP-5005062) and end up downloading version 1.8.3 (https://github.com/Azure/azure-documentdb-datamigrationtool/releases/download/1.8.3/azure-documentdb-datamigrationtool-1.8.3.zip), which dates from August 2021 - not so bad when you think about it.

6. Azure Cosmos DB data migration tool testing

I launch the tool via the executable dtui.exe (yes, I work on Windows, and I'm proud of it). I go through the docs and the operation seems very simple. There are some prerequisites:

- A source Azure Cosmos DB account
- A destination Azure Cosmos DB account
- The connection string for each account
- The names of your databases
- The names of your containers

As you can see from my example below, my source is aziedb1amo008 and my destination is aziedb1amo900. I wish to migrate my StarWars database as well as its containers, namely People, Planets and Species. What? I told you it was a critical project!

Step 1: The first thing to do is define the source account by specifying its connection string, with the name of the database appended at the end, as well as the collection, which is none other than our container. We click Verify to validate that the connection to the account is established correctly. Bingo - we can go to the next step.

Step 2: Next, we define the destination account. As in the previous step, we provide the connection string, the name of the database (which will be created automatically if it does not exist), the collection, and the partition key.

Step 3: If you want, you can define a log file in CSV format.

Step 4: The last step shows a small summary; you just have to click Import.

Et voila! Well, not quite, because I also wanted to migrate the Planets and Species containers, so I follow the same steps to achieve my goal. After a minute or two, I find my database, my containers, and even my data on the new Cosmos DB account, which is quite nice. And of course, it also works with data other than Star Wars, like Pikachu or Marvel!
But you can also try it with your own dataset.
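If you want a quick sanity check after the import, a small script can compare document counts between the two accounts. This is not part of the migration tool - just a hedged sketch using the azure-cosmos Python SDK, with the endpoints and keys as placeholders and the StarWars database and containers from the example above.

```python
# Optional post-migration check: compare document counts per container
# between the source and destination accounts. Keys are placeholders.
from azure.cosmos import CosmosClient

source = CosmosClient("https://aziedb1amo008.documents.azure.com:443/", credential="<source-key>")
target = CosmosClient("https://aziedb1amo900.documents.azure.com:443/", credential="<target-key>")

for name in ["People", "Planets", "Species"]:
    counts = []
    for client in (source, target):
        container = client.get_database_client("StarWars").get_container_client(name)
        result = container.query_items(
            query="SELECT VALUE COUNT(1) FROM c",
            enable_cross_partition_query=True,
        )
        counts.append(next(iter(result)))
    print(f"{name}: source={counts[0]} target={counts[1]}")
```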
Manual Backup Cosmos DB

Hi, I tried to export data from Cosmos DB but it was not successful. According to https://docs.microsoft.com/en-us/azure/cosmos-db/storage-explorer, I should be able to export the data inside Cosmos DB with this tool, but there is no export option. I also tried to follow the instructions at https://azure.microsoft.com/en-us/updates/documentdb-data-migration-tool/ and https://docs.microsoft.com/en-us/azure/cosmos-db/import-data#JSON, but an error is encountered. Can you help me with how to do this in Data Factory, or with any steps to manually back up Cosmos DB? Thank you.
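For reference, one minimal way to take a manual backup is to dump a container to a local JSON file with the azure-cosmos Python SDK - a hedged sketch only, with the account endpoint, key, database, and container names as placeholders.

```python
# Possible manual-backup approach: export every document in a container
# to a local JSON file. Endpoint, key, and names are placeholders.
import json
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<database>").get_container_client("<container>")

docs = list(container.read_all_items())
with open("cosmos-backup.json", "w", encoding="utf-8") as f:
    json.dump(docs, f, indent=2, default=str)   # default=str handles non-JSON types, if any

print(f"Exported {len(docs)} documents to cosmos-backup.json")
```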