Building Enterprise Voice-Enabled AI Agents with Azure Voice Live API
The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration.

Use Case Scenario: Retail Fashion Agent

Core Business Capabilities:

Product Discovery and Ordering: Natural language product search across fashion categories (Winter wear, Active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality, and a Swagger definition is configured in the application for tool actions.
Automated Fulfillment: Integration with Azure Logic Apps for shipment creation in Azure SQL Database.
Policy Support: Vector-powered QnA for returns, payment issues, and customer policies. Azure AI Search and File Search capabilities are used for this requirement.
Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The application captures the entire conversation between the customer and the Agent and sends it to an Agent running in Azure Logic Apps to perform call quality assessment, before storing the results in Azure Cosmos DB. When the customer indicates during the voice call that the conversation can be concluded, the Agent autonomously sends the conversation history to the Azure Logic App to perform quality assessment.

Advanced Analytics Pipeline:

Real-time Data Mirroring: Automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake
Business Intelligence: Custom Data Agents in Fabric for trend analysis and insights
Executive Dashboards: Power BI reports for comprehensive performance monitoring

Technical Architecture Overview

The solution presents two approaches, each optimized for different enterprise scenarios.

Approach 1: Direct Model Integration with GPT-Realtime

Architecture Components

This approach provides direct integration with Azure Voice Live API using the GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The application connects to the Voice Live API over a WebSocket connection. The semantics of this API are similar to those used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, such as the choice of a custom voice from Azure Speech Services, options for echo cancellation and noise reduction, and avatar integration.
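Because the wire protocol mirrors the GPT-Realtime events, the client's main job after configuring the session is to stream base64-encoded audio chunks and react to server events. The sketch below is only an illustration of that exchange based on the publicly documented realtime event names; get_microphone_chunk and play_audio are placeholder helpers, not functions from the sample repository.

import base64
import json

async def stream_microphone_audio(ws):
    # Continuously push raw 24 kHz PCM chunks; with server-side VAD enabled,
    # the service detects end of speech and starts a response on its own.
    while True:
        chunk = get_microphone_chunk()  # placeholder: returns raw PCM bytes
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))

async def handle_server_events(ws):
    # Play synthesized audio as it streams in and surface any tool-call requests
    # so the client-side function calling logic can execute them.
    async for message in ws:
        event = json.loads(message)
        if event["type"] == "response.audio.delta":
            play_audio(base64.b64decode(event["delta"]))  # placeholder audio sink
        elif event["type"] == "response.function_call_arguments.done":
            print("Tool call requested:", event.get("name"), event.get("arguments"))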
Core Technical Stack: GPT-Realtime Model: Direct audio-to-audio processing Azure Speech Voice: High-quality TTS synthesis (en-IN-AartiIndicNeural) WebSocket Communication: Real-time bidirectional audio streaming Voice Activity Detection: Server-side VAD for natural conversation flow Client-Side Function Calling: Full control over tool execution logic Key Session Configuration The Direct Model Integration uses the session configuration below: session_config = { "input_audio_sampling_rate": 24000, "instructions": system_instructions, "turn_detection": { "type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 500, }, "tools": tools_list, "tool_choice": "auto", "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "whisper-1"}, } Configuration Highlights: 24kHz Audio Sampling: High-quality audio processing for natural speech Server VAD: Optimized threshold (0.5) with 300ms padding for natural conversation flow Azure Deep Noise Suppression: Advanced noise reduction for clear audio Indic Voice Support: en-IN-AartiIndicNeural for localized customer experience Whisper-1 Transcription: Accurate speech recognition for conversation logging Connecting to the Azure Voice Live API The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming: def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&model={model_name}" f"&agent-access-token={access_token}" ) async def connect(self): if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Function Calling Implementation The Direct Model Integration provides client-side function execution with complete control: tools_list = [ { "type": "function", "name": "perform_search_based_qna", "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA", "parameters": { "type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"], }, }, { "type": "function", "name": "create_delivery_order", "description": "call this function to create a delivery order based on order id and destination location", "parameters": { "type": "object", "properties": { "order_id": {"type": "string"}, "destination": {"type": "string"}, }, "required": ["order_id", "destination"], }, }, { "type": "function", "name": "perform_call_log_analysis", "description": "call this function to analyze call log based on input call log conversation text", "parameters": { "type": "object", "properties": { "call_log": {"type": "string"}, }, "required": ["call_log"], }, }, { "type": "function", "name": "search_products_by_category", "description": "call this function to search for products by 
category", "parameters": { "type": "object", "properties": { "category": {"type": "string"}, }, "required": ["category"], }, }, { "type": "function", "name": "order_products", "description": "call this function to order products by product id and quantity", "parameters": { "type": "object", "properties": { "product_id": {"type": "string"}, "quantity": {"type": "integer"}, }, "required": ["product_id", "quantity"], }, } ] đ¤ Approach 2: Azure AI Foundry Agent Integration Architecture Components This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not entail any code changes to the Agent itself to voice enable it. Core Technical Stack: Azure Fast Transcript: Advanced multi-language speech-to-text processing Azure AI Foundry Agent: Pre-configured Agent with autonomous capabilities GPT-4o-mini Model: Agent-configured model for text processing Neural Voice Synthesis: Indic language optimized TTS Semantic VAD: Azure semantic voice activity detection Session Configuration The Agent Integration approach uses advanced semantic voice activity detection: session_config = { "input_audio_sampling_rate": 24000, "turn_detection": { "type": "azure_semantic_vad", "threshold": 0.3, "prefix_padding_ms": 200, "silence_duration_ms": 200, "remove_filler_words": False, "end_of_utterance_detection": { "model": "semantic_detection_v1", "threshold": 0.01, "timeout": 2, }, }, "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}, "input_audio_echo_cancellation": {"type": "server_echo_cancellation"}, "voice": { "name": "en-IN-AartiIndicNeural", "type": "azure-standard", "temperature": 0.8, }, "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"}, } Key Differentiators: Semantic VAD: Intelligent voice activity detection with utterance prediction Multi-language Support: Azure Speech with en-IN and hi-IN language support End-of-Utterance Detection: AI-powered conversation turn management Filler Word Handling: Configurable processing of conversational fillers Agent Integration Code The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry Project Name and an ID of the Agent in it. We do not need to pass the model's name here, since the Agent is already configured with one. 
def get_websocket_url(self, access_token: str) -> str: """Generate WebSocket URL for Voice Live API.""" azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://") return ( f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}" f"&agent-project-name={project_name}&agent-id={agent_id}" f"&agent-access-token={access_token}" ) async def connect(self): """Connects the client using a WS Connection to the Realtime API.""" if self.is_connected(): # raise Exception("Already connected") self.log("Already connected") # Get access token access_token = self.get_azure_token() # Build WebSocket URL and headers ws_url = self.get_websocket_url(access_token) self.ws = await websockets.connect( ws_url, additional_headers={ "Authorization": f"Bearer {self.get_azure_token()}", "x-ms-client-request-id": str(uuid.uuid4()), }, ) print(f"Connected to Azure Voice Live API....") asyncio.create_task(self.receive()) await self.update_session() Advanced Analytics Pipeline GPT-4o Powered Call Analysis The solution implements conversation analytics using Azure Logic Apps with GPT-4o: { "functions": [ { "name": "evaluate_call_log", "description": "Evaluate call log for Contoso Retail customer service call", "parameters": { "properties": { "call_reason": { "description": "Categorized call reason from 50+ predefined scenarios", "type": "string" }, "customer_satisfaction": { "description": "Overall satisfaction assessment", "type": "string" }, "customer_sentiment": { "description": "Emotional tone analysis", "type": "string" }, "call_rating": { "description": "Numerical rating (1-5 scale)", "type": "number" }, "call_rating_justification": { "description": "Detailed reasoning for rating", "type": "string" } } } } ] } Microsoft Fabric Integration The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence: Fabric Integration Features: Real-time Data Mirroring: Cosmos DB to OneLake synchronization Custom Data Agents: Business-specific analytics agents in Fabric Copilot Integration: Natural language business intelligence queries Power BI Dashboards: Interactive reports and executive summaries Artefacts for reference The source code of the solution is available in the GitHub Repo here. An article on this topic is published on LinkedIn here A video recording of the demonstration of this App is available below: Part1 - walkthrough of the Agent configuration in Azure AI Foundry - here Part2 - demonstration of the Application that integrates with the Azure Voice Live API - here Part 3 - demonstration of the Microsoft Fabric Integration, Data Agents, Copilot in Fabric and Power BI for insights and analysis - here Conclusion Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation. 
Both approaches deliver the same comprehensive business capabilities:

Natural voice interactions with advanced VAD and noise suppression
Complete retail workflow automation from inquiry to fulfillment
AI-powered conversation analytics with sentiment scoring
Enterprise business intelligence through Microsoft Fabric integration

The choice between approaches depends on your specific requirements:

Choose Direct Model Integration for custom function calling and minimal latency
Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments
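One practical detail worth calling out about the analytics pipeline: the hand-off from the voice client to the call-quality workflow is a plain HTTP call. The sketch below shows one way the captured conversation history could be posted to the Logic App's HTTP trigger; the endpoint URL, payload shape, and response fields are illustrative assumptions rather than the contract defined in the sample repository.

import aiohttp

async def submit_call_log(conversation_text: str) -> dict:
    # Placeholder for the Logic App HTTP trigger callback URL.
    logic_app_url = "https://<logic-app-http-trigger-url>"
    # Mirrors the call_log parameter of the perform_call_log_analysis tool.
    payload = {"call_log": conversation_text}
    async with aiohttp.ClientSession() as session:
        async with session.post(logic_app_url, json=payload) as response:
            response.raise_for_status()
            # Assumed to return the GPT-4o evaluation (call_reason, sentiment, rating, ...).
            return await response.json()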
Data Storage in Azure OpenAI Service

Data Stored at Rest by Default

Azure OpenAI does store certain data at rest by default when you use specific features (continue reading). In general, the base models are stateless and do not retain your prompts or completions from standard API calls (they aren't used to train or improve the base models). However, some optional service features will persist data in your Azure OpenAI resource. For example, if you upload files for fine-tuning, use the vector store, or enable stateful features like Assistants API Threads or Stored Completions, that data will be stored at rest by the service. This means content such as training datasets, embeddings, conversation history, or output logs from those features is saved within your Azure environment. Importantly, this storage is within your own Azure tenant (in the Azure OpenAI resource you created) and remains in the same geographic region as your resource. In summary, yes, data can be stored at rest by default when using these features, and it stays isolated to your Azure resource in your tenant. If you only use basic completions without these features, then your prompts and outputs are not persisted in the resource by default (aside from transient processing).

Location and Deletion of Stored Data

Location: All data stored by Azure OpenAI features resides in your Azure OpenAI resource's storage, within your Azure subscription/tenant and in the same region (geography) that your resource is deployed in. Microsoft ensures this data is secured: it is automatically encrypted at rest using AES-256 encryption, and you have the option to add a customer-managed key for double encryption (except in certain preview features that may not support CMK). No other Azure OpenAI customers or OpenAI (the company) can access this data; it remains isolated to your environment.

Deletion: You retain full control over any data stored by these features. The official documentation states that stored data can be deleted by the customer at any time. For instance, if you fine-tune a model, the resulting custom model and any training files you uploaded are exclusively available to you and you can delete them whenever you wish. Similarly, any stored conversation threads or batch processing data can be removed by you through the Azure portal or API. In short, data persisted for Azure OpenAI features is user-managed: it lives in your tenant and you can delete it on demand once it's no longer needed.

Comparison to Abuse Monitoring and Content Filtering

It's important to distinguish the above data storage from Azure OpenAI's content safety system (content filtering and abuse monitoring), which operates differently:

Content Filtering: Azure OpenAI automatically checks prompts and generations for policy violations. These filters run in real time and do not store your prompts or outputs in the filter models, nor are your prompts/outputs used to improve the filters without consent. In other words, the content filtering process itself is ephemeral: it analyzes the content on the fly and doesn't permanently retain that data.

Abuse Monitoring: By default (if enabled), Azure OpenAI has an abuse detection system that might log certain data when misuse is detected. If the system's algorithms flag potential violations, a sample of your prompts and completions may be captured for review. Any such data selected for human review is stored in a secure, isolated data store tied to your resource and region (within the Azure OpenAI service boundaries in your geography).
This is used strictly for moderation purposes; for example, a Microsoft reviewer could examine a flagged request to ensure compliance with the Azure OpenAI Code of Conduct.

When Abuse Monitoring is Disabled: this applies if you have disabled content logging/abuse monitoring (via an approved Microsoft process to turn it off). According to Microsoft's documentation, when a customer has this modified abuse monitoring in place, Microsoft does not store any prompts or completions for that subscription's Azure OpenAI usage. The human review process is completely bypassed (because there is no stored data to review). Only the AI-based checks might still occur, but they happen in memory at request time and do not persist your data at rest. Essentially, with abuse monitoring turned off, no usage data is being saved for moderation purposes; the system will check content policy compliance on the fly and then immediately discard those prompts/outputs without logging them.

Data Storage and Deletion in Azure OpenAI "Chat on Your Data"

Azure OpenAI's "Chat on your data" (also called Azure OpenAI on your data, part of the Assistants preview) lets you ground the model's answers on your own documents. It stores some of your data to enable this functionality. Below, we explain where and how your data is stored, how to delete it, and important considerations (based on official Microsoft documentation).

How Azure OpenAI on your data stores your data

Data Ingestion and Storage: When you add your own data (for example by uploading files or providing a URL) through Azure OpenAI's "Add your data" feature, the service ingests that content into an Azure Cognitive Search index (Azure AI Search). The data is first stored in Azure Blob Storage (for processing) and then indexed for retrieval:

Files Upload (Preview): Files you upload are stored in an Azure Blob Storage account and then ingested (indexed) into an Azure AI Search index. This means the text from your documents is chunked and saved in a search index so the model can retrieve it during chat.
Web URLs (Preview): If you add a website URL as a data source, the page content is fetched and saved to a Blob Storage container (webpage-<index name>), then indexed into Azure Cognitive Search. Each URL you add creates a separate container in Blob storage with the page content, which is then added to the search index.
Existing Azure Data Stores: You also have the option to connect an existing Azure Cognitive Search index or other vector databases (like Cosmos DB or Elasticsearch) instead of uploading new files. In those cases, the data remains in that source (for example, your existing search index or database), and Azure OpenAI will use it for retrieval rather than copying it elsewhere.

Chat Sessions and Threads: Azure OpenAI's Assistants feature (which underpins "Chat on your data") is stateful. This means it retains conversation history and any file attachments you use during the chat. Specifically, it stores: (1) threads, messages, and runs from your chat sessions, and (2) any files you uploaded as part of an Assistant's setup or messages. All this data is stored in a secure, Microsoft-managed storage account, isolated for your Azure OpenAI resource. In other words, Azure manages the storage for conversation history and uploaded content, and keeps it logically separated per customer/resource.

Location and Retention: The stored data (index content, files, chat threads) resides within the same Azure region/tenant as your Azure OpenAI resource.
It will persist indefinitely (Azure OpenAI will not automatically purge or delete your data) until you take action to remove it. Even if you close your browser or end a session, the ingested data (search index, stored files, thread history) remains saved on the Azure side. For example, if you created a Cognitive Search index or attached a storage account for "Chat on your data," that index and the files stay in place; the system does not delete them in the background.

How to Delete Stored Data

Removing data that was stored by the "Chat on your data" feature involves a manual deletion step. You have a few options depending on what data you want to delete (a minimal code sketch of the thread and index deletion calls appears at the end of this article):

Delete Chat Threads (Assistants API): If you used the Assistants feature and have saved conversation threads that you want to remove (including their history and any associated uploaded files), you can call the Assistants API to delete those threads. Azure OpenAI provides a DELETE endpoint for threads. Using the thread's ID, you can issue a delete request to wipe that thread's messages and any data tied to it. In practice, this means using the Azure OpenAI REST API or SDK with the thread ID. For example: DELETE https://<your-resource-name>.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview. This "delete thread" operation will remove the conversation and its stored content from the Azure OpenAI Assistants storage. (Simply clearing or resetting the chat in the Studio UI does not delete the underlying thread data; you must call the delete operation explicitly.)

Delete Your Search Index or Data Source: If you connected an Azure Cognitive Search index or the system created one for you during data ingestion, you should delete the index (or wipe its documents) to remove your content. You can do this via the Azure portal or Azure Cognitive Search APIs: go to your Azure Cognitive Search resource, find the index that was created to store your data, and delete that index. Deleting the index ensures all chunks of your documents are removed from search. Similarly, if you had set up an external vector database (Cosmos DB, Elasticsearch, etc.) as the data source, you should delete any entries or indexes there to purge the data. Tip: The index name you created is shown in the Azure AI Studio and can be found in your search resource's overview. Removing that index or the entire search resource will delete the ingested data.

Delete Stored Files in Blob Storage: If your usage involved uploading files or crawling URLs (thereby storing files in a Blob Storage container), you'll want to delete those blobs as well. Navigate to the Azure Blob Storage account/container that was used for "Chat on your data" and delete the uploaded files or containers containing your data. For example, if you used the "Upload files (preview)" option, the files were stored in a container in the Azure Storage account you provided; you can delete those directly from the storage account. Likewise, for any web pages saved under webpage-<index name> containers, delete those containers or blobs via the Storage account in the Azure portal or using Azure Storage Explorer.

Full Resource Deletion (optional): As an alternative cleanup method, you can delete the Azure resources or resource group that contain the data. For instance, if you created a dedicated Azure Cognitive Search service or storage account just for this feature, deleting those resources (or the whole resource group they reside in) will remove all stored data and associated indices in one go.
Note: Only use this approach if you're sure those resources aren't needed for anything else, as it is a broad action. Otherwise, stick to deleting the specific index or files as described above.

Verification: Once you have deleted the above, the model will no longer have access to your data. The next time you use "Chat on your data," it will not find any of the deleted content in the index, and thus cannot include it in answers. (Each query fetches data fresh from the connected index or vector store, so if the data is gone, nothing will be retrieved from it.)

Considerations and Limitations

No Automatic Deletion: Remember that Azure OpenAI will not auto-delete any data you've ingested. All data persists until you remove it. For example, if you remove a data source from the Studio UI or end your session, the configuration UI might forget it, but the actual index and files remain stored in your Azure resources. Always explicitly delete indexes, files, or threads to truly remove the data.

Preview Feature Caveats: "Chat on your data" (Azure OpenAI on your data) is currently a preview feature. Some management capabilities are still evolving. A known limitation was that the Azure AI Studio UI did not persist the data source connection between sessions; you'd have to reattach your index each time, even though the index itself continued to exist. This is being worked on, but it underscores that the UI might not show you all lingering data. Deleting via API/portal is the reliable way to ensure data is removed. Also, preview features might not support certain options like customer-managed keys for encryption of the stored data (the data is still encrypted at rest by Microsoft, but you may not be able to bring your own key in preview).

Data Location & Isolation: All data stored by this feature stays within your Azure OpenAI resource's region/geo and is isolated to your tenant. It is not shared with other customers or OpenAI; it remains private to your resource. So, deleting it is solely your responsibility and under your control. Microsoft confirms that the Assistants data storage adheres to compliance standards like GDPR and CCPA, meaning you have the ability to delete personal data to meet compliance requirements.

Costs: There is no extra charge specifically for the Assistant "on your data" storage itself. The data being stored in a cognitive search index or blob storage will simply incur the normal Azure charges for those services (for example, Azure Cognitive Search indexing queries, or storage capacity usage). Deleting unused resources when you're done is wise to avoid ongoing charges. If you only delete the data (index/documents) but keep the search service running, you may still incur minimal costs for the service being available; consider deleting the whole search resource if you no longer need it.

Residual References: After deletion, any chat sessions or assistants that were using that data source will no longer find it. If you had an Assistant configured with a now-deleted vector store or index, you might need to update or recreate the assistant if you plan to use it again, as the old data source won't resolve. Clearing out the data ensures it's gone from future responses. (Each new question to the model will only retrieve from whatever data sources currently exist/are connected.)

In summary, the data you intentionally provide for Azure OpenAI's features (fine-tuning files, vector data, chat histories, etc.)
is stored at rest by design in your Azure OpenAI resource (within your tenant and region), and you can delete it at any time. This is separate from the content safety mechanisms. Content filtering doesn't retain data, and abuse monitoring would ordinarily store some flagged data for review; but since you have that disabled, no prompt or completion data is being stored for abuse monitoring now. All of these details are based on Microsoft's official documentation, ensuring your understanding is aligned with Azure OpenAI's data privacy guarantees and settings.

Azure OpenAI "Chat on your data" stores your content in Azure Search indexes and blob storage (within your own Azure environment or a managed store tied to your resource). This data remains until you take action to delete it. To remove your data, delete the chat threads (via API) and remove any associated indexes or files in Azure. There are no hidden copies once you do this; the system will not retain context from deleted data on the next chat run. Always double-check the relevant Azure resources (search and storage) to ensure all parts of your data are cleaned up. Following these steps, you can confidently use the feature while maintaining control over your data lifecycle.
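To make the deletion steps described above concrete, here is a minimal sketch that combines the thread DELETE call shown earlier with an index deletion via the azure-search-documents SDK. The resource names, keys, thread ID, and index name are placeholders; the API version is the preview version referenced in this article.

import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

# 1. Delete a stored Assistants thread (removes its messages and attached data).
resource_name = "<your-resource-name>"      # placeholder
thread_id = "<thread-id-to-delete>"         # placeholder
response = requests.delete(
    f"https://{resource_name}.openai.azure.com/openai/threads/{thread_id}",
    params={"api-version": "2024-08-01-preview"},
    headers={"api-key": "<azure-openai-api-key>"},
)
response.raise_for_status()

# 2. Delete the search index that was created for "Chat on your data".
index_client = SearchIndexClient(
    endpoint="https://<your-search-service>.search.windows.net",
    credential=AzureKeyCredential("<search-admin-key>"),
)
index_client.delete_index("<index-created-for-your-data>")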
Enabling SharePoint RAG with LogicApps Workflows

SharePoint Online is quite popular for storing organizational documents. Many organizations use it due to its robust features for document management, collaboration, and integration with other Microsoft 365 services. SharePoint Online provides a secure, centralized location for storing documents, making it easier for everyone in the organization to access and collaborate on files from the device of their choice.

Retrieval-Augmented Generation (RAG) is a process used to infuse the large language model with organizational knowledge without explicitly fine-tuning it, which is a laborious process. RAG enhances the capabilities of language models by integrating them with external data sources, such as SharePoint documents. In this approach, documents stored in SharePoint are first converted into smaller text chunks and vector embeddings of the chunks, then saved into an index store such as Azure AI Search. Embeddings are numerical representations capturing the semantic properties of the text. When a user submits a text query, the system retrieves relevant document chunks from the index based on best matching text and text embeddings. These retrieved document chunks are then used to augment the query, providing additional context and information to the large language model. Finally, the augmented query is processed by the language model to generate a more accurate and contextually relevant response.

Azure AI Search provides a built-in connector for SharePoint Online, enabling document ingestion via a pull approach, currently in public preview. This blog post outlines a LogicApps workflow-based method to export documents, along with associated ACLs and metadata, from SharePoint to Azure Storage. Once in Azure Storage, these documents can be indexed using the Azure AI Search indexer.

At a high level, two workflow groups (historic and ongoing) are created, but only one should be active at a time. The historic flow manages the export of all documents from SharePoint Online to initially populate the Azure AI Search index from Azure Storage, where documents are exported to. This flow processes documents from a specified start date to the current date, incrementally considering documents created within a configurable time window before moving to the next time slice. The sliding time window approach ensures compliance with SharePoint throttling limits by preventing the export of all documents at once. This method enables a gradual and controlled document export process by targeting documents created in a specific time window. Once the historical document export is complete, the ongoing export workflow should be activated (and the historic flow deactivated). This workflow exports documents from the timestamp when the historical export concluded up to the current date and time. The ongoing export workflow also accounts for documents created or modified since the last load and handles scenarios where documents are renamed at the source. Both workflows save the last exported timestamp in Azure Storage and use it as a starting point for every run.

Historic document export flow

Parent flow

Recurs every N hours. This is a configurable value. Exporting historic documents usually requires many runs, depending on the total document count, which could range from thousands to millions.
Sets initial values for the sliding window variables - from_date_time_UTC, to_date_time_UTC. from_date_time_UTC is read from the blob-history.txt file. to_date_time_UTC is set to from_date_time_UTC plus the increment days.
If this increment results in a date greater than the current datetime, to_date_time_UTC is set to the current datetime.
Get the list of all SharePoint lists and libraries using the built-in action.
Initialize the additional variables - files_to_process, files_to_process_temp, files_to_process_chunks. Later, these variables facilitate the grouping of documents into smaller lists, with each group being passed to the child flow to enable scaling with parallel execution.
Loop through the list of SharePoint document libraries and lists. Focus only on document libraries and ignore SharePoint lists (handle SharePoint list processing only if your specific use case requires it).
Get the files within the document library and the file properties where the file creation timestamp falls between from_date_time_UTC and to_date_time_UTC.
Create JSON to capture the document library name and id (this will be required in the child flow to export a document).
Use JavaScript to retain only the documents and ignore folders. The files and their properties also include folders as separate items, which we do not require.
Append the list of files to the variable.
Use the built-in chunk function to create a list of lists, each containing documents as items.
Invoke the child workflow and pass each sub-list of files.
Wait for all child flows to finish successfully and then write the to_date_time_UTC to the blob-history.txt file.

Child flow

Loop through each item, which is document metadata received from the parent flow.
Get the content of the file and save it into Azure Storage.
Run the SharePoint /roleassignments API to get the ACL (Access Control List) information, basically the users and groups that have access to the document.
Run JavaScript to keep the roles of interest.
Save the filtered ACL into Azure Storage.
Save the document metadata, which is the document title, created/modified timestamps, creator, etc., into Azure Storage. All the information is saved into Azure Storage, which offers flexibility to leverage the parts based on use case requirements.
All document metadata is also saved into an Azure SQL Database table for the purpose of determining if the file being processed was modified (exists in the database table) or renamed (file names do not match).
Return status 200, indicating the child flow has successfully completed.

Ongoing data export flow

Parent flow

The ongoing parent flow is very similar to the historic flow; the difference is that the "Get the files within the document library" action gets the files that have a creation timestamp or modified timestamp between from_date_time_UTC and to_date_time_UTC. This change allows the workflow to handle files that get created or modified in SharePoint after the last run of the ongoing workflow. Note: Remember, you need to disable the historic flow after all history load has been completed. The ongoing flow can be enabled after the historic flow is disabled.

Child flow

The ongoing child flow also follows a similar pattern to the historic child flow. Notable differences are:

Handling of document renames at the source, which deletes the previously exported file / metadata / ACL from Azure Storage and recreates these artefacts with the new file name.
Return status 200, indicating the child flow has successfully completed.

Both flows have been divided into parent-child flows, enabling the export process to scale by running multiple document exports simultaneously. To manage or scale this process, adjust the concurrency settings within LogicApps actions and the App scale-out settings under the LogicApps service.
These adjustments help ensure compliance with SharePoint throttling limits. The presented solution works with a single site out of the box and can be updated to work with a list of sites.

Workflow parameters

sharepoint_site_address (String): https://XXXXX.sharepoint.com/teams/test-sp-site
blob_container_name (String): sharepoint-export
blob_container_name_acl (String): sharepoint-acl
blob_container_name_metadata (String): sharepoint-metadata
blob_load_history_container_name (String): load-history
blob_load_history_file_name (String): blob-history.txt
file_group_count (Int): 40
increment_by_days (Int): 7

The workflows can be imported from the GitHub repository below.

GitHub repo: SharePoint-to-Azure-Storage-for-AI-Search LogicApps workflows
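Once the workflows have populated the storage containers, the exported documents can be indexed by an Azure AI Search blob indexer. The following minimal sketch assumes the azure-search-documents Python SDK, placeholder service names and keys, and a target index that already exists; adapt the names and authentication to your environment.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

indexer_client = SearchIndexerClient(
    endpoint="https://<your-search-service>.search.windows.net",
    credential=AzureKeyCredential("<search-admin-key>"),
)

# Point a data source at the container the LogicApps workflows export documents into.
data_source = SearchIndexerDataSourceConnection(
    name="sharepoint-export-ds",
    type="azureblob",
    connection_string="<storage-account-connection-string>",
    container=SearchIndexerDataContainer(name="sharepoint-export"),
)
indexer_client.create_or_update_data_source_connection(data_source)

# Crack and index the exported documents into an existing index.
indexer = SearchIndexer(
    name="sharepoint-export-indexer",
    data_source_name="sharepoint-export-ds",
    target_index_name="<your-index-name>",
)
indexer_client.create_or_update_indexer(indexer)
indexer_client.run_indexer("sharepoint-export-indexer")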
Best Practices for Using Azure AI Search for Natural Language to SQL Generation with Generative AI

Introduction

Using Generative AI to convert natural language (NL) into SQL queries can simplify user interactions with complex databases. This technology can democratize data access by allowing non-technical business partners to obtain insights without needing to write SQL queries. It can streamline analysts' workflows by enabling them to focus on data interpretation and strategy rather than query formulation. Additionally, it enhances productivity by reducing the time and effort required to retrieve data and ensures more consistent and accurate query results through automated translation of natural language into SQL. However, schema complexity, schema storage and retrieval, and contextual understanding are common challenges. Azure AI Search, paired with generative AI models like GPT, can tackle these issues by enabling efficient indexing, storage, and retrieval mechanisms, while providing the right context to the AI model for accurate SQL generation. This combination ensures that users can easily query complex databases and get precise answers, enhancing both usability and reliability.

Understanding the Challenges

When building NL to SQL solutions, here are the key issues to address:

Schema Complexity: Databases have intricate schemas that can make NL to SQL translation difficult.
Schema Storage & Planning: Efficiently storing schema details for quick access by the AI model.
Contextual Retrieval: The AI model requires an understanding of schema relationships to generate accurate queries.
Ranking and Optimization: Retrieving the most relevant schema details and prioritizing them for accuracy.
Natural Language Ambiguity: Human language is inherently ambiguous and context-dependent. Disambiguating user queries and understanding the intended meaning is necessary to generate accurate SQL statements.
Dynamic Schemas: Adapting to evolving database schemas without disruption is crucial.

Best Practices for AI Search Indexing and Storing

1. Plan the Index Structure Based on Schema Elements

What to Index:

Table Names: Index all table names in the schema.
Column Names: Include column names with metadata (e.g., primary key, foreign key).
Data Types: Store column data types to help frame conditions.
Relationships: Capture foreign key relationships to support joins.
Sample Values: Store sample values or data patterns to provide context.

Code Example: { "index": "database_schema", "fields": [ { "name": "table_name", "type": "Edm.String", "searchable": true }, { "name": "column_name", "type": "Edm.String", "searchable": true }, { "name": "data_type", "type": "Edm.String", "searchable": false }, { "name": "column_description", "type": "Edm.String", "searchable": true }, { "name": "table_relationships", "type": "Collection(Edm.String)" } ] }

2. Use Semantic Search to Enhance Query Understanding

Feature: Semantic Search
Best Practice: Enable semantic search to allow the AI model to understand the meaning behind user queries, even if terminology doesn't match the schema. For example, "total sales" can match "Sales Amount" or "Revenue."

Code Example (Python): search_results = search_client.search( search_text="list total sales", semantic_configuration_name="default" ) for result in search_results: print(result["table_name"], result["column_name"])

3. Use Vector Indexing for Schema Embeddings

Feature: Vector Search
Best Practice: Convert schema descriptions and relationships into vector embeddings and store them in Azure AI Search.
This allows semantic matching for terms that don't directly align with schema elements.

Code Example: from azure.search.documents import SearchClient query_vector = generate_embedding("list all clients from New York") search_results = search_client.search( search_text=None, vectors={"vector_embedding": query_vector}, top=5 ) for result in search_results: print(f"Table: {result['table_name']}, Column: {result['column_name']}")

4. Enrich Index with Metadata and Descriptions

What to Store:

Column Descriptions: Describe each column's purpose.
Relationships Metadata: Include primary and foreign key relationships.
AI-Generated Metadata: Use AI enrichment to auto-generate metadata, enhancing SQL generation accuracy.

Why It Helps: Storing metadata helps the AI model understand schema relationships and context.

Code Example: { "name": "database_schema_index", "fields": [ {"name": "table_name", "type": "Edm.String", "searchable": true}, {"name": "column_name", "type": "Edm.String", "searchable": true}, {"name": "description", "type": "Edm.String", "searchable": true}, {"name": "vector_embedding", "type": "Collection(Edm.Single)", "vectorSearch": true} ] }

5. Prioritize Key Schema Elements Using Custom Scoring Profiles

Feature: Custom Scoring Profiles
Best Practice: Create custom scoring profiles to prioritize schema elements based on usage frequency or role. This allows AI models to focus on important details.

Code Example: { "scoringProfiles": [ { "name": "importanceScoring", "text": { "weights": { "column_name": 1.5, "table_relationships": 2.0 } } } ] }

6. Use Filters and Facets for Contextual Retrieval

Feature: Filters and Facets
Best Practice: Define filters to narrow schema retrieval based on context. For example, when a query is related to "sales," limit results to sales tables. Use facets to categorize and narrow schema components.

Code Example: search_results = search_client.search( search_text="sales by region", filter="table_name eq 'SalesData' or column_name eq 'Region'", facets=["table_name"] ) for result in search_results: print(result["table_name"], result["column_name"])

7. Store Synonyms and Related Terms to Enhance Retrieval

Feature: Synonym Maps
Best Practice: Use synonym maps to link alternative terms (e.g., "revenue" and "sales") for more accurate matching with schema components.

Code Example: from azure.search.documents.indexes.models import SynonymMap synonym_map = SynonymMap( name="synonymMap", synonyms=["revenue, sales", "client, customer", "product, item"] ) search_client.create_synonym_map(synonym_map)

Integrating with Generative AI Models

With retrieved schema details, integrate the context with generative AI for SQL generation.

Prompt Engineering: Provide schema details and example queries to give context to the AI model. Use structured prompts to define schema relationships.

Code Example: prompt = f""" Database Schema: {retrieved_schema_details} Query: {user_natural_language_query} Generate an optimized SQL query based on the schema details. """

Incorporate Few-Shot Prompting

In this approach, the LLM is provided with a few examples (prompts), and it learns to generalize from these to unseen tasks. The idea behind few-shot learning is to mimic human cognitive abilities. Just like humans, who often learn a new task from a few examples, AI models should be able to do the same. In the context of NL2SQL solutions, we can provide the AI model with a few examples of natural language queries and their corresponding SQL statements.
Here's how few-shot learning can be implemented in the prompt engineering phase: Code Example : prompt = f""" Database Schema: {retrieved_schema_details} Example 1: Query: 'Show the total sales by region for last year.' SQL: 'SELECT Region, SUM(Sales) FROM SalesData WHERE Year=2020 GROUP BY Region' Example 2: Query: 'List all products sold in New York in 2020.' SQL: 'SELECT DISTINCT Product FROM SalesData WHERE Location='New York' AND Year=2020' Introduce a multi-step self-correction loop The multi-step component allows the LLM to correct the generated SQL query for accuracy. In this approach, the previous prompt and generated code are given as input with the next prompt to generate the next code sequence. This way the generated SQL is checked for syntax errors, and this feedback is further used to enrich our prompt for the LLM for more accurate and effective corrections in the generated SQL. This enables the generative AI model to self-correct its output, ensuring that the resulting SQL query is both syntactically and contextually accurate. For example, suppose we input a natural language query like 'Show the total sales by region for last year.' The AI model generates an SQL query, which is then checked for syntax errors. If an error is found, the model uses this feedback to enrich the prompt and correct the SQL output. This process can continue across multiple steps until the outputted SQL query is correct, reflecting the power and efficiency of a multi-step self-correction loop. Example 1: Suppose your natural language query is "Show me the total sales by region for 2020". Let's assume the initial SQL generated by the AI is incorrect. Initial SQL: "SELECT Region, SUM(Sales) FROM SalesData WHERE Year='2020'" After checking the syntax, the SQL is found to be incorrect. We can then re-prompt the AI model: Prompt: "Based on the database schema, the correct SQL query should be structured differently. Correct the SQL query for 'Show me the total sales by region for 2020'." Corrected SQL: "SELECT Region, SUM(Sales) FROM SalesData WHERE Year=2020 GROUP BY Region" Example 2: For the query "List all products sold in New York in 2020", the initial SQL generated is: Initial SQL: "SELECT Product FROM SalesData WHERE Location='New York' AND Year='2020'" After syntax checking, we find that the SQL is incorrect. So, we re-prompt the AI model: Prompt: "Based on the database schema, the correct SQL query should be structured differently. Correct the SQL query for 'List all products sold in New York in 2020'." Corrected SQL: "SELECT DISTINCT Product FROM SalesData WHERE Location='New York' AND Year=2020" Conclusion Using Azure AI Search with Generative AI for NL2SQL solutions streamlines the translation from natural language to SQL by managing schema details and prioritizing relevant context. Leveraging features like vector indexing, semantic search, and custom scoring helps in providing accurate and efficient SQL query generation. A holistic strategy for NL2SQL solutions involves understanding and indexing the database schema using Azure AI Search, enhancing query comprehension via semantic search and vector indexing, and prioritizing schema elements using custom scoring profiles. Leveraging filters and facets for contextual retrieval, storing synonyms for accurate matching, and utilizing few-shot prompting for training AI models with limited examples can significantly enhance the solution. 
Integrating context with generative AI models and implementing a multi-step self-correction loop ensures the accuracy of the generated SQL queries. Continual monitoring and updating of the solution ensures adaptability to evolving schemas and query patterns. This strategy provides a robust, efficient, and accurate NL2SQL solution.

Key Takeaways:

Effectively organize and index schema data for quick retrieval and accurate SQL generation.
Store detailed schema metadata and synonyms for better AI context and query understanding.
Utilize semantic search and filters to retrieve relevant schema details.
Apply ranking algorithms to improve query generation accuracy.
Use few-shot prompting when engineering prompts.
Integrate context with generative AI models for precise SQL generation.
Implement a self-correction loop for contextual and syntactic accuracy.
Regularly monitor and update the solution to adapt to evolving schemas and queries.
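As a practical illustration of the retrieval-augmented prompt and the multi-step self-correction loop described above, the sketch below regenerates the SQL until a syntax check passes. It is a minimal example assuming an Azure OpenAI chat deployment and using sqlglot purely as a lightweight parser-based check; a production system might instead validate with an EXPLAIN against the live database. Deployment names, keys, and the API version are placeholders.

import sqlglot
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-aoai-resource>.openai.azure.com",
    api_key="<azure-openai-api-key>",
    api_version="2024-06-01",
)

def generate_sql(question: str, schema_context: str, max_attempts: int = 3) -> str:
    # Start from the retrieval-augmented prompt built with schema details from Azure AI Search.
    prompt = f"Database Schema:\n{schema_context}\n\nQuery: {question}\nGenerate an optimized SQL query."
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="<your-gpt-deployment-name>",
            messages=[{"role": "user", "content": prompt}],
        )
        sql = response.choices[0].message.content.strip().strip("`")
        try:
            sqlglot.parse_one(sql)  # lightweight syntax check standing in for a database EXPLAIN
            return sql
        except sqlglot.errors.ParseError as error:
            # Feed the error back so the model corrects its own output on the next attempt.
            prompt += f"\n\nThe previous SQL was invalid ({error}). Correct it:\n{sql}"
    raise ValueError("Could not produce syntactically valid SQL within the attempt limit")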
Building Enterprise-Grade Deep Research Agents In-House: Architecture and Implementation

As generative AI adoption accelerates, more professionals are recognizing the limitations of basic Retrieval-Augmented Generation (RAG). Many point out that traditional RAG provides only superficial analysis and that search results often appear as isolated points, rather than forming holistic insights. Most RAG implementations rely on single-query searches and summarization, which makes it difficult to explore information in depth or to perform repeated validation. These limitations become especially clear in complex enterprise scenarios.

Deep Research is designed to address these challenges. Unlike RAG, Deep Research refers to advanced capabilities where AI receives a user's query and collects and analyzes information from a variety of perspectives to generate detailed reports. This approach enables much deeper insights and multi-dimensional analysis that are not possible with standard RAG. As a result, Deep Research provides better support for decision-making and problem-solving. However, Deep Research features offered through commercial SaaS or PaaS solutions may not fully satisfy certain advanced enterprise requirements, especially in highly specialized fields. Therefore, this article focuses on the approach of building a custom orchestrator in-house and introduces methods for internalizing Deep Research capabilities.

Sample Code for Hand-Crafted Deep Research

Sample code for this approach is available on GitHub under the name Deep-Research-Agents. Please see the repository for detailed instructions and updates.

kushanon/Deep-Research-Agents: Deep Research Agent is a next-generation MultiAgent system built on Semantic Kernel. Through MagenticOrchestration, multiple specialized AI agents dynamically collaborate to automatically generate high-quality research reports from enterprise internal documents

Comparing Deep Research Approaches

There are three primary approaches for implementing Deep Research in an enterprise:

1. SaaS: Use Deep Research through services like M365 Copilot or ChatGPT. Advantages: quick to implement and operate. Disadvantages: fine-grained control and integration with existing systems are difficult.
2. PaaS: Use Deep Research features through APIs, for example Azure AI Foundry Agent. Advantages: integration with in-house systems is possible through APIs. Disadvantages: requires some system development; extensibility is limited.
3. Hand-crafted Orchestrator: Develop and operate a custom orchestrator using open source tools, fully tailored to your requirements. Advantages: highly customizable and extensible. Disadvantages: development and maintenance are necessary.

Although building a hand-crafted orchestrator is more technically demanding, this approach provides outstanding extensibility and allows organizations to fully utilize proprietary algorithms and confidential data. For organizations with advanced technical capabilities and the need for specialized research, for example in manufacturing, healthcare, or finance, this method delivers clear benefits and can become a unique engine for competitive advantage.

Designing Hand-Crafted Deep Research

The design philosophy adopted for the internal Deep Research implementation is described below.

Implementation Strategies

There are two main patterns for implementing Deep Research:

Workflow-based: The workflow is strictly defined in advance, and tasks proceed step by step following this workflow. Advantages: easy to implement; transparent process. Disadvantages: low flexibility; difficult to extend.
Dynamic routing-based: Only the types of tasks are predefined.
The AI dynamically determines which task to execute at runtime, based on the input. Advantages: highly flexible and extensible. Disadvantages: requires expertise; the dynamic nature of processing requires greater understanding.

Most open-source implementations use the workflow-based method, but the flexibility and extensibility required for advanced Deep Research make dynamic routing the preferred choice.

Architecture

For this project, the dynamic routing approach was used. The system is structured as a hierarchical multi-agent system, where a manager agent handles task management. Semantic Kernel's Magentic Orchestration enables the manager to assign work to each specialized agent as necessary for effective problem-solving.

Why Use a Multi-Agent System?

A multi-agent system consists of multiple AI agents, each with its own area of expertise, working together and communicating to accomplish tasks. By adopting a structure similar to a human organization, these agents can divide and coordinate their work, making it possible to address more complex challenges than would be feasible for a single agent. Another advantage of the multi-agent approach is transparency. The interactions and discussions between agents can be easily visualized, which improves the interpretability of results. Furthermore, recent academic research, such as Liu et al. (2025)[1], has reported that using a multi-agent architecture can lead to higher accuracy compared to single-agent approaches. For these reasons, we adopted the multi-agent method in this project.

What is Magentic Orchestration?

In this project, Magentic Orchestration enables dynamic routing between agents, functioning much like a manager who coordinates a team in a human organization. Magentic Orchestration is a mechanism for managing overall plans and task progress in a multi-agent system. It assigns work to each agent as needed, ensuring that tasks are distributed efficiently and progress is tracked across the entire process. The following diagram illustrates the typical workflow managed by Magentic Orchestration. For a practical example, please refer to the official Semantic Kernel sample code, which demonstrates Magentic Orchestration in action: https://github.com/microsoft/semantic-kernel/blob/main/python/samples/getting_started_with_agents/multi_agent_orchestration/step5_magentic.py

Agent Structure

This system employs Semantic Kernel's Magentic Orchestration to coordinate a multi-agent team, with each agent assigned a specialized role inspired by a human research organization. The structure ensures clear role separation and promotes diversity in information gathering for comprehensive and reliable research outputs. Information retrieval is directed by the Lead Researcher, who manages several Sub-Researchers. To enhance the breadth and balance of perspectives, three Sub-Researchers are assigned under the Lead Researcher, each with different temperature settings to increase the diversity of their responses.
Magentic Orchestration: Oversees overall planning, manages task progress, assigns work to agents, and compiles the final report
Lead Researcher: Develops the research plan, delegates specific research tasks to Sub-Researchers, and synthesizes their findings into a summary
Sub-Researchers 1-3: Conduct searches and investigations, each with different behavioral settings to generate diverse results
Credibility Critic: Evaluates the reliability and coverage of information sources, providing quality feedback
Report Writer: Drafts the initial report based on all collected information
Reflection Critic: Reviews the draft report and offers feedback to improve its quality
Translator: Translates content as needed
Citation Agent: Records and manages references, ensuring proper citations in the final report

This hierarchical agent structure allows the system to efficiently coordinate specialized tasks, producing reports that are thorough, trustworthy, and easy to review.

Overall Architecture

The overall architecture is designed to support both external web searches and internal document searches. For internal content, Azure AI Search is used to index and search enterprise documents, enabling secure and scalable access to proprietary knowledge. If you need to handle structured data, such as tables, you can store this information in Azure Table Storage and create searchable indexes using the Azure AI Search Table Storage Indexer. This approach allows the system to flexibly adapt to various data formats and makes internal research much more efficient. The sample code uses Tavily for external web search integration. When leveraging web search APIs or other third-party services, please review the relevant product terms and manage any risks related to information leakage or external query exposure according to your organization's policies.

Sample Output

Below is an example of output generated by running the sample code. The total execution time is approximately seven minutes, with an LLM cost of $1.30. For further details, please refer to Appendix: Output Example Execution Information.

Execution Example

Input Query: Please summarize official case studies from five companies in Japan that utilize multi-agent systems powered by Azure AI/OpenAI. Provide a detailed report in Japanese, using only information from the companies' official websites.

Output:

Executive Summary

This research systematically reviews the latest trends, system architectures, and results of five Japanese companies (NEC, Panasonic Connect, Fujitsu, KDDI, and Panasonic Holdings) that have published official case studies involving the use of Azure OpenAI Service with multi-agent collaboration. The main findings are as follows:

All five companies operate multiple task-specific agents under secure, enterprise-grade Azure OpenAI Service environments.
Typical roles include search agents, specialized domain agents, and workflow/orchestration agents, with hybrid architectures combining RAG and SaaS integration being the most common.
Measured benefits include a 20-40% reduction in search and summarization time, a 15% improvement in attack detection rates, and up to a 30% reduction in manual workload.
Future plans include scaling out to overseas locations and implementing advanced orchestration features such as meta-cognitive coordination.

Note: For this report, Panasonic Connect and Panasonic Holdings are treated as separate entities since they are independent legal companies and publish separate official data.
Sample Output

Below is an example of the output generated by running the sample code. The total execution time is approximately seven minutes, with an LLM cost of approximately $1.30. For further details, please refer to Appendix: Output Example Execution Information.

Execution Example

Input Query: Please summarize official case studies from five companies in Japan that utilize multi-agent systems powered by Azure AI/OpenAI. Provide a detailed report in Japanese, using only information from the companies' official websites.

Output:

Executive Summary

This research systematically reviews the latest trends, system architectures, and results of five Japanese companies (NEC, Panasonic Connect, Fujitsu, KDDI, and Panasonic Holdings) that have published official case studies involving the use of Azure AI/OpenAI Service with multi-agent collaboration. The main findings are as follows:
- All five companies operate multiple task-specific agents under secure, enterprise-grade Azure OpenAI Service environments.
- Typical roles include search agents, specialized domain agents, and workflow/orchestration agents, with hybrid architectures combining RAG and SaaS integration being the most common.
- Measured benefits include a 20–40% reduction in search and summarization time, a 15% improvement in attack detection rates, and up to a 30% reduction in manual workload.
- Future plans include scaling out to overseas locations and implementing advanced orchestration features such as meta-cognitive coordination.

Note: For this report, Panasonic Connect and Panasonic Holdings are treated as separate entities since they are independent legal companies and publish separate official data. Thus, five distinct companies are included.

Detailed Findings

1. NEC
Background: The need to handle varying response granularities across departments and tasks highlighted the importance of coordinating multiple agents.
Architecture: The "NEC AI Agent for NGS" consists of Search, Office, Task, and Individual Agents, autonomously collaborating through a hybrid setup that combines Azure OpenAI Service and internal data.
Results: Substantial reduction in document search time and expansion to more user departments.
Outlook: Plans to add orchestration layers and extend use to decision-making and development support.

2. Panasonic Connect
Background: Needed a generative AI foundation usable across manufacturing, logistics, and retail domains.
Architecture: Combined LangChain, LangGraph, Chainlit, and Azure OpenAI to deploy three agent types (Navigator, Workflow, and General-purpose) simultaneously.
Results: Achieved up to a 30% reduction in workload for production, logistics, and sales tasks.
Outlook: Plans to scale to over 100 projects within FY2025 and extend to code generation and automated design.

3. Fujitsu
Background: Needed to automate vulnerability detection and response due to the increasing complexity of cyber attacks.
Architecture: The "Multi-AI Agent Security Technology" uses specialized agents for attack analysis, defense planning, and recovery, coordinated with Azure OpenAI Service.
Results: 15% increase in unknown-attack detection rates; time to propose countermeasures reduced by more than half.
Outlook: Aims to deploy globally in SOCs by 2026 and achieve autonomous defense.

4. KDDI
Background: Developed an enterprise AI platform integrating multiple AI services for business clients.
Architecture: Integrates four types of AI agents through a multi-cloud gateway with existing systems.
Results: Reduced workload by 40% across call center summarization, EC product description generation, and manual creation tasks.
Outlook: Plans to coordinate agents for network failure analysis and traffic optimization.

5. Panasonic Holdings
Background: Established a common AI foundation to safely use generative AI across business units and improve efficiency.
Architecture: "PX-GPT" deploys department-specific agents on the intranet, with Azure OpenAI Service at the core, serving as a multi-agent foundation across departments.
Results: 28,000 users within four months of release; average 20% reduction in search and summarization time.
Outlook: Plans to expand to overseas offices and integrate with knowledge-sharing platforms by FY2025.

Conclusions and Recommendations

Japanese enterprises are leveraging the security and scalability of Azure OpenAI Service to connect multiple role-specific AI agents, significantly improving business efficiency and quality. Key points include:
- Most organizations use RAG-based search agents as the starting point for information retrieval, followed by deeper analysis by specialized agents.
- Performance indicators such as time reduction and detection rates are quantitatively demonstrated, confirming real-world benefits.
- Future directions include enhancing orchestration capabilities and expanding use cases to global locations.

Recommended actions for enterprises:
- Automate orchestration: Enhance dynamic task allocation and agent meta-cognition to handle complex workflows.
- Strengthen governance: Develop clear guidelines for generative AI usage and maintain audit logs to minimize the risks of sensitive data leaks or hallucinations.
- Accelerate PoC-to-production: Redesign business processes to be agent-friendly, and use metrics and ROI from small-scale pilots to enable a fast transition to full production use.

Source Documentation

# | Type | Title | URL or Document ID | Date/Version | Key Excerpt
1 | Web | Work DX 2025 Session 4 (NEC) | https://jpn.nec.com/digital-wp/workdx2025/document/pdf/workdx2025_session4.pdf | 2025-01 | "Search Agent, Office Agent, Task Agent, and Individual Agent autonomously collaborate."
2 | Web | Large Language Models Technical Introduction (Panasonic Connect) | https://group.connect.panasonic.com/psnrd/technology/large-language-models.html | No date (accessed 2025-07-14) | "Flexible AI agent customization with LangChain, LangGraph, Chainlit, and Azure OpenAI."
3 | Web | Panasonic Connect Accelerates LLM Agent Utilization | https://news.panasonic.com/jp/press/jn250707-2 | 2025-07-07 | "Multiple navigator, workflow, and general-purpose AI agents deployed."
4 | Web | World's First Multi-AI Agent Security Technology Developed (Fujitsu) | https://pr.fujitsu.com/jp/news/2024/12/12.html | 2024-12-12 | "Multi-AI Agent Security Technology."
5 | Web | Azure OpenAI Service for Enterprise (KDDI) | https://biz.kddi.com/service/ms-azure/open-ai/ | No date (accessed 2025-07-14) | "Combining multiple AI agents..."
6 | Web | KDDI Accelerates Enterprise DX with Multi-Cloud x Generative AI | https://newsroom.kddi.com/news/detail/kddi_pr-958.html | 2023-09-05 | "By combining four models..."
7 | Web | Launch of In-house Generative AI Platform "PX-GPT" (Panasonic) | https://news.panasonic.com/jp/press/jn230414-1 | 2023-04-14 | "Multi-agent platform available across departments."

References

- Work DX 2025 Session 4, NEC, https://jpn.nec.com/digital-wp/workdx2025/document/pdf/workdx2025_session4.pdf
- Large Language Models Technical Introduction, Panasonic Connect, https://group.connect.panasonic.com/psnrd/technology/large-language-models.html
- Panasonic Connect Accelerates LLM Agent Utilization, Panasonic Newsroom, https://news.panasonic.com/jp/press/jn250707-2
- World's First Multi-AI Agent Security Technology Developed, Fujitsu, https://pr.fujitsu.com/jp/news/2024/12/12.html
- Azure OpenAI Service for Enterprise, KDDI, https://biz.kddi.com/service/ms-azure/open-ai/
- KDDI Accelerates Enterprise DX with Multi-Cloud x Generative AI, KDDI Newsroom, https://newsroom.kddi.com/news/detail/kddi_pr-958.html
- Launch of In-house Generative AI Platform "PX-GPT", Panasonic Newsroom, https://news.panasonic.com/jp/press/jn230414-1

Conclusion

In summary, this article has demonstrated the following key points regarding the in-house implementation of Deep Research:
- Flexible deep dives and multi-perspective analysis: Hand-crafted Deep Research enables organizations to conduct detailed and iterative analysis, allowing solutions to be tailored to unique business requirements. This level of flexibility is difficult to achieve with standard RAG approaches.
- Creation of competitive advantage: While custom implementation requires initial investment and development effort, it allows advanced organizations to build Deep Research capabilities that are fully aligned with their own needs, providing a significant source of competitive differentiation.
- High expertise, extensibility, and transparency with multi-agent and dynamic routing: By combining multi-agent systems with dynamic routing, organizations can achieve high accuracy and adaptability. This architecture also makes it easy to add new features or data sources in the future.
- Fast and smooth adoption with open-source sample code: By starting with the publicly available code on GitHub, organizations can quickly set up their own environments and begin evaluation with minimal risk.

I encourage you to leverage these approaches and try implementing Deep Research within your own organization.

Appendix: Output Example Execution Information

Execution time: about 7 minutes
Total LLM cost: approximately $1.30
- o3: $0.34
- GPT-4.1: $0.96

Token usage:
- o3: 93.2K input tokens (without cache), 46.4K input tokens (cached), 16.0K output tokens
- GPT-4.1: 358.1K input tokens (without cache), 302.7K input tokens (cached), 12.1K output tokens

The processing cost and token usage for GPT-4.1 are particularly high because it handles the search results. To reduce costs, it is effective to adjust the number of Sub-Researchers running in parallel.

Reference

[1] Liu, Bang, et al. "Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems." arXiv preprint arXiv:2504.01990 (2025).
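If you want to reproduce this kind of cost breakdown for your own runs, a small helper such as the sketch below can be used. The price arguments in the example call are placeholder values only and must be replaced with the current list prices for your models, deployment type, and region.

def estimate_cost_usd(
    input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    price_per_m_input: float,
    price_per_m_cached_input: float,
    price_per_m_output: float,
) -> float:
    """Estimate LLM spend from token counts and per-million-token prices."""
    return (
        input_tokens * price_per_m_input
        + cached_input_tokens * price_per_m_cached_input
        + output_tokens * price_per_m_output
    ) / 1_000_000


# Token counts taken from the appendix above; the prices are placeholders, not quoted rates.
gpt41_cost = estimate_cost_usd(
    input_tokens=358_100,
    cached_input_tokens=302_700,
    output_tokens=12_100,
    price_per_m_input=2.00,          # placeholder USD per 1M input tokens
    price_per_m_cached_input=0.50,   # placeholder USD per 1M cached input tokens
    price_per_m_output=8.00,         # placeholder USD per 1M output tokens
)
print(f"Estimated GPT-4.1 cost: ${gpt41_cost:.2f}")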