azure ai document intelligence
Data Storage in Azure OpenAI Service
Data Stored at Rest by Default Azure OpenAI does store certain data at rest by default when you use specific features (continue reading) In general, the base models are stateless and do not retain your prompts or completions from standard API calls (they aren't used to train or improve the base models). However, some optional service features will persist data in your Azure OpenAI resource. For example, if you upload files for fine-tuning, use the vector store, or enable stateful features like Assistants API Threads or Stored Completions, that data will be stored at rest by the service. This means content such as training datasets, embeddings, conversation history, or output logs from those features are saved within your Azure environment. Importantly, this storage is within your own Azure tenant (in the Azure OpenAI resource you created) and remains in the same geographic region as your resource. In summary, yes – data can be stored at rest by default when using these features, and it stays isolated to your Azure resource in your tenant. If you only use basic completions without these features, then your prompts and outputs are not persisted in the resource by default (aside from transient processing). Location and Deletion of Stored Data Location: All data stored by Azure OpenAI features resides in your Azure OpenAI resource’s storage, within your Azure subscription/tenant and in the same region (geography) that your resource is deployed. Microsoft ensures this data is secured — it is automatically encrypted at rest using AES-256 encryption, and you have the option to add a customer-managed key for double encryption (except in certain preview features that may not support CMK). No other Azure OpenAI customers or OpenAI (the company) can access this data; it remains isolated to your environment. Deletion: You retain full control over any data stored by these features. The official documentation states that stored data can be deleted by the customer at any time. For instance, if you fine-tune a model, the resulting custom model and any training files you uploaded are exclusively available to you and you can delete them whenever you wish. Similarly, any stored conversation threads or batch processing data can be removed by you through the Azure portal or API. In short, data persisted for Azure OpenAI features is user-managed: it lives in your tenant and you can delete it on demand once it’s no longer needed. Comparison to Abuse Monitoring and Content Filtering It’s important to distinguish the above data storage from Azure OpenAI’s content safety system (content filtering and abuse monitoring), which operates differently: Content Filtering: Azure OpenAI automatically checks prompts and generations for policy violations. These filters run in real-time and do not store your prompts or outputs in the filter models, nor are your prompts/outputs used to improve the filters without consent. In other words, the content filtering process itself is ephemeral – it analyzes the content on the fly and doesn’t permanently retain that data. Abuse Monitoring: By default (if enabled), Azure OpenAI has an abuse detection system that might log certain data when misuse is detected. If the system’s algorithms flag potential violations, a sample of your prompts and completions may be captured for review. Any such data selected for human review is stored in a secure, isolated data store tied to your resource and region (within the Azure OpenAI service boundaries in your geography). 
This is used strictly for moderation purposes – e.g. a Microsoft reviewer could examine a flagged request to ensure compliance with the Azure OpenAI Code of Conduct.

When Abuse Monitoring is Disabled: Customers can have content logging/abuse monitoring turned off via an approved Microsoft process. According to Microsoft’s documentation, when a customer has this modified abuse monitoring in place, Microsoft does not store any prompts or completions for that subscription’s Azure OpenAI usage. The human review process is completely bypassed (because there’s no stored data to review). Only the AI-based checks might still occur, but they happen in-memory at request time and do not persist your data at rest. Essentially, with abuse monitoring turned off, no usage data is saved for moderation purposes; the system checks content policy compliance on the fly and then immediately discards those prompts/outputs without logging them.

Data Storage and Deletion in Azure OpenAI “Chat on Your Data”

Azure OpenAI’s “Chat on your data” (also called Azure OpenAI on your data, part of the Assistants preview) lets you ground the model’s answers on your own documents. It stores some of your data to enable this functionality. Below, we explain where and how your data is stored, how to delete it, and important considerations (based on official Microsoft documentation).

How Azure OpenAI on your data stores your data

Data Ingestion and Storage: When you add your own data (for example, by uploading files or providing a URL) through Azure OpenAI’s “Add your data” feature, the service ingests that content into an Azure Cognitive Search index (Azure AI Search). The data is first stored in Azure Blob Storage (for processing) and then indexed for retrieval:

Files Upload (Preview): Files you upload are stored in an Azure Blob Storage account and then ingested (indexed) into an Azure AI Search index. This means the text from your documents is chunked and saved in a search index so the model can retrieve it during chat.

Web URLs (Preview): If you add a website URL as a data source, the page content is fetched and saved to a Blob Storage container (webpage-<index name>), then indexed into Azure Cognitive Search. Each URL you add creates a separate container in Blob storage with the page content, which is then added to the search index.

Existing Azure Data Stores: You also have the option to connect an existing Azure Cognitive Search index or other vector databases (like Cosmos DB or Elasticsearch) instead of uploading new files. In those cases, the data remains in that source (for example, your existing search index or database), and Azure OpenAI will use it for retrieval rather than copying it elsewhere.

Chat Sessions and Threads: Azure OpenAI’s Assistants feature (which underpins “Chat on your data”) is stateful. This means it retains conversation history and any file attachments you use during the chat. Specifically, it stores: (1) threads, messages, and runs from your chat sessions, and (2) any files you uploaded as part of an Assistant’s setup or messages. All this data is stored in a secure, Microsoft-managed storage account, isolated for your Azure OpenAI resource. In other words, Azure manages the storage for conversation history and uploaded content, and keeps it logically separated per customer/resource.

Location and Retention: The stored data (index content, files, chat threads) resides within the same Azure region/tenant as your Azure OpenAI resource.
It will persist indefinitely – Azure OpenAI will not automatically purge or delete your data – until you take action to remove it. Even if you close your browser or end a session, the ingested data (search index, stored files, thread history) remains saved on the Azure side. For example, if you created a Cognitive Search index or attached a storage account for “Chat on your data,” that index and the files stay in place; the system does not delete them in the background.

How to Delete Stored Data

Removing data that was stored by the “Chat on your data” feature involves a manual deletion step. You have a few options depending on what data you want to delete:

Delete Chat Threads (Assistants API): If you used the Assistants feature and have saved conversation threads that you want to remove (including their history and any associated uploaded files), you can call the Assistants API to delete those threads. Azure OpenAI provides a DELETE endpoint for threads. Using the thread’s ID, you can issue a delete request to wipe that thread’s messages and any data tied to it. In practice, this means using the Azure OpenAI REST API or SDK with the thread ID. For example: DELETE https://<your-resource-name>.openai.azure.com/openai/threads/{thread_id}?api-version=2024-08-01-preview. This “delete thread” operation will remove the conversation and its stored content from the Azure OpenAI Assistants storage. (Simply clearing or resetting the chat in the Studio UI does not delete the underlying thread data – you must call the delete operation explicitly. A short code sketch after this list shows the call.)

Delete Your Search Index or Data Source: If you connected an Azure Cognitive Search index, or the system created one for you during data ingestion, you should delete the index (or wipe its documents) to remove your content. You can do this via the Azure portal or Azure Cognitive Search APIs: go to your Azure Cognitive Search resource, find the index that was created to store your data, and delete that index. Deleting the index ensures all chunks of your documents are removed from search. Similarly, if you had set up an external vector database (Cosmos DB, Elasticsearch, etc.) as the data source, you should delete any entries or indexes there to purge the data. Tip: The index name you created is shown in the Azure AI Studio and can be found in your search resource’s overview. Removing that index or the entire search resource will delete the ingested data.

Delete Stored Files in Blob Storage: If your usage involved uploading files or crawling URLs (thereby storing files in a Blob Storage container), you’ll want to delete those blobs as well. Navigate to the Azure Blob Storage account/container that was used for “Chat on your data” and delete the uploaded files or containers containing your data. For example, if you used the “Upload files (preview)” option, the files were stored in a container in the Azure Storage account you provided – you can delete those directly from the storage account. Likewise, for any web pages saved under webpage-<index name> containers, delete those containers or blobs via the Storage account in the Azure portal or using Azure Storage Explorer.

Full Resource Deletion (optional): As an alternative cleanup method, you can delete the Azure resources or resource group that contain the data. For instance, if you created a dedicated Azure Cognitive Search service or storage account just for this feature, deleting those resources (or the whole resource group they reside in) will remove all stored data and associated indices in one go. Note: Only use this approach if you’re sure those resources aren’t needed for anything else, as it is a broad action. Otherwise, stick to deleting the specific index or files as described above.
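As an illustration of the targeted options above, here is a minimal Python sketch that deletes a thread, the ingested search index, and a blob container. The thread DELETE endpoint is the one shown in the list; the resource, index, and container names are hypothetical placeholders, and the calls assume the azure-identity, azure-search-documents, and azure-storage-blob packages with appropriate permissions on your resources.

```python
# Minimal cleanup sketch for "Chat on your data" artifacts.
# Assumptions: hypothetical resource/index/container names; you hold the
# required RBAC roles on the OpenAI, Search, and Storage resources.
import requests
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()

# 1) Delete an Assistants thread (the DELETE endpoint shown above).
aoai_resource = "my-openai-resource"          # hypothetical
thread_id = "thread_abc123"                   # hypothetical
token = credential.get_token("https://cognitiveservices.azure.com/.default").token
resp = requests.delete(
    f"https://{aoai_resource}.openai.azure.com/openai/threads/{thread_id}",
    params={"api-version": "2024-08-01-preview"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# 2) Delete the search index that was created during ingestion.
index_client = SearchIndexClient(
    endpoint="https://my-search-service.search.windows.net",  # hypothetical
    credential=credential,
)
index_client.delete_index("my-chat-on-your-data-index")        # hypothetical

# 3) Delete the blob container holding uploaded files or crawled pages.
blob_service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",  # hypothetical
    credential=credential,
)
blob_service.delete_container("webpage-my-chat-on-your-data-index")
```

The same cleanup can be done entirely from the Azure portal; the sketch simply mirrors the list above for scripted environments.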
Verification: Once you have deleted the above, the model will no longer have access to your data. The next time you use “Chat on your data,” it will not find any of the deleted content in the index, and thus cannot include it in answers. (Each query fetches data fresh from the connected index or vector store, so if the data is gone, nothing will be retrieved from it.)

Considerations and Limitations

No Automatic Deletion: Remember that Azure OpenAI will not auto-delete any data you’ve ingested. All data persists until you remove it. For example, if you remove a data source from the Studio UI or end your session, the configuration UI might forget it, but the actual index and files remain stored in your Azure resources. Always explicitly delete indexes, files, or threads to truly remove the data.

Preview Feature Caveats: “Chat on your data” (Azure OpenAI on your data) is currently a preview feature, and some management capabilities are still evolving. A known limitation was that the Azure AI Studio UI did not persist the data source connection between sessions – you’d have to reattach your index each time, even though the index itself continued to exist. This is being worked on, but it underscores that the UI might not show you all lingering data. Deleting via API/portal is the reliable way to ensure data is removed. Also, preview features might not support certain options like customer-managed keys for encryption of the stored data (the data is still encrypted at rest by Microsoft, but you may not be able to bring your own key in preview).

Data Location & Isolation: All data stored by this feature stays within your Azure OpenAI resource’s region/geo and is isolated to your tenant. It is not shared with other customers or OpenAI – it remains private to your resource. So, deleting it is solely your responsibility and under your control. Microsoft confirms that the Assistants data storage adheres to compliance regimes like GDPR and CCPA, meaning you have the ability to delete personal data to meet compliance requirements.

Costs: There is no extra charge specifically for the Assistant “on your data” storage itself. The data being stored in a cognitive search index or blob storage will simply incur the normal Azure charges for those services (for example, Azure Cognitive Search indexing queries, or storage capacity usage). Deleting unused resources when you’re done is wise to avoid ongoing charges. If you only delete the data (index/documents) but keep the search service running, you may still incur minimal costs for the service being available – consider deleting the whole search resource if you no longer need it.

Residual References: After deletion, any chat sessions or assistants that were using that data source will no longer find it. If you had an Assistant configured with a now-deleted vector store or index, you might need to update or recreate the assistant if you plan to use it again, as the old data source won’t resolve. Clearing out the data ensures it’s gone from future responses. (Each new question to the model will only retrieve from whatever data sources currently exist/are connected.)

In summary, the data you intentionally provide for Azure OpenAI’s features (fine-tuning files, vector data, chat histories, etc.)
is stored at rest by design in your Azure OpenAI resource (within your tenant and region), and you can delete it at any time. This is separate from the content safety mechanisms. Content filtering doesn’t retain data, and abuse monitoring would ordinarily store some flagged data for review – but since you have that disabled, no prompt or completion data is being stored for abuse monitoring now. All of these details are based on Microsoft’s official documentation, ensuring your understanding is aligned with Azure OpenAI’s data privacy guarantees and settings.

Azure OpenAI “Chat on your data” stores your content in Azure Search indexes and blob storage (within your own Azure environment or a managed store tied to your resource). This data remains until you take action to delete it. To remove your data, delete the chat threads (via API) and remove any associated indexes or files in Azure. There are no hidden copies once you do this – the system will not retain context from deleted data on the next chat run. Always double-check the relevant Azure resources (search and storage) to ensure all parts of your data are cleaned up. Following these steps, you can confidently use the feature while maintaining control over your data lifecycle.

Unveiling the Next Generation of Table Structure Recognition
In an era where data is abundant, the ability to accurately and efficiently extract structured information like tables from diverse document types is critical. For instance, consider the complexities of a balance sheet with multiple types of assets or an invoice with various charges, both presented in a table format that can be challenging even for humans to interpret. Traditional parsing methods often struggle with the complexity and variability of real-world tables, leading to manual intervention and inefficient workflows. This is because these methods typically rely on rigid rules or predefined templates that fail when encountering variations in layout, formatting, or content, which are common in real-world documents.

While the promise of Generative AI and Large Language Models (LLMs) in document understanding is vast, our research in table parsing has revealed a critical insight: for tasks requiring precision in data alignment, such as correctly associating data cells with their respective row and column headers, classical computer vision techniques currently offer superior performance. Generative AI models, despite their powerful contextual understanding, can sometimes exhibit inconsistencies and misalignments in tabular structures, leading to compromised data integrity (Figure 1). Therefore, Azure Document Intelligence (DI) and Content Understanding (CU) leverage even more robust and proven computer vision algorithms to ensure the foundational accuracy and consistency that enterprises demand.

Figure 1: Vision LLMs struggle to accurately recognize table structure, even in simple tables.

Our current table recognizer excels at accurately identifying table structures, even those with complex layouts, rotations, or curved shapes. However, it does have its limitations. For example, it occasionally fails to properly delineate a table where the logical boundaries are not visible but must be inferred from the larger document context, making suboptimal inferences. Furthermore, its architectural design makes it challenging to accelerate on modern GPU platforms, impacting its runtime efficiency. Taking these limitations into consideration and building upon our existing foundation, we are introducing the latest advancement in our table structure recognizer. This new version significantly enhances both performance and accuracy, addressing key challenges in document processing.

Precise Separation Line Placement

We've made significant strides in the precision of separation line placement. While predicting these separation lines might seem deceptively simple, it comes with subtle yet significant challenges. In many real-world documents, these are logical separation lines, meaning they are not always visibly drawn on the page. Instead, their positions are often implied by an array of nuanced visual cues such as table headers/footers, dot filler text, background color changes, and even the spacing and alignment of content within the cells.

Figure 2: Visual comparison of separation line prediction between the current and the new version

We've developed a novel model architecture that can be trained end-to-end to directly tackle the above challenges. Recognizing the difficulty for humans to consistently label table separation lines, we've devised a training objective that combines Hungarian matching with an adaptive matching weight to correctly align predictions with ground truth even when the latter is noisy.
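The specifics of that architecture and loss are not published here, but the Hungarian matching step itself is standard bipartite assignment. The sketch below is purely illustrative (hypothetical coordinates, a simple distance cost, and scipy's solver), not the production training code:

```python
# Illustrative only: match predicted row-separator positions to noisy
# ground-truth positions via Hungarian (bipartite) assignment.
# Assumes numpy and scipy; the threshold and distance cost are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_separators(predicted, ground_truth, max_offset=8.0):
    """Return (pred_idx, gt_idx) pairs whose positions are close enough."""
    pred = np.asarray(predicted, dtype=float)   # e.g. y-coordinates in pixels
    gt = np.asarray(ground_truth, dtype=float)

    # Cost = absolute distance between every predicted and ground-truth line.
    cost = np.abs(pred[:, None] - gt[None, :])

    # Hungarian algorithm: minimum-cost one-to-one assignment.
    rows, cols = linear_sum_assignment(cost)

    # Keep only assignments that plausibly refer to the same separator.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_offset]

# Example: three predicted separators vs. four noisy labels.
print(match_separators([102.0, 240.5, 388.0], [100.0, 245.0, 392.0, 610.0]))
# -> [(0, 0), (1, 1), (2, 2)]
```

In training, an assignment like this is typically folded into the loss so that each predicted separator is penalized only against the ground-truth line it was matched to, which is what makes learning from noisy labels workable.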
Additionally, we've incorporated a loss function inspired by speech recognition to encourage the model to accurately predict the correct number of separation lines, further enhancing its performance. Our improved algorithms now respect visual cues more effectively, ensuring that separation lines are placed precisely where they belong. This leads to cleaner, more accurate table structures and ultimately, more reliable data extraction. Figure 2 shows the comparison between the current model and the new model on a few examples. Some quantitative results can be found in Table 1.

| Segment | TSR (current) Precision (%) | TSR (current) Recall (%) | TSR (current) F1-score (%) | TSR-v2 (next-gen) Precision (%) | TSR-v2 (next-gen) Recall (%) | TSR-v2 (next-gen) F1-score (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Latin | 90.2 | 90.7 | 90.4 | 94.0 | 95.7 | 94.8 |
| Chinese | 96.1 | 95.3 | 95.7 | 97.3 | 96.8 | 97.0 |
| Japanese | 93.5 | 93.8 | 93.7 | 95.1 | 97.1 | 96.1 |
| Korean | 95.3 | 95.9 | 95.6 | 97.5 | 97.8 | 97.7 |

Table 1: Table structure accuracy measured by cell prediction precision and recall rates at an IoU (intersection over union) threshold of 0.5. Tested on in-house test datasets covering four different scripts.

A Data-Driven, GPU-Accelerated Design

Another innovation in this release is its data-driven, fully GPU-accelerated design. This architectural shift delivers enhanced quality and significantly faster inference speeds, which is critical for processing large volumes of documents. The design carefully considers the trade-off between model capability and latency requirements, prioritizing an architecture that leverages the inherent parallelism of GPUs. This involves favoring highly parallelizable models over serial approaches to maximize GPU utilization. Furthermore, post-processing logic has been minimized to prevent it from becoming a bottleneck. This comprehensive approach has resulted in a drastic reduction in processing latency, from 250ms per image to less than 10ms.

Fueling Robustness with Synthetic Data

Achieving the high level of accuracy and robustness required for enterprise-grade table recognition demands vast quantities of high-quality training data. To meet this need efficiently, we've strategically incorporated synthetic data into our development pipeline. A few examples can be found in Figure 3.

Figure 3: Synthesized tables

Synthetic data offers significant advantages: it's cost-effective to generate and provides unparalleled control over the dataset. This allows us to rapidly synthesize diverse and specific table styles, including rare or challenging layouts, which would be difficult and expensive to collect from real-world documents. Crucially, synthetic data comes with perfectly consistent labels. Unlike human annotation, which can introduce variability, synthetic data ensures that our models learn from a flawlessly labeled ground truth, leading to more reliable and precise training outcomes.

Summary

This latest version of our table structure recognizer enhances critical document understanding capabilities. We've refined separation line placement to better respect visual cues and implied structures, supported by our synthetic data approach for consistent training. This enhancement, in turn, allows users to maintain the table structure as intended, reducing the need for manual post-processing to clean up the structured output. Additionally, a GPU-accelerated, data-driven design delivers both improved quality and faster performance, crucial for processing large document volumes.

What If You Could Cut AI Costs by 60% Without Losing Quality?
That’s the promise behind the new pricing model for Azure AI Content Understanding. We’ve restructured how you pay for document, audio, and video analysis—moving from rigid field-based pricing to a flexible, token-based system that lets you pay only for what you use. Whether you're extracting layout from documents or identifying actions in a video, the new pricing structure delivers up to 60% cost savings for many typical tasks and more control over your spend.

Why We’re Moving to Token-Based Pricing

Field-based pricing was easy to understand, but it didn’t reflect the real work being done. Some fields are simple. Others require deep reasoning, cross-referencing, and contextual understanding. So we asked: What if pricing scaled with complexity? Enter tokens. Tokens are the atomic units of language models—think of them like syllables. By pricing based on tokens, we can:
Reflect actual compute usage
Align with generative AI model pricing
Offer more predictability to developers

What’s Included in the New Pricing Model?

The new pricing structure has three components – Content Extraction, Field Extraction, and Contextualization. Each of these components is essential for enabling customers to create content processing tasks that deliver high quality.

Figure: Content Understanding framework for multimodal file processing

1. Content Extraction 🧾

Content Extraction is the essential first step for transforming unstructured input—whether it’s a document, audio, video, or image—into a standardized, reusable format. This process alone delivers significant value, as it allows you to consistently access and utilize information from any source, no matter how varied or complex the original data might be. Content Extraction also serves as the foundation for the more advanced data processing of Field Extraction. We’re lowering the price significantly for both Document Content Extraction and the Face Grouping & Identification add-on for video.

Pricing Breakdown:

| Modality | Feature | Unit | New Price | % Change |
| --- | --- | --- | --- | --- |
| Document | Content Extraction (now includes Layout and Formula) | per 1,000 pages | $5.00 | 61% lower |
| Audio | Content Extraction | per hour | $0.36 | No change |
| Video | Content Extraction | per hour | $1.00 | No change |
| Video | Add-on: Video Face Grouping & Identification | per hour | $2.00 | 40% lower |
| Image | Content Extraction | N/A | N/A | N/A |

2. Field Extraction 🧠

Field Extraction is where your custom schema comes to life. Using generative models like GPT-4o and o3-mini, we extract the specific fields you define—whether it’s invoice totals, contract terms, or customer sentiment. With this update, we’ve aligned pricing directly to token usage, matching the regional rates of GPT-4o for Standard and o3-mini for Pro. You can now choose the mode depending on your use case. These tokens are charged based on the actual content processed by the generative models for field extraction, using the standard Azure OpenAI tokenizer.

Pro and Standard modes provide two distinct ways to process content as part of the 2025-05-01-preview version of the APIs. Standard mode efficiently extracts structured fields from individual files using your defined schema, while Pro mode is tailored for advanced scenarios involving multi-step reasoning and can process multiple files with reference data. For a more detailed comparison of the capabilities of Standard and Pro modes, check out Azure AI Content Understanding standard and pro modes - Azure AI services | Microsoft Learn. Initially, Pro mode only supports documents, but this will expand over time.
Pricing Breakdown:

| Mode | Token Type | Unit | Price |
| --- | --- | --- | --- |
| Standard | Input Tokens | per 1M tokens | $2.75 |
| Standard | Output Tokens | per 1M tokens | $11.00 |
| Pro | Input Tokens | per 1M tokens | $1.21 |
| Pro | Output Tokens | per 1M tokens | $4.84 |

Note that although the price per 1M tokens is lower for Pro mode, it typically consumes substantially more tokens than Standard mode.

3. Contextualization 🔍

Accurate field extraction depends on context, which is why we've introduced a separate charge for Contextualization—covering processes such as output normalization, adding source references, and calculating confidence scores to enhance accuracy and consistency. It also enables in-context learning, which allows you to continually refine analyzers with feedback. It’s an investment in quality with real value, as data like confidence scores can enable more straight-through processing, reducing cost and improving quality. These features are now priced transparently so you can see exactly where your value comes from. Contextualization tokens are always used as part of analyzers that run field extraction.

Pricing Breakdown:

Customers are charged Contextualization tokens based on the size of the files (documents, images, audio, or video) that are processed. Tokens for Standard and Pro have different prices.

| Mode | Token Type | Unit | Price |
| --- | --- | --- | --- |
| Standard | Contextualization Tokens | per 1M tokens | $1.00 |
| Pro | Contextualization Tokens | per 1M tokens | $1.50 |

Unlike the Field Extraction tokens, which are calculated using the Azure OpenAI tokenizers, Contextualization tokens are consumed at a fixed rate based on the size of the input file. For example, a 1-page document processed with Standard mode will cost 1,000 contextualization tokens, as shown in the table below. Thus, the cost for contextualization will be $0.001 for that processing.

| Units | Contextualization Tokens | Effective Standard Price per unit |
| --- | --- | --- |
| 1 page[1] | 1,000 contextualization tokens | $1 per 1,000 pages |
| 1 image | 1,000 contextualization tokens | $1 per 1,000 images |
| 1 hour audio | 100,000 contextualization tokens | $0.10 per hour |
| 1 hour video | 1,000,000 contextualization tokens | $1 per hour |

📊 Pricing examples

Let’s walk through three detailed examples of how the new pricing structure works out in practice.

📄 Example 1: Document Content Extraction Only (1,000 Pages)

Scenario: You want to extract layout and formulas from a 1,000-page document—no field extraction, just the raw content.

Old Pricing (Preview.1):
Document content extraction: $5
Layout add-on: $5
Formula add-on: $3
Total: $13.00 per 1,000 pages

New Pricing (Preview.2):
Document content extraction (now includes layout + formula): $5.00 per 1,000 pages
Note that pricing will be prorated when processing some fraction of 1,000 pages.

✅ Savings: ~62% reduction in cost for the same functionality.
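The remaining examples follow the same arithmetic. As a sanity check, here is a small helper (hypothetical, not an official calculator) that applies the Standard-mode preview prices listed above; it reproduces the total in Example 2 below, and the video example follows the same pattern with the hourly content-extraction rate in place of the per-page rate.

```python
# Hypothetical cost helper using the Standard-mode preview prices above.
# Not an official utility; it is just the arithmetic from these examples.
PRICES = {
    "content_extraction_per_1k_pages": 5.00,   # document content extraction
    "input_per_1m": 2.75,                      # Standard field-extraction input
    "output_per_1m": 11.00,                    # Standard field-extraction output
    "contextualization_per_1m": 1.00,          # Standard contextualization
}

def standard_doc_cost(pages, input_tokens, output_tokens, contextualization_tokens):
    """Estimated Standard-mode document cost in USD."""
    return (
        pages / 1_000 * PRICES["content_extraction_per_1k_pages"]
        + input_tokens / 1e6 * PRICES["input_per_1m"]
        + output_tokens / 1e6 * PRICES["output_per_1m"]
        + contextualization_tokens / 1e6 * PRICES["contextualization_per_1m"]
    )

# Example 2 below: 1,000 pages, 2.6M input, 90K output, 1M contextualization tokens.
print(round(standard_doc_cost(1_000, 2_600_000, 90_000, 1_000_000), 2))  # 14.14
```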
🧠 Example 2: Document Field Extraction (1,000 Pages)

Scenario: You want to extract structured fields from a 1,000-page document using Standard mode.

Assumptions:
~2,000 tokens for content extraction output
~300 tokens for field schema output
~300 tokens for metaprompts
Each page generates ~2,600 tokens total: total input tokens = 2.6M
Output tokens = 90K (assuming ~20 fields per page with short responses)
Contextualization = 1M tokens (1,000 tokens per page)

Step-by-Step Pricing:
Content Extraction: $5.00
Field Extraction:
Input Tokens: 2.6M × $2.75/M = $7.15
Output Tokens: 90K × $11/M = $0.99
Contextualization: 1M × $1.00/M = $1.00
Total (Preview.2): $5.00 (CE) + $7.15 (Input) + $0.99 (Output) + $1.00 (Contextualization) = $14.14

Old Pricing (Preview.1): Flat rate: $30.00 per 1,000 pages

✅ Savings: Over 50% reduction in cost, with more transparent, usage-based billing.

🎥 Example 3: Video Field Extraction (1 Hour)

Scenario: You want to extract structured data from 1 hour of video content at a segment level. Segments are short, 15-30 seconds on average, resulting in a substantial number of output segments.

Assumptions:
Input tokens: 7,500 tokens per minute (based on sampled frames, transcription, schema prompts, and metaprompts)
Output tokens: 900 tokens per minute (assuming 10-20 short structured fields per segment with auto segmentation)
Contextualization: 1M tokens per hour of video

Step-by-Step Pricing:
Content Extraction: $1.00
Field Extraction:
Input Tokens: 450K × $2.75/M = $1.24
Output Tokens: 54K × $11/M = $0.59
Contextualization: 1M × $1.00/M = $1.00
Total (Preview.2): $1.00 (CE) + $1.24 (Input) + $0.59 (Output) + $1.00 (Contextualization) = $3.83

Old Pricing (Preview.1): Flat rate: $10.00 per hour

✅ Savings: Over 60% reduction in cost, with more transparent, usage-based billing.

Note: Actual cost savings will vary based on the specifics of the input and output.

📚 Learn More
Learn more from Microsoft Learn - What is Azure AI Content Understanding? - Azure AI services | Microsoft Learn
AI Show demo of Content Understanding Pro mode - AI Show LIVE | AI App templates & Azure AI Content Understanding
View the detailed presentation from Build 2025 - Reasoning on multimodal content for efficient agentic AI app building
Check out the quickstart for Azure AI Foundry - Create an Azure AI Content Understanding in the Azure AI Foundry portal | Microsoft Learn
Have questions or feedback on Content Understanding? Email CU_Contact@Microsoft.com

What could you build now that pricing is not a blocker? Let us know how you’re using Content Understanding—we’d love to feature your story in a future post.

[1] For documents without explicit pages (e.g., txt, html), every 3,000 UTF-16 characters is counted as one page.

Seamlessly Integrating Azure Document Intelligence with Azure API Management (APIM)
In today’s data-driven world, organizations are increasingly turning to AI for document understanding. Whether it's extracting invoices, contracts, ID cards, or complex forms, Azure Document Intelligence (formerly known as Form Recognizer) provides a robust, AI-powered solution for automated document processing. But what happens when you want to scale, secure, and load balance your document intelligence backend for high availability and enterprise-grade integration? Enter Azure API Management (APIM) — your gateway to efficient, scalable API orchestration. In this blog, we’ll explore how to integrate Azure Document Intelligence with APIM using a load-balanced architecture that works seamlessly with the Document Intelligence SDK — without rewriting your application logic. Azure Doc Intelligence SDKs simplify working with long-running document analysis operations — particularly asynchronous calls — by handling the polling and response parsing under the hood. Why Use API Management with Document Intelligence? While the SDK is great for client-side development, APIM adds essential capabilities for enterprise-scale deployments: 🔐 Security & authentication at the gateway level ⚖️ Load balancing across multiple backend instances 🔁 Circuit breakers, caching, and retries 📊 Monitoring and analytics 🔄 Response rewriting and dynamic routing By routing all SDK and API calls through APIM, you get full control over traffic flow, visibility into usage patterns, and the ability to scale horizontally with multiple Document Intelligence backends. SDK Behavior with Document Intelligence When using the Document Intelligence SDK (e.g., begin_analyze_document), it follows this two-step pattern: POST request to initiate document analysis Polling (GET) request to the operation-location URL until results are ready This is an asynchronous pattern where the SDK expects a polling URL in the response of the POST. If you’re not careful, this polling can bypass APIM — which defeats the purpose of using APIM in the first place. So what do we do? The Smart Rewrite Strategy We use APIM to intercept and rewrite the response from the POST call. POST Flow SDK sends a POST to: https://apim-host/analyze APIM routes the request to one of the backend services: https://doc-intel-backend-1/analyze Backend responds with: operation-location: https://doc-intel-backend-1/operations/123 APIM rewrites this header before returning to the client: operation-location: https://apim-host/operations/poller?backend=doc-intel-backend-1 Now, the SDK will automatically poll APIM, not the backend directly. GET (Polling) Flow Path to be set as /operations/123 in GET operation of APIM SDK polls: https://apim-host/operations/123?backend=doc-intel-backend-1 APIM extracts the query parameter backend=doc-intel-backend-1 APIM dynamically sets the backend URL for this request to: https://doc-intel-backend-1 It forwards the request to: https://doc-intel-backend-1/operations/123 Backend sends the status/result back to APIM → which APIM returns to the SDK. All of this happens transparently to the SDK. Sample policies //Outbound policies for POST - /documentintelligence/documentModels/prebuilt-read:analyze //--------------------------------------------------------------------------------------------------- <!-- - Policies are applied in the order they appear. - Position <base/> inside a section to inherit policies from the outer scope. - Comments within policies are not preserved. 
--> <!-- Add policies as children to the <inbound>, <outbound>, <backend>, and <on-error> elements --> <policies> <!-- Throttle, authorize, validate, cache, or transform the requests --> <inbound> <base /> </inbound> <!-- Control if and how the requests are forwarded to services --> <backend> <base /> </backend> <!-- Customize the responses --> <outbound> <base /> <set-header name="operation-location" exists-action="override"> <value>@{ // Original operation-location from backend var originalOpLoc = context.Response.Headers.GetValueOrDefault("operation-location", ""); // Encode original URL to pass as query parameter var encoded = System.Net.WebUtility.UrlEncode(originalOpLoc); // Construct APIM URL pointing to poller endpoint with backendUrl var apimUrl = $"https://tstmdapim.azure-api.net/document-intelligent/poller?backendUrl={encoded}"; return apimUrl; }</value> </set-header> </outbound> <!-- Handle exceptions and customize error responses --> <on-error> <base /> </on-error> </policies> //Inbound policies for Get (Note: path for get should be modified - /document-intelligent/poller //---------------------------------------------------------------------------------------------- <!-- - Policies are applied in the order they appear. - Position <base/> inside a section to inherit policies from the outer scope. - Comments within policies are not preserved. --> <!-- Add policies as children to the <inbound>, <outbound>, <backend>, and <on-error> elements --> <policies> <!-- Throttle, authorize, validate, cache, or transform the requests --> <inbound> <base /> <choose> <when condition="@(context.Request.Url.Query.ContainsKey("backendUrl"))"> <set-variable name="decodedUrl" value="@{ var backendUrlEncoded = context.Request.Url.Query.GetValueOrDefault("backendUrl", ""); // Make sure to decode the URL properly, potentially multiple times if needed var decoded = System.Net.WebUtility.UrlDecode(backendUrlEncoded); // Check if it's still encoded and decode again if necessary while (decoded.Contains("%")) { decoded = System.Net.WebUtility.UrlDecode(decoded); } return decoded; }" /> <!-- Log the decoded URL for debugging remove if not needed--> <trace source="Decoded URL">@((string)context.Variables["decodedUrl"])</trace> <send-request mode="new" response-variable-name="backendResponse" timeout="30" ignore-error="false"> <set-url>@((string)context.Variables["decodedUrl"])</set-url> <set-method>GET</set-method> <authentication-managed-identity resource="https://cognitiveservices.azure.com/" /> </send-request> <return-response response-variable-name="backendResponse" /> </when> <otherwise> <return-response> <set-status code="400" reason="Missing backendUrl query parameter" /> <set-body>{"error": "Missing backendUrl query parameter."}</set-body> </return-response> </otherwise> </choose> </inbound> <!-- Control if and how the requests are forwarded to services --> <backend> <base /> </backend> <!-- Customize the responses --> <outbound> <base /> </outbound> <!-- Handle exceptions and customize error responses --> <on-error> <base /> </on-error> </policies> Load Balancing in APIM You can configure multiple backend services in APIM and use built-in load-balancing policies to: Distribute POST requests across multiple Document Intelligence instances Use custom headers or variables to control backend selection Handle failure scenarios with circuit-breakers and retries Reference: Azure API Management backends – Microsoft Learn Sample: Using APIM Circuit Breaker & Load Balancing – Microsoft Community Hub 
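To make the client side of this flow concrete, here is a minimal sketch that calls the prebuilt-read analyze route through APIM using plain REST and then polls whatever operation-location header comes back; with the outbound policy above, that header points back at the APIM gateway. The host name, subscription key, document URL, route prefix, and api-version are placeholders/assumptions, and backend authentication is assumed to be handled by your APIM policies.

```python
# Client-side sketch: call Document Intelligence *through* APIM with plain REST.
# Assumptions: placeholder APIM host/key/document URL; api-version 2024-11-30
# (adjust to the version and route prefix your APIM API actually exposes).
import time
import requests

APIM_HOST = "https://<your-apim-instance>.azure-api.net"   # hypothetical
APIM_KEY = "<your-apim-subscription-key>"                  # hypothetical
ANALYZE_URL = (
    f"{APIM_HOST}/documentintelligence/documentModels/prebuilt-read:analyze"
    "?api-version=2024-11-30"
)

# 1) Submit the analyze request via APIM.
resp = requests.post(
    ANALYZE_URL,
    headers={"Ocp-Apim-Subscription-Key": APIM_KEY},
    json={"urlSource": "https://example.com/sample.pdf"},  # hypothetical document
)
resp.raise_for_status()

# 2) The outbound policy rewrote operation-location to point back at APIM.
poll_url = resp.headers["operation-location"]

# 3) Poll through APIM until the analysis completes; APIM forwards each GET
#    to the backend encoded in the rewritten URL.
while True:
    status = requests.get(poll_url, headers={"Ocp-Apim-Subscription-Key": APIM_KEY})
    status.raise_for_status()
    result = status.json()
    if result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)

print(result["status"])
```

In principle the Document Intelligence SDKs behave the same way, since they simply poll the operation-location they receive, so pointing the SDK endpoint at the APIM host name should route both the submit and the polling calls through the gateway, as described in the SDK Behavior section above.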
Conclusion

By integrating Azure Document Intelligence with Azure API Management's native capabilities, such as load balancing, header rewriting, authentication, and rate-limiting policies, organizations can transform their document processing workflows into scalable, secure, and efficient systems.

From diagrams to dialogue: Introducing new multimodal functionality in Azure AI Search
Discover the new multimodal capabilities in Azure AI Search, enabling integration of text and complex image data for enhanced search experiences. With features like image verbalization, multimodal embeddings, and intuitive portal wizard configuration, developers can build AI applications that deliver comprehensive answers from both text and complex visual content. Learn how multimodal search empowers RAG apps and AI agents with improved data grounding for more accurate responses, while streamlining development pipelines.

From Extraction to Insight: Evolving Azure AI Content Understanding with Reasoning and Enrichment
First introduced in public preview last year, Azure AI Content Understanding enables you to convert unstructured content—documents, audio, video, text, and images—into structured data. The service is designed to support consistent, high-quality output, directed improvements, built-in enrichment, and robust pre-processing to accelerate workflows and reduce cost. A New Chapter in Content Understanding Since our launch we’ve seen customers pushing the boundaries to go beyond simple data extraction with agentic solutions fully automating decisions. This requires more than just extracting fields. For example, a healthcare insurance provider decision to pay a claim requires cross-checking against insurance policies, applicable contracts, patient’s medical history and prescription datapoints. To do this a system needs the ability to interpret information in context, perform more complex enrichments and analysis across various data sources. Beyond field extraction, this requires a custom designed workflow leveraging reasoning. In response to this demand, Content Understanding now introduces Pro mode which enables enhanced reasoning, validation, and information aggregation capabilities. These updates allow the service to aggregate and compare results across sources, enrich extracted data with context, and deliver decisions as output. While Standard mode continues to offer reliable and scalable field extraction, Pro mode extends the service to support more complex content interpretation scenarios—enabling workflows that reflect the way people naturally reason over data. With this update, Content Understanding now solves a much larger component of your data processing workflows, offering new ways to automate, streamline, and enhance decision-making based on unstructured information. Key Benefits of Pro Mode Packed with cutting-edge reasoning capabilities, Pro mode revolutionizes document analysis. Multi-Content Input Process and aggregate information across multiple content files in a single request. Pro mode can build a unified schema from distributed data sources, enabling richer insight across documents. Multi-Step Reasoning Go beyond basic extraction with a process that supports reasoning, linking, validation, and enrichment. Knowledge Base Integration Seamlessly integrate with organizational knowledge bases and domain-specific datasets to enhance field inference. This ensures outputs can reason over the task of generating the output using the context of your business. When to Use Pro Mode Pro mode, currently limited to documents, is designed for scenarios where content understanding needs to go beyond surface-level extraction—ideal for use cases that traditionally require postprocessing, human review and decision-making based on multiple data points and contextual references. Pro mode enables intelligent processing that not only extracts data, but also validates, links, and enriches it. This is especially impactful when extracted information must be cross-referenced with external datasets or internal knowledge sources to ensure accuracy, consistency, and contextual depth. Examples include: Invoice processing that reconciles against purchase orders and contract terms Healthcare claims validation using patient records and prescription history Legal document review where clauses reference related agreements or precedents Manufacturing spec checks against internal design standards and safety guidelines By automating much of the reasoning, you can focus on higher value tasks! 
Pro mode helps reduce manual effort, minimize errors, and accelerate time to insight—unlocking new potential for downstream applications, including those that emulate higher-order decision-making. Simplified Pricing Model Introducing a simplified pricing structure that significantly reduces costs across all content modalities compared to previous versions, making enterprise-scale deployment more affordable and predictable. Expanded Feature Coverage We are also extending capabilities across various content types: Structured Document Outputs: Improved handling of tables spanning multiple pages, recognition of selection marks, and support for additional file types like .docx, .xlsx, .pptx, .msg, .eml, .rtf, .html, .md, and .xml. Classifier API: Automatically categorize/split and route documents to appropriate processing pipelines. Video Analysis: Extract data across an entire video or break a video into chapters automatically. Enrich metadata with face identification and descriptions that include facial images. Face API Preview: Detect, recognize, and enroll faces, enabling richer user-aware applications. Check out the details about each of these capabilities here - What's New for Content Understanding. Let's hear it from our customers Customers all over the globe are using Content Understanding for its powerful one-stop solution capabilities by leveraging advance modes of reasoning, grounding and confidence scores across diverse content types. ASC: AI-based analytics in ASC’s Recording Insights platform allows customers to move to a 100% compliance review coverage of conversations across multiple channels. ASC’s integration of Content Understanding replaces a previously complex setup—where multiple separate AI services had to be manually connected—with a single multimodal solution that delivers transcription, summarization, sentiment analysis, and data extraction in one streamlined interface. This shift not only simplifies implementation and accelerates time-to-value but also received positive customer feedback for its powerful features and the quick, hands-on support from Microsoft product teams. “With the integration of Content Understanding into the ASC Recording Insights platform, ASC was able to reduce R&D effort by 30% and achieve 5 times faster results than before. This helps ASC drive customer satisfaction and stay ahead of competition.” —Tobias Fengler, Chief Engineering Officer, ASC. To learn more about ASCs integration check out From Complexity to Simplicity: The ASC and Azure AI Partnership.” Ramp: Ramp, the all-in-one financial operations platform, is exploring how Azure AI Content Understanding can help transform receipts, bills, and multi-line invoices into structured data automatically. Ramp is leveraging the pre-built invoice template and experimenting with custom extraction capabilities across various document types. These experiments are helping Ramp evaluate how to further reduce manual entry and enhance the real-time logic that powers approvals, policy checks, and reconciliation. “Content Understanding gives us a single API to parse every receipt and statement we see—then lets our own AI reason over that data in real time. It's an efficient path from image to fully reconciled expense.” — Rahul S, Head of AI, Ramp MediaKind: MK.IO’s cloud-native video platform, available on Azure Marketplace—now integrates Azure AI Content Understanding to make it easy for developers to personalize streaming experiences. 
With just a few lines of code, you can turn full game footage into real-time, fan-specific highlight reels using AI-driven metadata like player actions, commentary, and key moments.

“Azure AI Content Understanding gives us a new level of control and flexibility—letting us generate insights instantly, personalize streams automatically, and unlock new ways to engage and monetize. It’s video, reimagined.” —Erik Ramberg, VP, MediaKind

Catch the full story from MediaKind in our breakout session at Build 2025 on May 18: My Game, My Way, where we walk you through the creation of personalized highlight reels in real time. You’ll never look at your TV in the same way again.

Getting Started

For more details about the latest from Content Understanding, check out:
Reasoning on multimodal content for efficient agentic AI app building, Wednesday, May 21 at 2 PM PST
Build your own Content Understanding solution in the Azure AI Foundry. Pro mode will be available in the Foundry starting June 1, 2025
Refer to our documentation and sample code on Content Understanding
Explore the video series on getting started with Content Understanding

Introducing Azure AI Content Understanding for Beginners
Enterprises today face several challenges in processing and extracting insights from multimodal data, like managing diverse data formats, ensuring data quality, and streamlining workflows efficiently. Ensuring the accuracy and usability of extracted insights often requires advanced AI techniques, while inefficiencies in managing large data volumes increase costs and delay results. Azure AI Content Understanding addresses these pain points by offering a unified solution to transform unstructured data into actionable insights, improve data accuracy with schema extraction and confidence scoring, and integrate seamlessly with Azure’s ecosystem to enhance efficiency and reduce costs. Content Understanding makes it easy to extract custom task-specific output without advanced GenAI skills. It enables a quick path to scale for retrieval augmented generation (RAG) grounded by multimodal data or transactional content processing for agent workflows and process automation.

We are excited to announce a new video series to help you get started with Azure AI Content Understanding and extract the task-specific output for your business. Whether you're looking for a well-rounded overview, want to discover how to develop a RAG index over video content, or learn how to build a post-call analytics workflow, this series has something for everyone.

What is Azure AI Content Understanding?

Azure AI Content Understanding is a new Azure AI service, designed to process and transform content of any type, including documents, images, videos, audio, and text, into a user-defined output schema. This streamlined process allows developers to reason over large amounts of unstructured data, accelerating time-to-value by generating an output that can be easily integrated into agentic, automation, and analytical workflows.

Video Series Highlights

1. Azure AI Content Understanding: How to Get Started - Vinod Kurpad, Principal GPM, AI Services, shows how you can process content of any modality—audio, video, documents, and text—in a unified workflow in Azure AI Foundry using Azure AI Content Understanding. It's simple, intuitive, and doesn't require any GenAI skills.

2. Post-call Analytics Using Azure AI Content Understanding - Jan Goergen, Senior Program Manager, AI Services, shows how to process any number of video or audio call recordings quickly in Azure AI Foundry by leveraging the Post‑Call Analytics template powered by Content Understanding. The video also introduces the broader concept of templates, illustrating how you can embed Content Understanding into reusable templates that you can build, deploy, and share across projects.

3. RAG on Video Using Azure AI Content Understanding - Joe Filcik, Principal Product Manager, AI Services, shows how you can process videos and ground them on your data with multimodal retrieval augmented generation (RAG) to derive insights that would otherwise take much longer. Joe demonstrates how this can be achieved using a single Azure AI Content Understanding API in Azure AI Foundry.

Why Azure AI Content Understanding?

The Azure AI Content Understanding service is ideal for enterprises and developers looking to process large amounts of multimodal content, such as call center recordings and videos for training and compliance, without requiring GenAI skills such as prompt engineering and model selection. Enjoy the video series and start exploring the possibilities with Azure AI Content Understanding.
For additional resources:
Watch the Video Series
Try it in Azure AI Foundry
Content Understanding documentation
Content Understanding samples
Feedback? Contact us at cu_contact@microsoft.com

Prototyping Agents with visual tools
Introduction Agents are gaining wide adoption in the emerging generative AI applications for organizations, transforming the way we interact with technology. Agent development using visual tools provides a low code / no code approach in prototyping agentic behavior. They help in creating preliminary versions of agentic applications, enabling development, testing, refining the functionalities before full-scale deployment. Prototyping tools for agents typically have the below features: Visual tools that allow for rapid creation, management and interaction with agentic applications Enable users to define and modify agents and multi-agent workflows through a point-and-click, drag-and-drop interface The interface should make it easier to set parameters for agents within a user-friendly environment and modify flows Chat interface to create chat sessions and view results in a conversational and interactive interface. This will enable interactive agent development and testing Enable adding memory and tools for agents Support for popular OSS agentic frameworks like autogen, langflow, llamaindex, etc Access to built-in add-ins and connectors to build sophisticated workflows Ability to extend the add-ins and build custom connectors Enable tracing for visualization, audit and governance of agents Ability to generate deployment code or provide an API and deploy the resulting workflows By leveraging these tools, developers can quickly prototype and iterate on agent designs, ensuring that the final product is robust, efficient, and capable of delivering a seamless user experience. In this blog, we will look at some OSS options for prototyping and developing agents. AutoGen Studio AutoGen Studio is a low-code interface built to help you rapidly prototype AI agents, enhance them with tools, compose them into teams and interact with them to accomplish tasks. While it is still not meant to be a production-ready app, AutoGen Studio can help users rapidly create, manage, and interact with agents that can learn, adapt, and collaborate. Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task). Create chat sessions with the specified agents and view results (e.g., view chat history, generated files, and time taken). Explicitly add capabilities to your agents and accomplish more tasks. Publish chat sessions to a local gallery. Agent Development Canvas Provides a visual interface for creating agent teams through declarative specification (JSON) or drag-and-drop Supports configuration of all core components: teams, agents, tools, models, and termination conditions Fully compatible with Autogen AgentChat component definitions Component map Edit Components: Code based editor: Playground Provides an interactive environment for testing and running agent teams Live message streaming between agents Visual representation of message flow through a control transition graph Interactive sessions with teams using UserProxyAgent Full run control with the ability to pause or stop execution Tracing and audit Deployment: AutoGen Studio provides options through Docker and python options for depoying the agents. Semantic Workbench Semantic Workbench is another tool to prototype agents. The workbench provides a user-friendly UI for creating conversations with one or more agents, configuring settings, and exposing various behaviours. 
The Semantic Workbench is composed of three main components: Workbench Service (Python): The backend service that handles core functionalities. Workbench App (React/Typescript): The frontend web user interface for interacting with workbench and assistants. Assistant Services (Python, C#, etc.): any number of assistant services that implement the service protocols/APIs, developed using any framework and programming language of your choice. Designed to be agnostic of any agent framework, language, or platform, the Semantic Workbench facilitates experimentation, development, testing, and measurement of agent behaviours and workflows. Assistants integrate with the workbench via a RESTful API, allowing for flexibility and broad applicability in various development environments. Dashboard Provides a view on existing agents added to the workbench. Agent Development Canvas Canvas to add and import new assistants to the workbench. Agent landing page Option for viewing past conversations, add new conversations to test the flow and assistant configurations. Configure Agents Designing instruction prompts, guardrails, etc.,. Conversation Testing Interface to test the assistant flow. Debugging conversations Logging the conversation trace and using the trace information for debugging Ironclad- Rivet Rivet is a visual programming environment for building AI agents with LLMs. Iterate on your prompt graphs in Rivet, then run them directly in your application. With Rivet, teams can effectively design, debug, and collaborate on complex LLM prompt graphs, and deploy them in their own environment. Agent Development Canvas Sample Flow Flow output Plugins Prompt Designer Testing Letta ADE: The Letta ADE is a graphical user interface for creating, deploying, interacting and observing with agents. Letta enables developers to build and deploy stateful AI agents - agents that maintain memory and context across long-running conversations. The Agent Development Environment (ADE) provides a visual interface for building and monitoring agents, with real-time visibility into agent memory and behavior. Letta’s context management system intelligently manages memory. Post version 0.5.0, the UI interface is not available in local and we are dependent on a web based interface hosted in letta servers – though the backend can still be local. Letta enables developers to build and deploy stateful AI agents - agents that maintain memory and context across long-running conversations. Letta Desktop Letta agents live inside a Letta Server, which persists them to a database. You can interact with the Letta agents inside your Letta Server with the ADE (a visual interface) and connect your agents to external application via the REST API and Python & TypeScript SDKs. Letta Desktop bundles together the Letta Server and the Agent Development Environment (ADE) into a single application Adding LLM backends The Letta server can be connected to various LLM API backends Flowise: Flowise is an open source low-code tool for developers to build customized LLM orchestration flows & AI agents. Authoring Canvas offers advanced interface with options to visually add langchain and llamaindex objects for chatflow. Some of the key features include Authoring canvas for chat flows and agents Chains: Manage the flow of interaction with users, providing a framework to design and implement flows tailored to specific tasks or objectives. Language Models: Responsible for language generation and understanding, optimized for various needs. 
Prompts: Keywords or patterns that trigger specific actions or responses based on user inputs.
Output Parsers: Analyze generated data or responses to extract necessary information.
Supports integration with frameworks like Langchain, llamaindex, litellm
Offers enterprise plans for SSO support

Flowise also has an advanced interface to build agent flows.

Tracing
The Flowise open source repository has built-in telemetry that collects anonymous usage information.

Marketplace
Flowise has a large number of templates available that can be useful as starter templates for complex agents.

Langflow:
Langflow is an OSS framework for building multi-agent and RAG applications. It is Python-powered, fully customizable, and LLM and vector store agnostic.

Agent Development Canvas
Langflow provides a canvas that can easily connect different components, such as prompts, language models, and data sources, to help build agentic applications. Each component in a flow is a node that performs a specific task, like an AI model or a data source. Each component has a Configuration menu. A Code pane shows a component's underlying Python code. Components are connected with edges to form flows.

Components
Langflow 1.1 introduced a new agent component, designed to support complex orchestration with built-in model selection, chat memory, and traceable intermediate steps for reasoning and tool-calling actions.

Playground
Langflow provides a dynamic interface designed for real-time interaction with LLMs, allowing users to chat, access memories, and monitor inputs and outputs. Here, users can directly prototype their models, making adjustments and observing different outcomes.

API
Langflow provides an API pane for code templates to call flows from applications.

Starter templates:
Langflow has a library of pre-built templates categorized by use case and methodology.

Langflow Store
Langflow has integration and custom connectors for flows and components that can be downloaded and imported into workflows.

Feature Comparison:

| Feature | AutoGen Studio | Semantic Workbench | Letta |
| --- | --- | --- | --- |
| License | CC-BY-4.0, MIT licenses | MIT license | Apache-2.0 license |
| Canvas for chat / agent flow dev | Canvas available, limited visual / low-code capabilities, pro-code. | Canvas available, limited visual / low-code capabilities. Pro-code. | Limited capabilities. Limited local dev interface post 0.5.0 version. |
| Chat sessions / test flows | Available | Available | Available |
| Templates | | | |
| Tracing | Available | Available | Available |
| Add-in connectors | Limited / no options by default. Can be custom built | Limited / no options by default. Can be custom built | Provides memory tools by default. |
| Deploy agents | Available | Available | Currently on limited preview. |
| Feature | Langflow | Flowise | Rivet |
| --- | --- | --- | --- |
| License | MIT license | Apache-2.0 license | MIT license |
| Canvas for chat / agent flow dev | Canvas with rich UI / UX capabilities | Canvas with rich UI / UX capabilities | Playground available, better UI / UX for agent creation |
| Chat sessions / test flows | Available | Available | Available |
| Templates | | | |
| Tracing | Available | Available | Available |
| Add-in connectors | Wide range of connectors available | Wide range of connectors available | Wide range of built-in connectors |
| Deploy agents | Available | Available | Available |

References
How to develop AI Apps and Agents in Azure - A Visual Guide | All things Azure
AutoGen Studio — AutoGen
Semantic Workbench for Agentic AI Development
microsoft/semanticworkbench: A versatile tool designed to help prototype intelligent assistants, agents and multi-agentic systems
Ironclad/rivet: The open-source visual AI programming environment and TypeScript library
Introduction to Rivet | Rivet
https://github.com/Ironclad/rivet
letta-ai/letta: Letta (formerly MemGPT) is a framework for creating LLM services with memory.
Agent Development Environment (ADE) — Letta
https://docs.flowiseai.com/
https://github.com/FlowiseAI/Flowise
https://volcano-ice-cd6.notion.site/Introduction-to-Practical-Building-LLM-Applications-with-Flowise-LangChain-03d6d75bfd20495d96dfdae964bea5a5#eeeab3f52f4047aaa218317f9892aa26
https://github.com/langflow-ai/langflow