Azure AI Document Intelligence
62 Topics

Announcing the General Availability of Document Intelligence v4.0 API
The Document Intelligence v4.0 API is now generally available! This latest version of Document Intelligence API brings new and updated capabilities across the entire product including updates to Read and Layout APIs for content extraction, prebuilt and custom extraction models for schema extraction from documents and classification models. Document Intelligence has all the tools to enable RAG and document automation solutions for structured and unstructured documents.

Enhanced Layout capabilities

This release brings significant updates to our Layout capabilities, making it the default choice for document ingestion with enhanced support for Retrieval-Augmented Generation (RAG) workflows. The Layout API now offers a markdown output format that provides a better representation of document elements such as headers, footers, sections, section headers and tables when working with Gen AI models. This structured output enables semantic chunking of content, making it easier to ingest documents into RAG workflows and generate more accurate results. Try Layout in the Document Intelligence Studio or use Layout as a skill in your RAG pipelines with Azure Search.

Searchable PDF output

Document Intelligence no longer outputs only JSON! With the 4.0 release, you can now generate a searchable PDF output from an input document. The recognized text is overlaid over the scanned text, making all the content in the documents instantly searchable. This feature enhances the accessibility and usability of your documents, allowing for quick and efficient information retrieval. Try the new searchable PDF output in the Studio or learn more. Searchable PDF is available as an output from the Read API at no additional cost. This release also includes several updates to the OCR model to better handle complex text recognition challenges.

New and updated Prebuilt models

Prebuilt models offer a simple API to extract a defined schema from known document types.
The v4.0 release adds new prebuilt models for mortgage processing, bank document processing, paystub, credit/debit card, check, marriage certificate, and prebuilt models for processing variants of the 1095, W4, and 1099 tax forms for US tax processing scenarios. These models are ideal for extracting specific details from documents like bank statements, checks, paystubs, and various tax forms. With over 22 prebuilt model types, Document Intelligence has models for common documents in procurement, tax, mortgage and financial services. See models overview for a complete list of document types supported with prebuilt models.

Query field add-on capability

Query field is an add-on capability to extend the schema extracted from any prebuilt model. This add-on capability is ideal when you have simple fields that need to be extracted. Query fields also work with Layout, so for simple documents you don't need to train a custom model: you can define the query fields and begin processing the document with no training. Query field supports a maximum of 20 fields per request. Try query field in the Document Intelligence Studio with Layout or any prebuilt model.

Document classification model

The custom classification models are updated to improve the classification process and now support multi-language documents and incremental training. This allows you to update the classifier model with additional samples or classes without needing the entire training dataset. Classifiers also support analyzing Office document types (.docx, .pptx, and .xls). Version 4.0 adds a classifier copy operation for copying your classifier across resources, regions or subscriptions, making model management easier. This version also changes the splitting behavior: by default, the custom classification model no longer splits documents during analysis. Learn more about the classification and splitting capabilities.
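The request options described above (markdown output from Layout, searchable PDF from Read, and query fields on Layout or a prebuilt model) can be sketched as a small URL builder. This is an illustrative sketch, not the official SDK: the helper name `build_analyze_url` is hypothetical, and the API version string `2024-11-30` and the query parameter names (`outputContentFormat`, `output`, `features`, `queryFields`) are assumptions based on the v4.0 REST API that should be checked against the REST reference before use. The 20-field limit comes from the text above.

```python
from urllib.parse import urlencode

# Assumption: "2024-11-30" is the api-version string for the v4.0 GA release.
API_VERSION = "2024-11-30"
MAX_QUERY_FIELDS = 20  # limit stated in the announcement


def build_analyze_url(endpoint, model_id, *, markdown=False,
                      searchable_pdf=False, query_fields=None):
    """Build an analyze URL for a Document Intelligence model (hypothetical helper).

    No network call is made; the function only assembles the query string.
    """
    params = [("api-version", API_VERSION)]
    if markdown:
        # Assumed parameter for requesting markdown content (Layout).
        params.append(("outputContentFormat", "markdown"))
    if searchable_pdf:
        # Assumed parameter for requesting searchable PDF output (Read).
        params.append(("output", "pdf"))
    if query_fields:
        if len(query_fields) > MAX_QUERY_FIELDS:
            raise ValueError(
                f"at most {MAX_QUERY_FIELDS} query fields per request")
        # Assumed add-on feature flag plus a comma-separated field list.
        params.append(("features", "queryFields"))
        params.append(("queryFields", ",".join(query_fields)))
    base = endpoint.rstrip("/")
    query = urlencode(params, safe=",")  # keep commas readable
    return f"{base}/documentintelligence/documentModels/{model_id}:analyze?{query}"


if __name__ == "__main__":
    # Placeholder endpoint; replace with your own resource endpoint.
    endpoint = "https://example.cognitiveservices.azure.com"
    print(build_analyze_url(endpoint, "prebuilt-layout", markdown=True))
    print(build_analyze_url(endpoint, "prebuilt-read", searchable_pdf=True))
    print(build_analyze_url(endpoint, "prebuilt-invoice",
                            query_fields=["Terms", "PONumber"]))
```

The field names `Terms` and `PONumber` are placeholders. Keeping the builder free of network calls makes it easy to unit test the request shape separately from authentication and polling.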
Improvements to Custom Extraction models

Custom extraction models now output confidence scores for tables, table rows, and cells. This makes the process of validating model results much easier and provides the tools to trigger human reviews. Custom model capabilities have also improved with the addition of signature detection to neural models and support for overlapping fields. Neural models now include a paid training tier for when you have a large dataset of labeled documents to train. Paid training enables longer training to ensure you have a model that performs better on the different variations in your training dataset. Learn more about improvements to custom extraction models.

New implementation of model compose for greater flexibility

With custom extraction models in the past, you could compose multiple models into a single composed model. When a document was analyzed with a composed model, the service picked the model best suited to process the document. With this version, model compose introduces a new implementation requiring a classification model in addition to the extraction models. This enables processing multiple instances of the same document with splitting, conditional routing and more. Learn more about the new model compose implementation.

Get started with the v4.0 API today

The Document Intelligence v4.0 API is packed with many more updates. Start with the what's new page to learn more. You can try all of the new and updated capabilities in the Document Intelligence Studio. Explore the new REST API or the language-specific SDKs to start building or updating your document workflows.

1.5K views, 0 likes, 0 comments

Analyze complex documents with Azure Document Intelligence Markdown Output and Azure OpenAI
In today's digital era, where data is the new gold, efficiently extracting and processing information from complex documents, including those with dynamic tables, is crucial for businesses. Microsoft's Azure AI services offer robust solutions for tackling these challenges, especially through the Document Intelligence Layout model. In this blog post, we will explore how you can use markdown output to enhance the capabilities of Azure Document Intelligence Layout model and subsequently feed this refined data into Azure AI for comprehensive information extraction.

18K views, 8 likes, 1 comment

Document Field Extraction with Generative AI
Adoption of Generative AI technologies is accelerating, driven by the transformative potential they offer across various industry sectors. Azure AI enables organizations to create interactive and responsive AI solutions customized to their requirements, playing a significant part in helping businesses harness Generative AI effectively. With the new custom field extraction preview, you can leverage generative AI to efficiently extract fields from documents, ensuring standardized output and a repeatable process to support document automation workflows.

6.3K views, 5 likes, 2 comments

Extracting Handwritten Corrections with Azure AI Foundry's Latest Tools
In document processing, dealing with documents that contain a mix of handwritten and typed text presents a unique challenge. Often, these documents also feature handwritten corrections where certain sections are crossed out and replaced with corrected text. Ensuring that the final extracted content accurately reflects these corrections is crucial for maintaining data accuracy and usability. In our recent endeavors, we explored various tools to tackle this issue, with a particular focus on Document Intelligence Studio and Azure AI Foundry's new Field Extraction Preview feature.

The Challenge

Documents with mixed content types (handwritten and typed) can be particularly troublesome for traditional OCR (Optical Character Recognition) systems. These systems often struggle with recognizing handwritten text accurately, especially when it coexists with typed text. Additionally, when handwritten corrections are involved, distinguishing between crossed-out text and the corrected text adds another layer of complexity, because the model cannot tell which value(s) to pick.

Our Approach

Initial Experiments with Pre-built Models

To address this challenge, we initially turned to Document Intelligence Studio's pre-built invoice model, which provided a solid starting point. However, it would often extract both the crossed-out value and the new handwritten value under the same field. In addition, it did not always match the correct key to the field value.

Custom Neural Model Training

Next, we attempted to train a custom neural model in the Document Intelligence Studio, which leverages deep learning for predicting key document elements, allowing for further adjustments and refinements. It is recommended to use at least 100 to 1000 sample files to achieve more accurate and consistent results. When training models, it is crucial to use text-based PDFs (PDFs with selectable text) as they provide better data for training.
The model's accuracy improves with more varied training data, including different types of handwritten edits. Without enough training data or variance, the model may overgeneralize. Therefore, we uploaded approximately 100 text-based PDFs (PDFs with selectable text) to Azure AI Foundry and manually corrected the column containing handwritten text. After training on a subset of these files, we built and tested our custom neural model on the training data. The model performed impressively, achieving a 92% confidence score in identifying the correct values. The main drawbacks were the manual effort required for data labeling and the 30 minutes needed to build the model. During our experiments, we noticed that when extracting fields from a table, labeling and extracting every column comprehensively rather than just a few columns resulted in higher accuracy. The model was better at predicting when it had a complete view of the table.

Breakthrough with Document Field Extraction (Preview)

Finally, the breakthrough came when we leveraged the new Document Field Extraction Preview feature from Azure AI Foundry. This feature demonstrated significant improvements in handling mixed content and provided a more seamless experience in extracting the necessary information.

Field Description Modification: One of the key steps in our process was modifying the field descriptions within the Field Extraction Preview feature. By providing detailed descriptions of the fields we wanted to extract, we helped the AI understand the context and nuances of our documents better. Specifically, we wanted to make sure that the value extracted for FOB_COST was the handwritten correction, so we wrote in the Field Description: "Ignore strikethrough or 'x'-ed out text at all costs, for example: do not extract red / black pen or marks through text. Do not use stray marks. This field only has numbers."
Correction Handling: During the extraction process, the AI was able to distinguish between crossed-out text and the handwritten corrections. Whenever a correction was detected, the AI prioritized the corrected text over the crossed-out content, ensuring that the final extracted data was accurate and up-to-date.

Performance Evaluation: After configuring the settings and field descriptions, we ran several tests to evaluate the performance of the extraction process. The results were impressive, with the AI accurately extracting the corrected text and ignoring the crossed-out sections. This significantly reduced the need for manual post-processing and corrections.

Results

The new Field Extraction Preview feature in Azure AI Foundry exceeded our expectations. The modifications we made to the field descriptions, coupled with the AI's advanced capabilities, resulted in a highly efficient and accurate document extraction process. The AI's ability to handle mixed-content documents and prioritize handwritten corrections over crossed-out text has been a game-changer for our workflow.

Conclusion

For anyone dealing with documents that contain a mix of handwritten and typed text, and where handwritten corrections are present, we highly recommend exploring Azure AI Foundry's Field Extraction Preview feature. The improvements in accuracy and efficiency can save significant time and effort, ensuring that your extracted data is both reliable and usable. As we continue to refine our processes, we look forward to even more advancements in document intelligence technologies.

456 views, 1 like, 0 comments

Phi-3 Vision – Catalyzing Multimodal Innovation
Microsoft's Phi-3 Vision is a new AI model that combines text and image data to deliver smart and efficient solutions. With just 4.2 billion parameters, it offers high performance and can run on devices with limited computing power. From describing images to analyzing documents, Phi-3 Vision is designed to make advanced AI accessible and practical for everyday use. Explore how this model is set to change the way we interact with AI, offering powerful capabilities in a small and efficient package.

31K views, 5 likes, 2 comments

Prep your Data for RAG with Azure AI Search: Content Layout, Markdown Parsing & Improved Security
Introduction

We're excited to announce new preview features in Azure AI Search, specifically designed to enhance data preparation, enrichment, and indexing processes for Retrieval-Augmented Generation (RAG) applications. These updates include the document layout skill (a high-level parser powered by Azure AI Document Intelligence that adapts to scenarios requiring rich content extraction and indexing) and enhanced security with managed identities and private endpoints. Together, these features, along with markdown as a parsing mode, provide organizations with fine-grained control over data enrichment, security, and indexing, enabling smarter, more efficient, and more secure RAG solutions. These capabilities are now available through the REST API version 2024-11-01-preview and can also be accessed via the newest beta SDKs in .NET, Python, Java, and JavaScript.

These new features are enabled by Azure AI Search's built-in indexers, which allow users to automate data ingestion and transformation from various data sources. Built-in indexers can apply AI enrichment via skillsets (configurable pipelines that leverage AI skills like OCR, entity recognition, and the new document layout skill) to enrich and enhance data before indexing. These skillsets help customers extract meaningful information from their data, making it easier to search and retrieve relevant content. The new markdown parsing mode and document layout skill work within these built-in indexers to support advanced data preparation and indexing workflows.

With these functionalities, Azure AI Search now supports both fixed-size and structure-aware chunking natively during data enrichment. Fixed-size chunking with overlap, where documents are divided into equal-sized sections with slight overlaps, is effective for simple, uniform documents like articles. For more complex content, structure-aware chunking divides documents into sections aligned with their natural structure, improving retrieval accuracy and relevance.
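To make the fixed-size option concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration of the general technique, not the implementation used by Azure AI Search: the function name `fixed_size_chunks` is hypothetical, and it counts characters for simplicity, whereas production pipelines often count tokens or pages.

```python
def fixed_size_chunks(text, size, overlap):
    """Split text into chunks of `size` characters.

    Each chunk repeats the last `overlap` characters of the previous one.
    Illustrative only: boundaries ignore words, sentences, and headers.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Azure AI Search supports fixed-size and structure-aware chunking."
    for i, chunk in enumerate(fixed_size_chunks(sample, size=30, overlap=5), 1):
        print(i, repr(chunk))
```

Because the window advances by a fixed step regardless of content, a heading and the list that follows it can land in different chunks, which is the limitation that structure-aware chunking addresses.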
Why Structure-Aware Chunking Matters for RAG

Azure AI Search's new native structure-aware chunking capabilities adapt to each document's unique structure, making RAG solutions more effective. Here's why structure-aware chunking is key for high-quality retrieval:

1. Enhanced Contextual Understanding
By breaking documents into context-rich sections based on logical markers (headers, subsections), RAG applications can retrieve content within its intended context. This allows Large Language Models (LLMs) to answer queries with more clarity, as each chunk retains its full meaning.

2. Enhanced Retrieval Focus
Chunking documents based on their structure may assist in refining search results to more relevant sections, which could be particularly beneficial for complex texts like technical documentation or industry-specific guidelines. This approach helps direct users to specific segments of information, potentially minimizing the noise from unrelated content and enhancing the overall user experience.

3. Optimized LLM Performance
LLMs perform best with targeted data chunks. Structure-aware chunking helps ensure that only relevant sections are processed, enhancing response accuracy.

New Data Ingestion and Enrichment Preview Features in Azure AI Search

Note: The new features mentioned in this article can be enabled through the "Import and vectorize data" wizard in the Azure portal.

Document Layout Skill: Parsing for Complex Documents

The document layout skill, part of Azure AI Search's AI enrichment capabilities, is configured through skillsets within built-in indexers. It functions as a high-level parser, designed to adapt to scenarios requiring in-depth content extraction and indexing of richly structured documents. Powered by the Azure AI Document Intelligence layout model, the document layout skill enables advanced parsing for nuanced document layouts, making it ideal for complex data preparation.
Structure-Aware Chunking: As part of the AI enrichment process, the document layout skill organizes content into coherent markdown sections based on document structure, such as headers and subsections. This structured chunking enhances RAG applications by allowing each section to retain its contextual meaning.

Advanced Parsing for Key Document Elements: Beyond chunking, the document layout skill can extract and index structured elements like tables and lists, ensuring that critical content is available for precise retrieval. This is especially valuable for documents where specific data points must be indexed separately.

Hierarchical Relationship Mapping: The skill maintains relationships between content sections, preserving the document's logical structure. For instance, technical manuals or regulatory documents can be indexed in a way that retains content hierarchies.

Adaptive Scenarios: As a high-level parser, the document layout skill is versatile, suitable for documents where rich content parsing is essential.

Benefits for RAG: By transforming complex documents into structured, layout-based chunks with clear hierarchical relationships, the document layout skill enables RAG applications to respond with greater accuracy and context. This feature is ideal when content structure is paramount.

Fixed-Chunking Limitations and AI Document Intelligence Solution

Why and When Fixed-Size Chunking Falls Short: Fixed-size chunking can struggle with complex document layouts, as it divides content uniformly without respecting document structure. This may lead to disjointed sections and a loss of meaningful context in RAG applications.

Improved Results with AI Document Intelligence Layout Model: The example below showcases a small sample document that has been divided into passages without considering the original document structure.
This approach highlights the potential limitations of fixed-size chunking when extracting relevant data from documents with rich and complex layouts. To make the concept visually clear and easy to grasp, this example uses approximate rather than exact fixed-size chunks and overlap, as the focus here is on illustrating the challenges rather than achieving technical precision. This simplified demonstration is designed to help readers quickly understand the impact of fixed-size chunking in contrast to structure-aware approaches.

Imagine this small text is divided into three distinct passages or chunks, with some overlapping content. The blue rectangle represents the first chunk (chunk #1), the green rectangle the second chunk (chunk #2), and the red rectangle the third chunk (chunk #3). This visualization helps illustrate how fixed-size chunking with overlap works.

Figure: Introduction to Azure AI Search - Azure AI Search | Microsoft Learn with 3 chunks with overlap showing how fixed-size chunking works.

If you were to ask a question about the steps for an end-to-end exploration of core search features in the Azure portal, the answer would span two separate chunks: blue (chunk #1) and green (chunk #2). Since neither chunk contains the full context to answer the question with certainty, this illustrates a key limitation of fixed-size chunking in such scenarios.

Now, let's examine two structure-aware chunks created from the same content. These chunks not only preserve the headers but also replicate Header 1 (H1) in both chunks, as it is relevant to both sections that follow. Additionally, the content under Header 2 (H2) is kept intact, including the points necessary to fully answer the question posed earlier. With a single structure-aware chunk (chunk #1 below), the question about the four steps to explore core search features in the portal can now be answered with complete and accurate context.
Figure: Introduction to Azure AI Search - Azure AI Search | Microsoft Learn with a chunk that would preserve the structure of the document in the same chunk.

Prerequisites for Starting this Integration:

Azure AI Search Service: Ensure you have an active Azure AI Search service set up for the integration.

AI Services Multi-Service Resource: Required for billing associated with the AI Document Intelligence Layout skill, this multi-service AI account covers costs specifically for document layout analysis services. Note that these charges are separate from Azure AI Search billing. Please review Azure AI Document Intelligence layout model pricing carefully to understand potential costs before using this integration. If you do not specify a multi-service resource when configuring the document layout skill, your search service will default to using the free AI enrichments available for your indexer on a daily basis. However, this limits execution to 20 transactions per indexer invocation, after which the process will halt, and a 'Time Out' message will appear in the indexer's execution history. To process additional documents and ensure uninterrupted functionality, you will need to attach an AI Services multi-service resource to the skillset. Your Azure AI Search service and your AI Services multi-service resource must be in any of the following regions: East US, West US 2, West Europe or North Central US.

Azure OpenAI Embedding Model Deployment: Needed if you're using integrated vectorization, which is highly recommended for RAG applications. Integrated vectorization allows for automatic vector creation from extracted content and at query time, optimizing retrieval quality and relevance in RAG implementations.

JSON Configuration Example: Here's a sample JSON configuration for setting up the document layout skill in a skillset.
{
  "skills": [
    {
      "description": "Analyze a document",
      "@odata.type": "#Microsoft.Skills.Util.DocumentLayoutAnalysisSkill",
      "context": "/document",
      "outputMode": "oneToMany",
      "markdownHeaderDepth": "h3",
      "inputs": [
        { "name": "file_data", "source": "/document/file_data" }
      ],
      "outputs": [
        { "name": "markdown_document", "targetName": "markdown_document" }
      ]
    }
  ]
}

Keep in mind that even after parsing a document using the AI Document Intelligence layout model with markdown output, very long sections may still require additional fixed-size chunking. This is necessary because such sections might exceed the RAG-optimal input size for Large Language Models (like GPT-4o), which can impact relevance in many scenarios. For detailed guidance on determining the optimal chunk size based on multiple use cases, refer to the article Azure AI Search: Outperforming vector search with hybrid retrieval and ranking capabilities | Microsoft Community Hub. If you're using the Import and vectorize data wizard in the Azure portal, this secondary fixed-size chunking is handled automatically by default.

Markdown Parsing Mode for Structured RAG Retrieval

Azure AI Search indexers offer multiple parsing modes: text, JSON, CSV and now, markdown parsing. Markdown parsing, available as a new parsing mode in Azure AI Search's built-in indexers, provides a structured way to process and index markdown files. By organizing content based on headers, markdown parsing enables more direct retrieval, making it ideal for content where each section can be accessed independently.

How It Works: Markdown parsing splits content by headers, creating searchable sections that make each content block accessible to RAG applications.

Value for RAG: Markdown parsing is useful for structured documents like FAQs or instructional content, where each question or topic can be indexed as an individual section for quick retrieval.
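The idea of splitting markdown by headers can be sketched in a few lines of Python. This is a simplified illustration of the concept, not the indexer's actual algorithm: the function name `split_by_headers` and the output shape are hypothetical, and the `max_depth` argument only loosely mirrors the role of a header-depth setting such as `markdownHeaderDepth`. Each section carries the chain of parent headers, so a top-level header is repeated in every chunk beneath it.

```python
import re

# Matches ATX-style markdown headers such as "# Title" or "### Subsection".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown, max_depth=6):
    """Split markdown into sections keyed by their header path.

    Headers deeper than `max_depth` are treated as ordinary content.
    Simplification: '#' lines inside fenced code blocks are not handled.
    """
    sections = []
    path = []   # stack of (level, title) for the current header chain
    body = []   # content lines collected since the last header

    def flush():
        text = "\n".join(body).strip()
        if text:  # skip headers that have no content of their own
            sections.append({"headers": [title for _, title in path],
                             "content": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match and len(match.group(1)) <= max_depth:
            flush()
            level = len(match.group(1))
            # Drop headers at the same or deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "# Guide\nIntro\n## Setup\nStep 1\nStep 2\n## Usage\nRun it"
    for section in split_by_headers(doc):
        print(section["headers"], "->", repr(section["content"]))
```

Lowering `max_depth` keeps deeper subsections together with their parent section, which trades finer-grained retrieval for more context per chunk.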
Configuration JSON Example: Here's a sample JSON configuration for enabling markdown parsing in Azure AI Search's built-in indexers. For a detailed explanation of how to configure this in the end-to-end pipeline, review the markdown parsing documentation.

POST https://[service name].search.windows.net/indexers?api-version=2024-11-01-preview
Content-Type: application/json
api-key: [admin key]

{
  "name": "my-markdown-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-target-index",
  "parameters": {
    "configuration": {
      "parsingMode": "markdown",
      "markdownParsingSubmode": "oneToMany",
      "markdownHeaderDepth": "h6"
    }
  }
}

This capability is available in a few steps in the Azure portal through the Import and vectorize data wizard.

Enhanced Security with Managed Identity and Private Endpoints for AI Services Integration

This preview also introduces new security features for the existing integration with AI Services for native AI skills that bring greater flexibility and control to Azure AI Search's indexing pipeline:

Managed Identity for Keyless Connections

Newly supported managed identities allow Azure AI Search to connect securely to AI Services without relying on API keys. This approach simplifies security and supports cross-region connections for billing purposes, enabling seamless integration.

Private Endpoints for Multi-Service Accounts

Azure AI Search implements private endpoints through shared private links. Shared private links, now supporting AI Services, allow Azure AI Search to securely connect built-in indexers to an AI Services multi-service resource, ensuring that all billing-related calls remain private. For AI Services-dependent skills, Azure AI Search processes data through its own Azure AI Services resources, keeping data connections internal via the Azure backbone. However, some enterprises with strict security policies require that even billing calls are routed through a private endpoint, and this capability is now supported.
Impact on Secure RAG Applications

These security enhancements provide organizations with more robust data privacy controls, essential for building secure, scalable RAG deployments in regulated industries.

Prerequisites for Starting this Integration

1. Configure Azure AI Search to use a managed identity.
2. On your AI Services Multi-Service resource, assign the identity to the Cognitive Services User role.
3. Using the Azure portal, or the Skillset 2024-11-01-preview REST API, or an Azure SDK beta package that provides the syntax, configure a skillset to use an identity:
- The managed identity used on the connection belongs to the search service.
- The identity can be system-managed or user-assigned.
- The identity must have Cognitive Services User permissions on the Azure AI resource.
- @odata.type is always #Microsoft.Azure.Search.AIServicesByIdentity.
- subdomainUrl is the endpoint of your Azure AI multi-service resource. It can be in any region that's jointly supported by Azure AI Search and Azure AI services.

As with keys, the details you provide about the Azure AI Services resource are used for billing, not connections. All API requests made by Azure AI Search to Azure AI services for built-in skills processing continue to be internal and managed by Microsoft.

Configuration JSON Example: System-managed identity

Below is an example JSON for configuring a system-managed identity with a skillset. Identity is set to null.

POST https://[service-name].search.windows.net/skillsets/[skillset-name]?api-version=2024-11-01-Preview

{
  "name": "my skillset name",
  "skills": [
    // skills definition goes here
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "description": "",
    "subdomainUrl": "https://[subdomain-name].cognitiveservices.azure.com",
    "identity": null
  }
}

User-assigned managed identity

Below is an example JSON for configuring a user-assigned managed identity with a skillset. Identity is set to the resource ID of the user-assigned managed identity.
To find an existing user-assigned managed identity, see Manage user-assigned managed identities. For a user-assigned managed identity, set the @odata.type and the userAssignedIdentity properties.

POST https://[service-name].search.windows.net/skillsets/[skillset-name]?api-version=2024-11-01-Preview

{
  "name": "my skillset name",
  "skills": [
    // skills definition goes here
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
    "description": "",
    "subdomainUrl": "https://[subdomain-name].cognitiveservices.azure.com",
    "identity": {
      "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
      "userAssignedIdentity": "/subscriptions/{subscription-ID}/resourceGroups/{resource-group-name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{user-assigned-managed-identity-name}"
    }
  }
}

Exploring Additional Preview Features in Azure AI Search

Azure AI Search's latest updates extend its support for RAG applications even further. Read our recently released blogs, which also discuss additional features:
- Query rewriting
- RAG for GitHub models, powered by Azure AI Search.
- What's New in Azure AI Search?

Additional Resources
- What is RAG?
- Skill context and input annotation reference language
- Azure AI Search pricing
- AI Document Intelligence layout model pricing
- AI enrichment concepts
- Vector GitHub repo

3.4K views, 0 likes, 0 comments

Transforming Video into Value with Azure AI Content Understanding
Unlocking Value from Unstructured Video

Every minute, social video sharing platforms see over 500 hours of video uploads [1] and 91% of businesses leverage video as a key tool [2]. From media conglomerates managing extensive archives to enterprises producing training and marketing materials, organizations are overwhelmed with video. Yet, despite this abundance, video remains inherently unstructured and difficult to utilize effectively. While the volume of video content continues to grow exponentially, its true value often remains untapped due to the friction involved in making video useful.

Organizations grapple with several pain points:

Inaccessibility of Valuable Content Archives: Massive video archives sit idle because finding the right content to reuse requires extensive manual effort.

The Impossibility of Personalization Without Metadata: Personalization holds the key to unlocking new revenue streams and increasing engagement. However, without reliable and detailed metadata, it's cost-prohibitive to tailor content to specific audiences or individuals.

Missed Monetization Opportunities: For media companies, untapped archives mean missed chances to monetize content through new formats or platforms.

Operational Bottlenecks: Enterprises struggle with slow turnaround times for training materials, compliance checks, and marketing campaigns due to inefficient video workflows, leading to delays and increased expenses.

Many video processing applications rely on purpose-built, frame-by-frame analysis to identify objects and key elements within video content. While this method can detect a specific list of objects, it is inherently lossy, struggling to capture actions, events, or uncommon objects. It is also expensive and time-consuming to customize for specific tasks. Generative AI promises to revolutionize video content analysis, with GPT-4o topping leaderboards for video understanding tasks, but finding a generative model that processes video is just the first step.
Creating video pipelines with generative models is hard. Developers must invest significant effort in infrastructure to create custom video processing pipelines to get good results. These systems need optimized prompts, integrated transcription, smart handling of context-window limitations, shot-aligned segmentation, and much more. This makes them expensive to optimize and hard to maintain over time. Introducing Azure AI Content Understanding for video This is where Azure AI Content Understanding transforms the game. By offering an integrated video pipeline that leverages advanced foundational models, you can effortlessly extract insights from both the audio and visual elements of your videos. This service transforms unstructured video into structured, searchable knowledge, enabling powerful use cases like media asset management and highlight reel generation. With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full context. For example, for corporate events and conferences you can quickly produce same-day highlight reels. This capability not only reduces the time and cost associated with manual editing but also empowers organizations to deliver timely, professional reaction videos that keep audiences engaged and informed. In another case, a news broadcaster can create a new personalized viewing experience for news by recommending stories of interest. This is achieved by automatically tagging segments with relevant metadata like topic and location, enabling the delivery of content personalized to individual interests, driving higher engagement and viewer satisfaction. By generating specific metadata on a segment-by-segment basis, including chapters, scenes, and shots, Content Understanding provides a detailed outline of what's contained in the video, facilitating these workflows. 
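As a rough illustration of why segment-level metadata helps with the news personalization case, the sketch below filters tagged segments against a viewer's interests. The Segment class, its field names, and the recommend helper are hypothetical, invented for this example; they are not the output schema or API of Content Understanding.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One video segment with illustrative (hypothetical) metadata fields."""
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    topic: str       # e.g. a tag produced by field extraction
    location: str    # e.g. where the story takes place


def recommend(segments, interests):
    """Return segments whose topic or location matches any interest (case-insensitive)."""
    wanted = {interest.lower() for interest in interests}
    return [
        seg for seg in segments
        if seg.topic.lower() in wanted or seg.location.lower() in wanted
    ]


# Made-up sample data for the sketch.
segments = [
    Segment(0.0, 42.5, "Politics", "Washington"),
    Segment(42.5, 90.0, "Sports", "Madrid"),
    Segment(90.0, 131.0, "Weather", "Seattle"),
]

picks = recommend(segments, ["sports", "seattle"])
for seg in picks:
    print(f"{seg.start_s:>6.1f}s - {seg.end_s:>6.1f}s  {seg.topic} ({seg.location})")
```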
This is enabled by a streamlined pipeline for video that starts with content extraction tasks like transcription, shot detection, key frame extraction, and face grouping to create grounding data for analysis. Then, generative models use that information to extract the specific fields you request for each segment of the video. This generative field extraction capability enables customers to: Customize Metadata: Tailor the extracted information to focus on elements important to your use case, such as key events, actions, or dialogues. Create Detailed Outlines: Understand the structure of your video content at a granular level. Automate Repetitive Editing Tasks: Quickly pinpoint important segments to create summaries, trailers, or compilations that capture the essence of the full video. By leveraging these capabilities, organizations can automate many video creation tasks including creating highlight reels and repurposing content across formats, saving time and resources while delivering compelling content to their audiences. Whether it's summarizing conference keynotes, capturing the essence of corporate events, or showcasing the most exciting moments in sports, Azure AI Content Understanding makes video workflows efficient and scalable. But how do these solutions perform in real-world scenarios? Customer Success Stories IPV Curator: Transforming Media Asset Management IPV Curator, a leader in media asset management solutions, assists clients in managing and monetizing extensive video libraries across various industries, including broadcast, sports, and global enterprises. It enables seamless, zero-download editing of video in Azure cloud using Adobe applications. Their customers needed an efficient way to search, repurpose, and produce vast amounts of video content with data extraction tailored to specific use cases. IPV integrated Azure AI Content Understanding into their Curator media asset management platform. 
They found that it provided a step-function improvement in metadata extraction for their clients. It was particularly beneficial as it enabled: Industry Specific Metadata: Allowed clients to extract metadata tailored to their specific needs by using simple prompts and without the need for domain-specific training of new AI models. For example: Broadcast: Rapidly identified key scenes for promo production and to efficiently identify their highest value content for Free ad-supported streaming TV (FAST) channels. Travel Marketing Content: Automatically tagged geographic locations, landmarks, shot types (e.g., aerial, close-up), and highlighted scenic details. Shopping Channel Content: Detected specific products, identified demo segments, product categories, and key selling points. Advanced Action and Event Analysis: Enabled detailed analysis of a set of frames in a video segment to identify actions and events. This provides a new level of insights compared to frame-by-frame analysis of objects. Segmentation Aligned to Shots: Detected shot boundaries in produced videos and in-media edit points, enabling easy reuse by capturing full shots in segments. As a result, IPV's clients can quickly find and repurpose content, significantly reducing editing time and accelerating video production at scale. IPV Curator enables search across industry specific metadata extracted from videos "IPV's collaboration with Microsoft transforms media stored in Azure into an easily accessible, streaming, and highly searchable active archive. The powerful search engine within IPV's new generation of Media Asset Management uses Azure AI Content Understanding to accurately surface any archived video clip, driving users to their highest value content in seconds." —Daniel Mathew, Chief Revenue Officer, IPV Cognizant: Innovative Ad Moderation Cognizant, a global leader in consulting and professional services, has identified a challenge of moderating advertising content for its media customers. 
Their customers' traditional methods rely heavily on manual review and struggle to scale with the increasing volume of content requiring assessment. The Cognizant Ad Moderation solution framework leverages Content Understanding to create a more accurate, cost-effective approach to ad moderation that results in a 96% reduction in review time. It allows customers to automate ad reviews to ensure cultural sensitivity, regulatory compliance, and optimized programming placement, ultimately reducing manual review efforts. Cognizant achieves these results by leveraging Content Understanding for multimodal field extraction, tailored output, and native generative AI video processing. Multimodal Field Extraction: Extracts key attributes from both the audio and visual elements, allowing for a more comprehensive analysis of the content. This analysis is critical to get a holistic view of suitability for various audiences. Tailored Output Schema: Outputs a custom structured schema that detects content directly relevant to the moderation task. This includes detecting specific risky attributes like prohibited language, potentially banned topics, violations of content restrictions, and sensitive products like alcohol or smoking. Native Generative AI Video Processing: Content Understanding natively processes video files with generative AI to provide the detailed insights requested in the schema, capturing context, actions, and events over entire segments of the video. This optimized video pipeline provides Cognizant with a detailed analysis of videos to ground an automated decision. It allows them to quickly green light compliant ads and flag others for rejection or human review. Content Understanding empowers Cognizant to focus on solving business challenges rather than managing the underlying infrastructure for video processing and integrating generative models. “I'm absolutely thrilled about the Azure AI Content Understanding service! 
It's a game-changer that accelerates processing by integrating multiple AI capabilities into a single service call, delivering combined audio and video transcription in one JSON output with incredibly detailed results. The ability to add custom fields that integrate with an LLM provides even more detailed, meaningful, and flexible output.” - Rushil Patel – Developer @ Cognizant The Broader Impact: Transformation across industries The transformative power of Azure AI Content Understanding extends far beyond these specific use cases, offering significant benefits across various industries and workflows. By leveraging advanced AI capabilities on video, organizations have been able to unlock new opportunities and drive innovation in several key areas: Social Media Listening and Consumer Insights: Analyze video content across social platforms to understand how products are perceived and discussed online. Gain valuable consumer insights to inform product development, marketing strategies, and brand management. Unlocking Video for AI Assistants and Agents: Enable AI assistants and agents to access and utilize information from video content, transforming meeting recordings, training videos, and events into valuable data sources for Retrieval-Augmented Generation (RAG). Enhance customer support and knowledge management by integrating video insights into AI-driven interactions. Enhancing Accessibility with Audio Descriptions: Generate draft audio descriptions for video content to provide a starting point for human editors. This streamlines the creation of accessible content for visually impaired audiences, reducing effort and accelerating compliance with accessibility standards. Marketing and Advertising Workflows: Automate content analysis to ensure brand alignment and effective advertising. Understand and optimize the content within video advertisements to maintain consistent branding and enhance audience engagement. 
The business value of Azure AI Content Understanding is clear. By addressing core challenges in video content management with generative AI, customization, and native video processing, it enhances operational efficiencies and unlocks new opportunities for monetization and innovation. Organizations can now turn dormant video archives into valuable assets, deliver personalized content to engage audiences effectively, and automate manual time-consuming workflows. Ready to Transform Your Video Content? For more details on how to use Content Understanding for video, check out the Video Solution Overview. If you are at Microsoft Ignite 2024 or are watching online, check out this breakout session. Try this new service in Azure AI Foundry. For documentation, please refer to the Content Understanding Overview. For a broader perspective, see Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights and discover how it extends these capabilities across all content formats.
-----
[1] According to Statista in 2022 - Hours of video uploaded every minute 2022 | Statista
[2] According to a Wyzowl survey in 2024 - Video Marketing 2024 (10 Years of Data) | Wyzowl
Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights
Solve Common GenAI Challenges with Content Understanding As enterprises leverage foundation models to extract insights from multimodal data and develop agentic workflows for automation, it's common to encounter issues like inconsistent output quality, ineffective pre-processing, and difficulties in scaling out the solution. Organizations often find that to handle multiple types of data, the effort is fragmented by modality, increasing the complexity of getting started. Azure AI Content Understanding is designed to eliminate these barriers, accelerating success in Generative AI workflows. Handling Diverse Data Formats: By providing a unified service for ingesting and transforming data of different modalities, businesses can extract insights from documents, images, videos, and audio seamlessly and simultaneously, streamlining workflows for enterprises. Improving Output Data Accuracy: Deriving high-quality output for their use-cases requires practitioners to ensure the underlying AI is customized to their needs. Using advanced AI techniques like intent clarification, and a strongly typed schema, Content Understanding can effectively parse large files to extract values accurately. Reducing Costs and Accelerating Time-to-Value: Using confidence scores to trigger human review only when needed minimizes the total cost of processing the content. Integrating the different modalities into a unified workflow and grounding the content when applicable allows for faster reviews. Core Features and Advantages Azure AI Content Understanding offers a range of innovative capabilities that improve efficiency, accuracy, and scalability, enabling businesses to unlock deeper value from their content and deliver a superior experience to their end users. Multimodal Data Ingestion and Content Extraction: The service ingests a variety of data types such as documents, images, audio, and video, transforming them into a structured format that can be easily processed and analyzed. 
It instantly extracts core content from your data including transcriptions, text, faces, and more. Data Enrichment: Content Understanding offers additional features that enhance content extraction results, such as layout elements, barcodes, and figures in documents, speaker recognition and diarization in audio, and more. Schema Inferencing: The service offers a set of prebuilt schemas and allows you to build and customize your own to extract exactly what you need from your data. Schemas allow you to extract a variety of results, generating task-specific representations like captions, transcripts, summaries, thumbnails, and highlights. This output can be consumed by downstream applications for advanced reasoning and automation. Post Processing: Enhances service capabilities with generative AI tools that ensure the accuracy and usability of extracted information. This includes providing confidence scores for minimal human intervention and enabling continuous improvement through user feedback. Transformative Applications Across Industries Azure AI Content Understanding is ideal for a wide range of use cases and industries, as it is fully customizable and allows for the input of data from multiple modalities. Here are just a few examples of scenarios Content Understanding is powering today: Post-call analytics: Customers utilize Azure AI Content Understanding to extract analytics on call center or recorded meeting data, allowing you to aggregate data on the sentiment, speakers, and content discussed, including specific names, companies, user data, and more. Media asset management and content creation assistance: Extract key features from images and videos to better manage media assets and enable search on your data for entities like brands, setting, key products, people, and more. Insurance claims: Analyze and process insurance claims and other low-latency batch processing scenarios to automate previously time-intensive processes. 
Highlight video reel generation: With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full content. For example, automatically generate a first draft of highlight reels from conferences, seminars, or corporate events by identifying key moments and significant announcements. Retrieval Augmented Generation (RAG): Ingest and enrich content of any modality to effectively find answers to common questions in scenarios like customer service agents, or power content search scenarios across all types of data. Customer Success with Content Understanding Customers all over the world are already finding unique and powerful ways to accelerate their inferencing and unlock insights on their data by leveraging the multimodal capabilities of Content Understanding. Here are a few examples of how customers are unlocking greater value from their data: Philips: Philips Speech Processing Solutions (SPS) is a global leader in dictation and speech-to-text solutions, offering innovative hardware and software products that enhance productivity and efficiency for professionals worldwide. Content Understanding enables Philips to power their speech-to-result solution, allowing customers to use voice to generate accurate, ready-to-use documentation. “With Azure AI Content Understanding, we're taking Philips SpeechLive, our speech-to-result solution, to a whole new level. Imagine speaking, and getting fully generated, accurate documents, ready to use right away, thanks to powerful AI speech analytics that work seamlessly with all the relevant data sources.” – Thomas Wagner, CTO Philips Dictation Services WPP: WPP, one of the world’s largest advertising and marketing services providers, is revolutionizing website experiences using Azure AI Content Understanding. 
SJR, a content tech firm within WPP, is leveraging this technology for SJR Generative Experience Manager (GXM), which extracts data from all types of media on a company's website (including text, audio, video, PDFs, and images) to deliver intelligent, interactive, and personalized web experiences, with the support of WPP's AI technology company, Satalia. This enables them to convert static websites into dynamic, conversational interfaces, unlocking information buried deep within websites and presenting it as if spoken by the company's most knowledgeable salesperson. Through this innovation, WPP's SJR is enhancing customer engagement and driving conversion for their clients. ASC: ASC Technologies is a global leader in providing software and cloud solutions for omni-channel recording, quality management, and analytics, catering to industries such as contact centers, financial services, and public safety organizations. ASC utilizes Content Understanding to enhance their compliance analytics solution, streamlining processes and improving efficiency. “ASC expects to significantly reduce the time-to-market for its compliance analytics solutions. By integrating all the required capture modalities into one request, instead of customizing and maintaining various APIs and formats, we can cover a wide range of use cases in a much shorter time.” - Tobias Fengler, Chief Engineering Officer Numonix: Numonix AI specializes in capturing, analyzing, and managing customer interactions across various communication channels, helping organizations enhance customer experiences and ensure regulatory compliance. They are leveraging Content Understanding to capture insights from recorded call data from both audio and video to transcribe, analyze, and summarize the contents of calls and meetings, allowing them to ensure compliance across all conversations. 
“Leveraging Azure AI Content Understanding across multiple modalities has allowed us to supercharge the value of the recorded data Numonix captures on behalf of our customers. Enabling smarter communication compliance and security in the financial industry to fully automating quality management in the world’s largest call centers.” – Evan Kahan, CTO & CPO Numonix IPV Curator: A leader in media asset management solutions, IPV is leveraging Content Understanding to improve their metadata extraction capabilities to produce stronger industry-specific metadata, advanced action and event analysis, and align video segmentation to specific shots in videos. IPV’s clients are now able to accelerate their video production, reduce editing time, and access their content more quickly and easily. To learn more about how Content Understanding empowers video scenarios as well as how our customers such as IPV are using the service to power their unique media applications, check out Transforming Video Content into Business Value. Robust Security and Compliance Built using Azure’s industry-leading enterprise security, data privacy, and Responsible AI guidelines, Azure AI Content Understanding ensures that your data is handled with the utmost care and compliance and generates responses that align with Microsoft’s principles for responsible use of AI. We are excited to see how Azure AI Content Understanding will empower organizations to unlock their data's full potential, driving efficiency and innovation across various industries. Stay tuned as we continue to develop and enhance this groundbreaking service. Getting Started If you are at Microsoft Ignite 2024 or are watching online, check out this breakout session on Content Understanding. Learn more about the new Azure AI Content Understanding service here. Build your own Content Understanding solution in the Azure AI Foundry. 
For all documentation on Content Understanding, please refer to this page.
Six reasons why startups and at-scale cloud native companies build their GenAI Apps with Azure
Azure has become a platform of choice for many startups, including Perplexity and Moveworks, as well as for at-scale companies. Here are six reasons why we see companies of all sizes building their GenAI apps on Azure OpenAI Service.
Build Intelligent RAG For Multimodality and Complex Document Structure
Struggling to implement RAG for documents with multiple tables, figures, and plots, or for scanned documents with complex structure? Looking for a solution? In this blog you will learn how to implement an end-to-end solution using Azure AI services, including Document Intelligence, AI Search, Azure OpenAI, and our favorite, LangChain 🙂 I have taken advantage of the multimodal capability of the Azure OpenAI GPT-4-Vision model for figures and plots.