Azure AI Content Understanding now provides advanced video capabilities, transforming unstructured video into structured, searchable knowledge. This empowers businesses to automate video processing tasks, extract valuable insights, and maximize the return on video investments, with lower developer overhead and no need for extensive video processing code.
Unlocking Value from Unstructured Video
Every minute, social video sharing platforms see over 500 hours of video uploads [1], and 91% of businesses leverage video as a key tool [2]. From media conglomerates managing extensive archives to enterprises producing training and marketing materials, organizations are overwhelmed with video. Yet, despite this abundance, video remains inherently unstructured and difficult to utilize effectively.
While the volume of video content continues to grow exponentially, its true value often remains untapped due to the friction involved in making video useful. Organizations grapple with several pain points:
- Inaccessibility of Valuable Content Archives: Massive video archives sit idle because finding the right content to reuse requires extensive manual effort.
- The Impossibility of Personalization Without Metadata: Personalization holds the key to unlocking new revenue streams and increasing engagement. However, without reliable and detailed metadata, it's cost-prohibitive to tailor content to specific audiences or individuals.
- Missed Monetization Opportunities: For media companies, untapped archives mean missed chances to monetize content through new formats or platforms.
- Operational Bottlenecks: Enterprises struggle with slow turnaround times for training materials, compliance checks, and marketing campaigns due to inefficient video workflows, leading to delays and increased expenses.
Many video processing applications rely on purpose-built, frame-by-frame analysis to identify objects and key elements within video content. While this method can detect a specific list of objects, it is inherently lossy and struggles to capture actions, events, or uncommon objects. It is also expensive and time-consuming to customize for specific tasks.
Generative AI promises to revolutionize video content analysis, with GPT-4o topping leaderboards for video understanding tasks, but finding a generative model that processes video is just the first step. Building video pipelines around generative models is hard: developers must invest significant effort in infrastructure before a custom pipeline produces good results. These systems need optimized prompts, integrated transcription, smart handling of context-window limitations, shot-aligned segmentation, and much more. This makes them expensive to optimize and hard to maintain over time.
Introducing Azure AI Content Understanding for video
This is where Azure AI Content Understanding changes the game. It offers an integrated video pipeline built on advanced foundation models, so you can extract insights from both the audio and visual elements of your videos. The service transforms unstructured video into structured, searchable knowledge, enabling powerful use cases like media asset management and highlight reel generation.
With Content Understanding, you can automatically identify key moments in a video to extract highlights and summarize the full context. For example, for corporate events and conferences you can quickly produce same-day highlight reels. This capability not only reduces the time and cost associated with manual editing but also empowers organizations to deliver timely, professional reaction videos that keep audiences engaged and informed.
In another case, a news broadcaster can create a personalized viewing experience by recommending stories of interest. This is achieved by automatically tagging segments with relevant metadata like topic and location, which enables content tailored to individual interests and drives higher engagement and viewer satisfaction.
By generating specific metadata on a segment-by-segment basis, including chapters, scenes, and shots, Content Understanding provides a detailed outline of what's contained in the video, facilitating these workflows.
This is enabled by a streamlined pipeline for video that starts with content extraction tasks like transcription, shot detection, key frame extraction, and face grouping to create grounding data for analysis. Then, generative models use that information to extract the specific fields you request for each segment of the video. This generative field extraction capability enables customers to:
- Customize Metadata: Tailor the extracted information to focus on elements important to your use case, such as key events, actions, or dialogues.
- Create Detailed Outlines: Understand the structure of your video content at a granular level.
- Automate Repetitive Editing Tasks: Quickly pinpoint important segments to create summaries, trailers, or compilations that capture the essence of the full video.
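The two-stage flow described above (content extraction first, then field extraction per segment) can be pictured with a minimal Python sketch. This is not the Content Understanding API: the `Segment` shape, the `SCHEMA` field names, and `toy_extractor` are all hypothetical, and the keyword-based extractor merely stands in for the generative model so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Segment:
    """One video segment plus its grounding data (hypothetical shape)."""
    start_s: float
    end_s: float
    transcript: str
    fields: dict[str, Any] = field(default_factory=dict)


# An extractor receives (field name, field description, segment) and returns a value.
FieldExtractor = Callable[[str, str, Segment], Any]


def analyze(segments: list[Segment], schema: dict[str, str],
            extractor: FieldExtractor) -> list[Segment]:
    """Fill in every requested field for every segment."""
    for segment in segments:
        for name, description in schema.items():
            segment.fields[name] = extractor(name, description, segment)
    return segments


def highlight_ranges(segments: list[Segment]) -> list[tuple[float, float]]:
    """Return the time ranges of segments flagged as highlights."""
    return [(s.start_s, s.end_s) for s in segments if s.fields.get("is_highlight")]


def toy_extractor(name: str, description: str, segment: Segment) -> Any:
    """Keyword-based stand-in for the generative step (assumption, not a real model)."""
    if name == "is_highlight":
        return "announce" in segment.transcript.lower()
    if name == "summary":
        return segment.transcript.split(".")[0].strip()
    return None


# Hypothetical schema: field name -> plain-language description of what to extract.
SCHEMA = {
    "summary": "One-sentence summary of the segment",
    "is_highlight": "True if the segment contains a key moment",
}

segments = [
    Segment(0.0, 42.5, "Welcome everyone. Housekeeping notes follow."),
    Segment(42.5, 97.0, "Today we announce the new product line. Details are on the slides."),
]
analyzed = analyze(segments, SCHEMA, toy_extractor)
print(highlight_ranges(analyzed))
```

Keeping the schema as plain data separate from the extractor mirrors the idea that the requested fields, not the pipeline, are what changes from one use case to the next.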
By leveraging these capabilities, organizations can automate many video creation tasks including creating highlight reels and repurposing content across formats, saving time and resources while delivering compelling content to their audiences. Whether it's summarizing conference keynotes, capturing the essence of corporate events, or showcasing the most exciting moments in sports, Azure AI Content Understanding makes video workflows efficient and scalable. But how do these solutions perform in real-world scenarios?
Customer Success Stories
IPV Curator: Transforming Media Asset Management
IPV Curator, a leader in media asset management solutions, assists clients in managing and monetizing extensive video libraries across various industries, including broadcast, sports, and global enterprises. It enables seamless, zero-download editing of video in Azure cloud using Adobe applications. Their customers needed an efficient way to search, repurpose, and produce vast amounts of video content with data extraction tailored to specific use cases.
IPV integrated Azure AI Content Understanding into their Curator media asset management platform. They found that it provided a step-function improvement in metadata extraction for their clients. It was particularly beneficial as it enabled:
- Industry-Specific Metadata: Allowed clients to extract metadata tailored to their specific needs by using simple prompts, without domain-specific training of new AI models. For example:
  - Broadcast: Rapidly identified key scenes for promo production and efficiently identified the highest-value content for free ad-supported streaming TV (FAST) channels.
  - Travel Marketing Content: Automatically tagged geographic locations, landmarks, shot types (e.g., aerial, close-up), and highlighted scenic details.
  - Shopping Channel Content: Detected specific products and identified demo segments, product categories, and key selling points.
- Advanced Action and Event Analysis: Enabled detailed analysis of a set of frames in a video segment to identify actions and events. This provides a new level of insights compared to frame-by-frame analysis of objects.
- Segmentation Aligned to Shots: Detected shot boundaries in produced videos and in-media edit points, enabling easy reuse by capturing full shots in segments.
As a result, IPV's clients can quickly find and repurpose content, significantly reducing editing time and accelerating video production at scale.
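To illustrate why shot-aligned segmentation makes clips easier to reuse, here is a minimal sketch of one way to group whole shots into segments so that no segment boundary falls mid-shot. The function name `group_shots`, the `max_segment_s` parameter, and the greedy grouping rule are assumptions made for illustration; they do not describe how Content Understanding or IPV Curator segments video.

```python
def group_shots(shot_boundaries: list[float],
                max_segment_s: float) -> list[tuple[float, float]]:
    """Greedily group consecutive whole shots into segments.

    shot_boundaries is a sorted list of cut times in seconds, including the
    start and end of the video. Segments are kept at or under max_segment_s
    where possible; a single shot longer than the limit becomes its own
    segment rather than being split.
    """
    if len(shot_boundaries) < 2:
        return []
    segments = []
    seg_start = shot_boundaries[0]
    seg_end = shot_boundaries[0]
    for shot_end in shot_boundaries[1:]:
        # Close the current segment if adding this shot would exceed the limit.
        if seg_end > seg_start and shot_end - seg_start > max_segment_s:
            segments.append((seg_start, seg_end))
            seg_start = seg_end
        seg_end = shot_end
    segments.append((seg_start, seg_end))
    return segments


cuts = [0.0, 4.0, 9.0, 15.0, 40.0, 44.0, 50.0]
print(group_shots(cuts, max_segment_s=15.0))
```

Because every segment edge coincides with a detected cut, each segment can be lifted into an edit without trimming partial shots.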
IPV Curator enables search across industry specific metadata extracted from videos
"IPV's collaboration with Microsoft transforms media stored in Azure into an easily accessible, streaming, and highly searchable active archive. The powerful search engine within IPV's new generation of Media Asset Management uses Azure AI Content Understanding to accurately surface any archived video clip, driving users to their highest value content in seconds."
—Daniel Mathew, Chief Revenue Officer, IPV
Cognizant: Innovative Ad Moderation
Cognizant, a global leader in consulting and professional services, identified a challenge in moderating advertising content for its media customers. Those customers' traditional methods rely heavily on manual review and struggle to scale with the increasing volume of content requiring assessment.
The Cognizant Ad Moderation solution framework leverages Content Understanding to create a more accurate, cost-effective approach to ad moderation that results in a 96% reduction in review time. It allows customers to automate ad reviews to ensure cultural sensitivity and regulatory compliance and to optimize programming placement, ultimately reducing manual review effort.
Cognizant achieves these results by leveraging Content Understanding for multimodal field extraction, tailored output, and native generative AI video processing.
- Multimodal Field Extraction: Extracts key attributes from both the audio and visual elements, allowing for a more comprehensive analysis of the content. This analysis is critical to get a holistic view of suitability for various audiences.
- Tailored Output Schema: Outputs a custom structured schema that detects content directly relevant to the moderation task. This includes detecting specific risky attributes like prohibited language, potentially banned topics, violations of content restrictions, and sensitive products like alcohol or smoking.
- Native Generative AI Video Processing: Content Understanding natively processes video files with generative AI to provide the detailed insights requested in the schema, capturing context, actions, and events over entire segments of the video.
This optimized video pipeline provides Cognizant with a detailed analysis of videos to ground an automated decision. It allows them to quickly green light compliant ads and flag others for rejection or human review.
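As a rough illustration of how structured fields can ground an automated decision, the sketch below maps per-segment boolean flags to one of three outcomes. The field names and the policy (which flags block an ad outright and which only trigger human review) are hypothetical choices made for this example; they are not Cognizant's rules or the service's output schema.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


# Hypothetical field names and policy split, chosen only for illustration.
BLOCKING_FIELDS = ("prohibited_language", "banned_topic")
REVIEW_FIELDS = ("alcohol", "smoking", "content_restriction_violation")


def decide(segment_fields: list[dict[str, bool]]) -> Decision:
    """Combine per-segment flags into a single decision for the whole ad."""
    decision = Decision.APPROVE
    for fields in segment_fields:
        # Any blocking flag in any segment rejects the ad immediately.
        if any(fields.get(name, False) for name in BLOCKING_FIELDS):
            return Decision.REJECT
        # Sensitive but not prohibited content is escalated to a person.
        if any(fields.get(name, False) for name in REVIEW_FIELDS):
            decision = Decision.HUMAN_REVIEW
    return decision


ad = [
    {"alcohol": False, "prohibited_language": False},
    {"alcohol": True, "prohibited_language": False},
]
print(decide(ad).value)
```

Evaluating every segment rather than the video as a whole means a single problematic scene is enough to change the outcome, which matches the segment-level analysis described above.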
Content Understanding empowers Cognizant to focus on solving business challenges rather than managing the underlying infrastructure for video processing and integrating generative models.
“I'm absolutely thrilled about the Azure AI Content Understanding service! It's a game-changer that accelerates processing by integrating multiple AI capabilities into a single service call, delivering combined audio and video transcription in one JSON output with incredibly detailed results. The ability to add custom fields that integrate with an LLM provides even more detailed, meaningful, and flexible output.” - Rushil Patel – Developer @ Cognizant
The Broader Impact: Transformation across industries
The transformative power of Azure AI Content Understanding extends far beyond these specific use cases, offering significant benefits across various industries and workflows. By leveraging advanced AI capabilities on video, organizations have been able to unlock new opportunities and drive innovation in several key areas:
- Social Media Listening and Consumer Insights: Analyze video content across social platforms to understand how products are perceived and discussed online. Gain valuable consumer insights to inform product development, marketing strategies, and brand management.
- Unlocking Video for AI Assistants and Agents: Enable AI assistants and agents to access and utilize information from video content, transforming meeting recordings, training videos, and events into valuable data sources for Retrieval-Augmented Generation (RAG). Enhance customer support and knowledge management by integrating video insights into AI-driven interactions.
- Enhancing Accessibility with Audio Descriptions: Generate draft audio descriptions for video content to provide a starting point for human editors. This streamlines the creation of accessible content for visually impaired audiences, reducing effort and accelerating compliance with accessibility standards.
- Marketing and Advertising Workflows: Automate content analysis to ensure brand alignment and effective advertising. Understand and optimize the content within video advertisements to maintain consistent branding and enhance audience engagement.
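For the AI assistant scenario in the list above, a minimal sketch can show how segment-level output might become retrievable text for RAG. Everything here is an assumption for illustration: the segment dictionary keys, the `video_id@mm:ss` source format, and the keyword-overlap scoring, which stands in for the vector search a real retrieval system would more likely use.

```python
import re


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as mm:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def to_chunks(video_id: str, segments: list[dict]) -> list[dict]:
    """Turn each segment into a text chunk with a timestamped source reference."""
    chunks = []
    for seg in segments:
        chunks.append({
            "source": f"{video_id}@{format_timestamp(seg['start_s'])}",
            "text": f"{seg['summary']} {seg['transcript']}".strip(),
        })
    return chunks


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(chunks: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Rank chunks by how many query tokens they share (toy scoring)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(c["text"])), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


chunks = to_chunks("training-01", [
    {"start_s": 0.0, "summary": "Introduction to the onboarding process.",
     "transcript": "Welcome to the team."},
    {"start_s": 95.0, "summary": "How to request a laptop.",
     "transcript": "Open the IT portal and file a hardware request."},
])
for hit in search(chunks, "how do I request a laptop"):
    print(hit["source"], "-", hit["text"])
```

Carrying the timestamp in each chunk lets an assistant point the user to the exact moment in the video that supports its answer.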
The business value of Azure AI Content Understanding is clear. By addressing core challenges in video content management with generative AI, customization, and native video processing, it enhances operational efficiency and unlocks new opportunities for monetization and innovation. Organizations can now turn dormant video archives into valuable assets, deliver personalized content to engage audiences effectively, and automate manual, time-consuming workflows.
Ready to Transform Your Video Content?
- For more details on how to use Content Understanding for video, check out the Video Solution Overview.
- If you are at Microsoft Ignite 2024 or are watching online, check out this breakout session.
- Try this new service in Azure AI Foundry.
- For documentation, please refer to the Content Understanding Overview.
For a broader perspective, see Announcing Azure AI Content Understanding: Transforming Multimodal Data into Insights and discover how it extends these capabilities across all content formats.
-----
[1] According to Statista in 2022 - Hours of video uploaded every minute 2022 | Statista
[2] According to a Wyzowl survey in 2024 - Video Marketing 2024 (10 Years of Data) | Wyzowl