We are thrilled to introduce textual video summarization for recorded video and audio files, powered by large and small language models (LLM and SLM).
AI application developers can leverage APIs to create textual summaries for audio and video files, anywhere.
Instead of watching entire videos, data analysts can work from concise summaries of video and audio content and adjust those summaries to their needs.
Azure AI Video Indexer, a cloud and edge video solution, enables textual video summarization with the following Build announcements:
Textual video summarization in the cloud edition of Azure AI Video Indexer is powered by Azure OpenAI. Customers who have created an Azure OpenAI (AOAI) resource in Azure can integrate it with Video Indexer. With deployments such as GPT4, users get concise textual summaries of their videos, presented as an extract alongside the player page. The summary enhances the viewing experience and lets video analysts tailor its nuances to align with specific business requirements.
The summary encapsulates the essence of the video content. It draws not only on the transcript but also on elements derived from the audio and visual aspects of the video, such as a siren or crowd reactions in the background, and on anything that appears on screen, such as signs, text, and visual objects.
The preview version of Azure AI Video Indexer enabled by Arc now includes integration with SLM through Phi3. The innovation containerizes both the Azure AI and Phi3 models, providing video analysts the ability to perform video summarization. It represents a significant stride in our generative AI capabilities utilizing the cutting-edge Phi3 model at the edge. The Phi3 model opens new avenues for AI applications, especially in settings where computing resources are limited, by offering a more streamlined and efficient approach to video analysis.
The Phi3 model, developed in line with Microsoft’s Responsible AI principles and trained on high-quality data, is a testament to our dedication to safety and excellence in AI. It’s a lightweight, state-of-the-art model designed for long-context support, making it ideal for generating responsive and relevant text in chat formats.
Watch the demo recording to learn more:
Video analysts utilizing the summarization feature will appreciate the added flexibility of feature customization. Tailor your summaries to meet specific needs with selectable options such as “Shorter” for concise overviews, “Longer” for detailed accounts, “Formal” for professional contexts, and “Casual” for a more relaxed tone. This personalized approach ensures that your summaries align perfectly with your intended audience and purpose.
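One way to picture these four options is as adjustments to two settings, length and tone. The sketch below is purely illustrative: `SummaryOptions`, `apply_option`, and the `Short`/`Medium`/`Long` and `Neutral` values are hypothetical names chosen for this example, not part of Video Indexer, and the product may model the options differently.

```python
from dataclasses import dataclass, replace

# Hypothetical value sets; only "Shorter", "Longer", "Formal" and "Casual"
# come from the post itself.
LENGTHS = ("Short", "Medium", "Long")
TONES = ("Casual", "Neutral", "Formal")


@dataclass(frozen=True)
class SummaryOptions:
    """Illustrative container for the two settings a summary can vary on."""
    length: str = "Medium"
    tone: str = "Neutral"


def apply_option(options: SummaryOptions, choice: str) -> SummaryOptions:
    """Return new options after the user picks one of the four choices."""
    if choice in ("Shorter", "Longer"):
        index = LENGTHS.index(options.length)
        # Move one step along the length scale, clamped at both ends.
        index = max(index - 1, 0) if choice == "Shorter" else min(index + 1, len(LENGTHS) - 1)
        return replace(options, length=LENGTHS[index])
    if choice in ("Formal", "Casual"):
        return replace(options, tone=choice)
    raise ValueError(f"unknown option: {choice!r}")


# Example: start from the defaults, then ask for a longer, formal summary.
options = apply_option(apply_option(SummaryOptions(), "Longer"), "Formal")
```

Keeping the options immutable means each choice produces a new value, so an earlier summary configuration can be kept around for comparison.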
Use Textual Video Summarization in Your Public Cloud Environment:
If you already have an Azure AI Video Indexer account, follow these steps to use video summarization:
For detailed instructions on how to set up this integration, refer to this guidance. Please note that this feature is not available in Video Indexer trial accounts or in legacy accounts that use Azure Media Services. You can also take this opportunity to remove your dependency on Azure Media Services by following these instructions.
Use Textual Video Summarization in Your Edge Environment, enabled by Arc:
If your edge appliances are integrated with the Azure Platform via Azure Arc, you’re in for a treat! Here’s how to activate the feature:
The prompt content API, which converts video to text based on the insights Video Indexer extracts, now supports additional models: Llama, Phi2 and GPTv4. This gives you more flexibility when converting video content to text. To learn more about this API, refer to this API documentation.
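As a rough illustration of how a client might select one of these models, the sketch below only builds a request URL and sends nothing. The base URL, the `PromptContent` path, and the `modelName` and `accessToken` parameter names are assumptions made for this example, as are the exact model strings; check the API documentation for the real route and accepted values.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the Video Indexer REST API.
API_BASE = "https://api.videoindexer.ai"

# Model names as written in this post; the strings the API accepts may differ.
SUPPORTED_MODELS = ("Llama", "Phi2", "GPTv4")


def build_prompt_content_url(location, account_id, video_id, model_name, access_token):
    """Build a hypothetical prompt content request URL for one video."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model_name!r}")
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    segments = (location, "Accounts", account_id, "Videos", video_id, "PromptContent")
    path = "/".join(quote(segment, safe="") for segment in segments)
    # Assumed query parameter names.
    query = urlencode({"modelName": model_name, "accessToken": access_token})
    return f"{API_BASE}/{path}?{query}"


# Example with placeholder identifiers.
url = build_prompt_content_url("eastus", "acc-1", "vid-9", "Phi2", "tok")
```

Validating the model name before building the URL surfaces a typo locally instead of as a failed request.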
About the feature
About Azure AI Video Indexer