UPDATE June 24: The code for applying video search over Video Indexer Prompt Content is now available at VideoQnA-Demo/README.md in the Azure-Samples/azure-video-indexer-samples repository.
Have you ever watched an online course and wished you could ask questions about the entire course, or get a comprehensive summary of a video? This can all now be achieved with Azure AI Video Indexer and LLMs (Large Language Models) – powering each other.
LLMs are powerful language models that can capture the essence of text, allow natural language question-answering and much more. In Azure AI Video Indexer, we understand videos – video content is more than just words, and a single shot can contain a wealth of insights that are critical for its understanding. Coupling these two powerful tools can lead to great results in video understanding and downstream tasks in natural language.
Our new API extracts and processes all the multi-modality insights of a video into a prompt-ready format that can be easily used with LLMs.
This API can be used on already-indexed videos in Azure AI Video Indexer, so there is no need to re-index videos to create their prompt-ready format.
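For illustration, here is a minimal sketch of fetching the prompt-ready content for an already-indexed video. The endpoint path, query parameters, and placeholder values are assumptions modeled on the usual Azure AI Video Indexer REST shape; consult the official API reference for the exact contract.

```python
import requests

LOCATION = "trial"              # your account's Azure region, or "trial"
ACCOUNT_ID = "<account-id>"     # placeholder
VIDEO_ID = "<video-id>"         # an already-indexed video
ACCESS_TOKEN = "<access-token>" # account access token

# Assumed endpoint shape; no re-indexing of the video is required.
url = (
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
    f"/Videos/{VIDEO_ID}/PromptContent"
)
resp = requests.get(url, params={"accessToken": ACCESS_TOKEN})
resp.raise_for_status()
prompt_content = resp.json()    # the Prompt Content JSON file (see Figure 3)
```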
Azure AI Video Indexer has a new algorithm that translates multi-modality content understanding into an LLM prompt-ready format, capturing the important details and insights in the video, which can then be used as-is to create LLM prompts for various tasks, including video summarization and search.
Figure 1 – Our algorithm flow. Starting by capturing the important details and insights in the video using Azure AI Video Indexer (A, B), then splitting the video and its insights into coherent chapters based on visual context (C), and further splitting the sections based on other insights given the LLM prompt’s limitations (D).
Our new algorithm is based on advanced AI models developed in Azure AI Video Indexer. It effectively integrates all three modalities – visual, audio, and text – based on the main insights from Azure AI Video Indexer, processes them, and transforms them into an LLM prompt-ready format. The method consists of the following steps (see Figure 1):
A–B. Capture the important details and insights in the video using Azure AI Video Indexer.
C. Split the video and its insights into coherent chapters based on visual context.
D. Further split the sections based on the other insights, to respect the LLM prompt’s size limitations.
The final output is shown in Figure 3, and we call it the Prompt Content JSON file.
Overall, our method combines all three modalities to provide an effective approach for analyzing video content with LLMs. We show examples of two use cases: video-to-text summarization and search within the video content. To demonstrate the abilities of combining our prompt content with an LLM, we use videos from the AKS Edge Essentials tutorial series.
Figure 2 – An example of a section’s content, created with our new algorithm and shown in the output JSON file from our new API (Figure 3).
Figure 3 – An example of the output JSON file from our new API.
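Since Figures 2 and 3 are shown as images, the sketch below illustrates the general shape such a Prompt Content JSON file might take. All field names and values here are illustrative assumptions, not the exact schema returned by the API.

```python
# Illustrative shape only; field names and values are assumptions.
prompt_content = {
    "name": "AKS Edge Essentials tutorial",
    "sections": [
        {
            "id": 1,
            "start": "0:00:00",
            "end": "0:03:30",
            # Each section bundles the textual, audio, and visual insights.
            "content": "[Video title] ...\n"
                       "[Visual labels] terminal, diagram, person\n"
                       "[OCR] kubectl apply -f pvc.yaml\n"
                       "[Transcript] In this video we look at storage...",
        },
        # ...one entry per coherent section of the video
    ],
}
```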
Figure 4 – The flow from video to downstream tasks, starting with a video, extracting its insights with Azure AI Video Indexer and creating prompt content with our new algorithm. The prompt content can be used with any LLM, such as GPT, for tasks such as summarization and question-answering.
Video is more than just words. Today’s methods for video summarization rely mostly on a video’s audio track (the transcript). Our approach enriches the transcript with visual insights and additional audio insights. By combining these three modalities we gain a better understanding of the video’s content, and hence the potential for a better summary that captures the essence of the video.
We use a simple iterative algorithm that enables us to summarize long videos with rich content. First, we use our Prompt Content for chaptering, creating prompt-ready sections of the video, where each section includes its matching textual, audio, and visual insights (as explained above). Then, at each iteration, we summarize the current section together with the cumulative summary of the previous sections (given as additional information in the prompt). We can also control the summary’s “flavor” – whether we want an informative summary, a teaser, and much more – by adjusting only the prompt of the final section’s summary.
Figure 5 – The video summarization flow. We start by using our new API to create the prompt content, which is divided into sections. Then we use a simple iterative method for summarization with an LLM.
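To make the loop concrete, here is a minimal sketch of the iterative summarization described above. `call_llm` is a placeholder for any chat-completion API, and the `sections`/`content` field names are assumptions about the Prompt Content JSON rather than its exact schema.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to the chat-completion API of your choice."""
    raise NotImplementedError

def summarize_video(prompt_content_path: str, flavor: str = "informative") -> str:
    # Load the Prompt Content JSON produced by the new API
    # ("sections" / "content" are assumed field names).
    with open(prompt_content_path, encoding="utf-8") as f:
        sections = json.load(f)["sections"]

    running_summary = ""
    for i, section in enumerate(sections):
        is_last = i == len(sections) - 1
        # Only the final iteration carries the "flavor" instruction,
        # e.g. an informative summary vs. a teaser.
        style = f"Write a {flavor} summary." if is_last else "Summarize concisely."
        prompt = (
            f"{style}\n"
            f"Summary of the video so far:\n{running_summary}\n\n"
            f"Next section of the video:\n{section['content']}"
        )
        running_summary = call_llm(prompt)
    return running_summary
```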
For the summarization example, we will use a video that describes the storage architecture of AKS Edge Essentials. It also demonstrates how to create storage classes with local path and NFS, how to create a PVC, and how to test them on Linux.
Informative summary example – made with our prompt content and ChatGPT
In this video from AKS Edge Essentials, Lior and a guest demonstrate how to use local path and NFS storage classes with AKS Edge Essentials. The guest demonstrates how to create a Persistent Volume Claim (PVC) using the kubectl apply command and tests it with the kubectl exec command. He also shows how to use the Invoke-AksEdgeNodeCommand feature to interact with the node’s operating system and create a local-path-pvc. They explain the use case for using the NFS storage provisioner for multi-node clusters and how to mount the PVC using the deployment YAML file to store data. In conclusion, they explain how to use the volume-test command to view the PVC and demonstrate how to use it for application development.
As video is more than just words – and in this video specifically, an important part is the live demo showing useful commands – the highlighted sentences in the summary above came from the visual insights alone.
Teaser summary example – made with our prompt content and ChatGPT
Check out the latest AKS Edge Essentials video with Lior to learn how to deploy an NFS storage class and use it to store data. Follow along as they demonstrate how to set up persistent storage for applications using persistent volumes, persistent volume claims, and storage classes. Don’t miss this informative and educational video!
UPDATE June 2024: The code for creating this capability over Video Indexer Prompt Content is now available at VideoQnA-Demo/README.md in the Azure-Samples/azure-video-indexer-samples repository.
Video archives can be huge and hard to search. We aim to simplify video archive search and enable question-answering by combining our Prompt Content file with a document-archive search technique. This technique typically involves an embedding model, a vector store, a search engine, and a generative NLP model. For video archives, instead of searching “documents”, we search through the sections of the videos, created using our Prompt Content algorithm. Here is how we do it (a code sketch follows the list):
1. Create the Prompt Content for each video in the archive, splitting it into prompt-ready sections.
2. Embed each section with an embedding model and index it in a vector store and search engine (we use Azure AI Search, see Figure 6).
3. At query time, embed the question and retrieve the top-matching sections.
4. Pass the top results to an LLM to generate a textual answer, citing the source sections and their timestamps.
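As a concrete illustration of steps 2–4, here is a minimal in-memory sketch of the retrieval flow. `embed` and `call_llm` are placeholders for any embedding and chat-completion models, the section field names are assumptions, and a production setup would use a real vector store such as Azure AI Search instead of this NumPy scan.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return the embedding vector for `text`."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: return the LLM's reply to `prompt`."""
    raise NotImplementedError

def build_index(sections: list[dict]) -> np.ndarray:
    # One L2-normalized embedding per prompt-content section.
    vectors = np.stack([embed(s["content"]) for s in sections])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def answer(question: str, sections: list[dict],
           vectors: np.ndarray, k: int = 3) -> str:
    q = embed(question)
    q = q / np.linalg.norm(q)
    top = np.argsort(vectors @ q)[::-1][:k]  # cosine similarity, best first
    # Number the retrieved sections so the model can cite them,
    # keeping each section's start time for timestamped citations.
    context = "\n\n".join(
        f"[{rank + 1}] ({sections[idx]['start']}) {sections[idx]['content']}"
        for rank, idx in enumerate(top)
    )
    return call_llm(
        "Answer the question using only the numbered sections below, "
        f"citing them by number.\n\n{context}\n\nQuestion: {question}"
    )
```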
This approach simplifies the process of searching within video archives and finding answers by leveraging well-established techniques from document search and LLMs.
Figure 6 – The search flow, using Azure AI Video Indexer to create the prompt content, Azure AI Search for the search process and LLMs for embeddings and generating a textual answer based on the top search results.
Question-answering examples - AKS Edge Essentials video archive
Question: What is the PowerShell command to create a JSON template for AKS edge deployment?
Answer: The command is 'New-AksEdgeConfig -DeploymentType ScalableCluster -NodeType LinuxAndWindows' [1].
Citations: [1] AKS Edge Essentials scheme and Azure Arc (0:03:30)
Question: Please summarize the difference between single and multi-machine cluster?
Answer: Single machine cluster uses internal virtual switch network architecture and all the IP addresses are defined [1][2]. Scalable cluster uses external virtual switch network architecture and all the IP addresses need to be set up as part of the deployment [1][2]. When using single machine cluster, NAT is used, whereas scalable cluster connects directly to a physical network [2]. The DNS and proxy settings are the same for both [2]. Multi-machine cluster is used for industrial scenarios where one device can be used as a backup if another device goes down [3].
Citations: [1] AKS Edge Essentials network architecture (0:08:31) [2] AKS Edge Essentials network architecture (0:10:30) [3] AKS Edge Essentials Multi Node (0:05:40)
Check out our demonstrations of the downstream tasks using our new algorithm and API.
See also the Azure AI Search documentation on Microsoft Learn.