What happens when a Data Scientist combines complex neural networks with classic machine learning (ML) models? Spoiler: the results are super interesting and fashionable! But before we jump ahead, let's talk about clothes in videos.
Videos are complex creatures. They include thousands of images (frames), audio with voices and special events such as clapping and laughter, and text (the transcript). Extracting key insights from videos is a complex task that requires smart AI (artificial intelligence) models.
Clothing plays a key role in many kinds of videos, from the aspects of fashion, content, advertising, and more. To identify the clothing featured in a video, we first need to detect the people in it, which is itself a complex AI task. However, not all clothing has the same influence in a video. Since many people may appear in a video, how can we decide which clothing is the most influential? The main characters' clothing will probably be in the camera's focus and be more relevant to the video's content than the clothing of background characters. However, secondary characters can also have a major influence if they take part in a key moment or are celebrity guests.
This leads to the key idea: ranking clothing in a video by the importance of the characters who wear it. How can we create an algorithm that decides who the main characters are, in terms of clothing?
When watching a movie or a TV series, we can understand from the content of the video who the main characters are. For example, they may appear for a long time, be in the camera's focus, wear branded clothing, or even be celebrities. These are only a few of the features that define the main characters. Our new model relies on these features and more to identify the main characters in a video and rank their clothing by each character's importance.
How does it work? Classic machine learning on top of advanced AI models
Featured Clothing is based on advanced AI models developed in Video Indexer over the past five years, where the fundamental model for this new algorithm detects and tracks people in videos. One might assume that time of appearance is the key feature for finding the main characters, but it isn't enough. A celebrity guest, or a person appearing in a key moment of the video, can also raise the importance of that person and their clothing. Therefore, we leverage the insights created by other AI models in Video Indexer. For example, celebrity recognition is a key feature based on a complex AI pipeline. To know that a person is a celebrity, we use the face pipeline in Video Indexer, which detects faces, groups them by person, and applies a celebrity recognition model. We then use a recently developed algorithm to match the face group with the observed person, and from that deduce whether the observed person is indeed a celebrity.
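To make the face-to-person matching step concrete, here is a minimal sketch of one simple way such an association could work: checking how much of a face's bounding box falls inside an observed person's bounding box. This is an illustration only, not the actual Video Indexer algorithm; the box format and threshold are assumptions.

```python
# Illustrative sketch (not the actual Video Indexer algorithm): associate a
# detected face with an observed person by checking how much of the face's
# bounding box is contained in the person's bounding box.

def overlap_fraction(face, person):
    """Fraction of the face box (x1, y1, x2, y2) contained in the person box."""
    ix1, iy1 = max(face[0], person[0]), max(face[1], person[1])
    ix2, iy2 = min(face[2], person[2]), min(face[3], person[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    face_area = (face[2] - face[0]) * (face[3] - face[1])
    return inter / face_area if face_area else 0.0

def match_face_to_person(face_box, person_boxes, threshold=0.7):
    """Return the index of the person box that best contains the face, or None."""
    best_idx, best_frac = None, threshold
    for i, person in enumerate(person_boxes):
        frac = overlap_fraction(face_box, person)
        if frac >= best_frac:
            best_idx, best_frac = i, frac
    return best_idx
```

If the matched face group was recognized as a celebrity, that label can then be propagated to the observed person.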
How can we detect key moments? For that, we leverage two more AI models developed by our team. We use Audio Event Detection, a complex neural network that detects special events such as laughter and crowd reactions, and we combine its results with the emotions expressed in the video, detected by another neural network.
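One simple way to combine these two signals, sketched below under assumed interval and field formats (the production logic may differ), is to flag a key moment wherever a detected audio event overlaps in time with a detected emotion:

```python
# Hypothetical sketch: flag a "key moment" wherever an audio event
# (e.g. laughter) overlaps in time with a detected emotion.
# The interval format (start, end) and field names are assumptions.

def intervals_overlap(a, b):
    """True if the two (start, end) time intervals, in seconds, overlap."""
    return a[0] < b[1] and b[0] < a[1]

def key_moments(audio_events, emotions):
    """Return (event, emotion) pairs whose time ranges overlap."""
    return [(ev, em) for ev in audio_events for em in emotions
            if intervals_overlap(ev["time"], em["time"])]
```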
So now that we have a bucket full of amazing features, all based on clever AI models, how can we combine them to smartly detect the main characters and their clothing? This is where classic ML comes into play. We use a classic regression model to combine them all and produce a score! The results are amazing, surfacing the main characters and their best frames, and enabling a cascade of applications from video summarization to tailored commercials.
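The idea of scoring with a regression model can be sketched as follows. This is a toy illustration, not the production model: the feature names, weights, and sigmoid squashing are all invented for the example.

```python
# Minimal sketch of the idea, not the production model: a learned linear
# (regression) model combines per-person features into one importance score.
# The feature names and weights below are invented for illustration.
import math

WEIGHTS = {"appearance_time": 0.5, "is_celebrity": 2.0,
           "in_key_moment": 1.5, "camera_focus": 1.0}
BIAS = -2.0

def importance_score(features):
    """Squash a weighted sum of features into a (0, 1) score via a sigmoid."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

With weights like these, a briefly appearing celebrity in a key moment can outscore a host with a long appearance time, mirroring the ranking behavior described above.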
The Video Indexer team is releasing the Featured Clothing capability in Public Preview (to be released by the end of August 2022).
We would like to detect featured clothing and rank it by importance in the video. First, upload and index your video to Video Indexer according to the guidelines, using the option for advanced video and audio indexing.
Next, download the artifacts of the video using the download buttons on Video Indexer’s portal, as marked in the following screenshot:
The artifacts hold the results of Featured Clothing in the zip named featuredclothing.zip. The results contain two objects:
featuredclothing.map.json. This file contains an instance for each featured clothing item, with the following fields:
id – ranking index (id=1 is the most important clothing)
confidence – the score of the featured clothing
frameIndex – the best frame of the clothing
timestamp – corresponding to the frameIndex
opBoundingBox – bounding box of the person
faceBoundingBox – bounding box of the person’s face, if detected
fileName – where the best frame of the clothing is saved
featuredclothing.frames.map. This folder contains images of the best frames in which the featured clothing appeared, corresponding to the fileName field of each instance in featuredclothing.map.json.
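The fields above can be read with a few lines of Python. The sketch below assumes the top-level JSON layout (a list of instances, possibly under a "featuredclothing" key); if the actual artifact differs, adjust the unwrapping accordingly. The per-instance field names follow the list above.

```python
# Sketch of reading the featuredclothing.map.json artifact. The top-level JSON
# layout is an assumption (instances directly in a list, or under a
# "featuredclothing" key); the per-instance fields match the documented list.
import json

def load_featured_clothing(path):
    """Load instances and return them sorted by rank (id=1 first)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    instances = data if isinstance(data, list) else data.get("featuredclothing", [])
    return sorted(instances, key=lambda inst: inst["id"])

def summarize(instances):
    """One line per instance: rank, confidence, and the best-frame image file."""
    return [f'#{i["id"]} conf={i["confidence"]:.2f} frame={i["fileName"]}'
            for i in instances]
```

Each summarized line points at an image in the featuredclothing.frames.map folder, so you can pair scores with their best frames.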
In this example, the algorithm ranks the clothing in the video by the importance of the characters, as expected. For example, the clothing of celebrities like Kendrick Lamar and Rihanna is given high scores even though they don't appear for long, since their appearances are central to the video's content, while the clothing of the host is given a lower score even though she appears for much longer.
Kendrick Lamar. Rank 1, confidence 0.93
Rihanna. Rank 2, confidence 0.9
The host. Rank 18, confidence 0.44
Join us and share your feedback
For those of you who are new to our technology, we encourage you to get started today with these helpful resources: