New name and a wealth of new capabilities in Video Indexer (now AVA for Media)
Published May 25, 2021

At Microsoft Build 2021, the Azure Video Indexer service is becoming part of the new set of Azure Applied AI Services, which aim to enable developers to accelerate time to value for AI workloads rather than building solutions from scratch. Azure Video Analyzer and Azure Video Analyzer for Media are designed to do exactly that for video AI workloads: to enable developers to build video AI solutions easily, without deep knowledge of media or AI/machine learning, from edge to cloud and from live to batch.

As part of this change, Video Indexer is being renamed to Azure Video Analyzer for Media (a.k.a. AVAM). Under the new name, we continue to work hard to bring you the insights and capabilities you need to get more out of your cloud media archives: improving searchability, enabling new user scenarios and accessibility, and opening new monetization opportunities.

 

So, what else is new in AVAM (other than the name :) )?

 

- New insight types added to better support analysis, discoverability, and accessibility needs

  • Audio effects detection, with closed caption file enrichment (public preview in trial and paid accounts): ability to detect non-speech audio effects, such as gunshots, explosions, dogs barking, and crowd reactions.
  • Observed people tracing (public preview in trial and paid accounts): ability to detect people observed in the video and trace their paths with bounding boxes.

- Improvements to existing insights

  • Improvements to named entities (locations, people, and brands).
  • Improvements to face recognition pipeline.

- Extending global support

  • Expanded regional availability.
  • Multiple new languages supported for transcription.

- Learn from others

  • We are proud to share how our partners WPP and MediaValet use the service to provide better media experiences to their customers.

- New in our developers’ community

  • New developer portal enabling anyone to get started with the API easily and get answers fast.
  • Open-source code to help you leverage the newly added widget customization.
  • Open-source solution to help you add custom search to your video libraries.
  • Open-source solution to help you add de-duplication to your video libraries.

- New “Azure blue” visual theme

 

More about all those great additions and announcements in this blog!

 

Discoverability, accessibility, and event analysis support through new insights 

 

In our journey to enable wider and richer analysis of your video archives, we are happy to introduce two new insight types into the AVAM pipelines that can be leveraged in multiple scenarios: Observed people tracing and Audio effects detection. Both new insights are now available in public preview on trial and paid accounts.

 

Observed People Tracing detects people that appear in the video, including the times at which they appear and the location of each person in the different video frames (the person’s bounding box). The bounding boxes of the detected people are even displayed in the video while it plays, making it easy to trace them. Observed-people information enables video investigators to easily perform post-event analysis of events such as a bank robbery or a workplace accident, as well as trend analysis over time, for example, learning how customers move across aisles in a shopping mall or how much time they spend in checkout lines.
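If you want to work with these results programmatically, below is a minimal sketch (in Python) of reading the observed-people occurrences out of the insights JSON returned by the Get Video Index API call. The exact field names (observedPeople, instances, and their start/end properties) are assumptions based on how other AVAM insight types are structured, so check the JSON returned for your own videos.

```python
import json

# Minimal sketch: read observed-people occurrences from a saved insights JSON
# (the response of the Get Video Index API call). Field names below are
# assumptions modeled on other AVAM insight types - verify against your output.

with open("video_index.json", encoding="utf-8") as f:
    index = json.load(f)

for video in index.get("videos", []):
    for person in video.get("insights", {}).get("observedPeople", []):
        for instance in person.get("instances", []):
            # Each instance is assumed to carry the time range in which the
            # person was observed; bounding boxes are reported per frame.
            print(
                f"Person #{person.get('id')}: "
                f"{instance.get('adjustedStart')} - {instance.get('adjustedEnd')}"
            )
```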

 

People observed in the player page

 

 

Audio effects detection detects and classifies the audio effects in the non-speech segments of the content. Audio effects can be used for discoverability scenarios, for example, finding the videos in an archive, and the specific times within them, in which a gunshot was detected. They can also be used for accessibility scenarios, enriching the video transcription with non-speech effects to provide more context for people who are hard of hearing, making content much more accessible to them. This is relevant both for organizational scenarios (e.g., watching a training or keynote session) and for the media and entertainment industry (e.g., watching a movie). The set of audio effects extracted is: Gunshot, Glass shatter, Alarm, Siren, Explosion, Dog bark, Screaming, Laughter, Crowd reactions (cheering, clapping, and booing), and Silence. The detected audio effects are returned as part of the insights JSON and, optionally, in the closed caption files extracted from the video.
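As an illustration of the closed caption enrichment, here is a minimal sketch of requesting captions through the REST API with the audio effects included. The includeAudioEffects query parameter name is an assumption; verify the exact parameter on the Get Video Captions call in the API portal before relying on it.

```python
import requests

# Minimal sketch: pull closed captions enriched with audio effects via the
# AVAM REST API. The includeAudioEffects flag is an assumed parameter name;
# confirm it on the Get Video Captions call in the API portal.

LOCATION = "trial"               # or your Azure region
ACCOUNT_ID = "<account-id>"
VIDEO_ID = "<video-id>"
ACCESS_TOKEN = "<access-token>"  # from the Get Account Access Token call

url = (
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
    f"/Videos/{VIDEO_ID}/Captions"
)
params = {
    "format": "Vtt",                 # captions format
    "includeAudioEffects": "true",   # assumed flag for audio-effects enrichment
    "accessToken": ACCESS_TOKEN,
}

response = requests.get(url, params=params)
response.raise_for_status()
print(response.text)  # WebVTT captions with non-speech effects interleaved
```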

 

Audio effects found in the player page

 

The two newly added insights are currently available when indexing a video with the “advanced” preset selected, for audio and video analysis respectively. During the preview period there is no additional fee for choosing the advanced preset over the standard one, so it’s a great opportunity to go ahead and try it on your content!
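To try the advanced preset from code rather than the portal, a call along the lines of the sketch below should work. It uses the public Upload Video API call; treat the exact indexingPreset value ("Advanced") as an assumption and confirm the supported values in the API reference.

```python
import requests

# Minimal sketch: index a video with the advanced preset via the Upload Video
# API call. The "Advanced" preset value is an assumption - check the API
# reference for the full list of indexingPreset values.

LOCATION = "trial"
ACCOUNT_ID = "<account-id>"
ACCESS_TOKEN = "<access-token>"

upload_url = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos"
params = {
    "name": "store-entrance-cam",
    "videoUrl": "https://example.com/videos/entrance.mp4",  # publicly reachable URL
    "indexingPreset": "Advanced",  # assumed value enabling the advanced audio + video models
    "accessToken": ACCESS_TOKEN,
}

response = requests.post(upload_url, params=params)
response.raise_for_status()
print(response.json()["id"])  # video ID to poll later with Get Video Index
```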

 

We keep improving, constantly.

At its core, AVAM provides an out-of-the-box pipeline of a rich set of insights that are already fully integrated with one another. In addition to enriching this pipeline with new insights, we keep looking at the ones that are already there and at how to improve and refine them, to make sure you get the most insight out of your media content.

Just recently we released a major improvement to the named entities insights of AVAM. Named entities are locations, people, and brands identified, using natural language processing algorithms, in both the transcription and the on-screen text extracted by AVAM. The latest improvement to this insight includes identification of a much larger range of people and locations, as well as identification of people and locations in context, even when they are not well known. For example, from the transcript text “Dan went home”, ‘Dan’ is extracted as a person and ‘home’ as a location.

Panel of named entities extracted

We also just released several improvements to the AVAM face detection and recognition pipeline, resulting in better face recognition accuracy, especially when thumbnail quality is poor.

 

Expanding global support

To enable organizations across the globe to leverage AVAM for their business needs, we are constantly working on expanding the service’s regional availability as well as the set of languages supported for transcription.

 

The latest regions we have deployed to, now available for customers creating an AVAM paid account, include North Central US, West US, and Canada Central. Additionally, in the next two months we are planning to deploy to Central US, France Central, Brazil South, West Central US, and Korea Central.

 

AVAM’s set of supported languages has also expanded and now includes Norwegian, Swedish, Finnish, Canadian French, Thai, multiple Arabic dialects, Turkish, Dutch, Chinese (Cantonese), Czech, and Polish. As with everything else in AVAM, the new languages are available to customers through both the API and the portal.

 

Learn from others

We are proud to share recent partnership announcements, where Azure Video Analyzer for Media empowers companies to get more out of media content and provide new and exciting capabilities.

 

WPP recently announced a partnership with Microsoft that leverages Azure Video Analyzer for Media to index metadata extracted from their content in a central location accessible from anywhere. The partnership aims to create an innovative cloud platform that allows creative teams from across WPP’s global network to produce campaigns for clients from any location around the world.

 

Additionally, MediaValet, a leading digital asset management company, uses Azure Video Analyzer for Media in its Audio/Video Intelligence (AVI) tool to help its customers significantly improve asset discoverability and gain greater insight into their audio and video assets through automated metadata tagging. “With Azure Video Analyzer for Media, we deliver more ways for our customers to analyze their assets,” says Lozano. “They can isolate standard and cognitive metadata, find assets quickly—even within a library of 6 million assets, for example—and then home in on specific insights within those assets.”

 

New in the AVAM developer community

To use AVAM at scale, automate processes, and integrate it with organizational applications and infrastructure, organizations use AVAM’s REST API. To help you get started with the API easily and find answers fast, we recently revamped the AVAM API developer portal into one central location with intuitive access to all our development resources: descriptions of the API calls with the ability to try them out, plus access to Stack Overflow, GitHub, the support request forum, and more. You can read all about getting started with our API here.
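As a quick taste of the getting-started flow documented in the portal, the sketch below exchanges an API subscription key for an account access token and then lists the videos in the account. The endpoint paths follow the public API reference; swap in your own location, account ID, and key.

```python
import requests

# Minimal sketch of the classic getting-started flow: exchange an API
# subscription key for an account access token, then call List Videos.

LOCATION = "trial"
ACCOUNT_ID = "<account-id>"
SUBSCRIPTION_KEY = "<api-subscription-key>"  # from your developer portal profile

# 1. Get an account-level access token (time-limited).
token_resp = requests.get(
    f"https://api.videoindexer.ai/Auth/{LOCATION}/Accounts/{ACCOUNT_ID}/AccessToken",
    params={"allowEdit": "false"},
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
)
token_resp.raise_for_status()
access_token = token_resp.json()

# 2. List the videos in the account.
videos_resp = requests.get(
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos",
    params={"accessToken": access_token},
)
videos_resp.raise_for_status()
for video in videos_resp.json().get("results", []):
    print(video["id"], video["name"])
```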

 

New AVAM developer portal home page

 

 

 

Speaking of development resources, AVAM has its own GitHub repository where you can find code samples and solutions built on top of AVAM, to help you integrate with the service or simply to inspire you about the different solutions that can be built with it. Here are two of the latest additions to our GitHub:

 

Firstly, we added example code for using the new widget customization capabilities of AVAM. The newly added widget customization capability enables developers to embed the AVAM widgets into their own applications and customize them in advanced ways, including loading the insights JSON from an external location, styling the widgets to fit your application, and adding your own custom insights calculated elsewhere.

 

Secondly, the Commercial Software Engineering (CSE) team created two end-to-end solutions demonstrating how to leverage AVAM for different scenarios:

One solution demonstrates how to use AVAM’s stable frames together with Azure Machine Learning to build a custom video search solution that complements AVAM’s out-of-the-box insights with additional information, tailored to a specific organization's custom data and search terms, using Azure Cognitive Search.

 

Enriching AVAM results with “dog types” model

An additional solution demonstrates how to perform de-duplication of media files. It includes a workflow, built with Logic Apps and Durable Functions, that takes a video (among other content types) as input and sends it to AVAM. The workflow de-duplicates files by comparing file hashes and re-using the output of previously analyzed files for any duplicates. The output of the analysis is put onto a Service Bus queue for downstream services to consume, saving analysis costs.
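The core idea of the de-duplication step can be illustrated with a short, simplified sketch: hash each incoming file and only send unseen content to AVAM, reusing the stored result for duplicates. The submit_to_avam helper below is a hypothetical placeholder for the actual Upload Video call; the real solution orchestrates this with Logic Apps and Durable Functions.

```python
import hashlib
from pathlib import Path

# Simplified sketch of hash-based de-duplication: compute a content hash per
# incoming file and only submit files with unseen hashes to AVAM.

analyzed = {}  # content hash -> video ID (or location of the cached insights)

def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def analyze(path: Path) -> str:
    content_hash = file_hash(path)
    if content_hash in analyzed:
        # Duplicate: skip re-indexing and reuse the earlier result.
        return analyzed[content_hash]
    video_id = submit_to_avam(path)  # hypothetical helper wrapping the Upload Video call
    analyzed[content_hash] = video_id
    return video_id

def submit_to_avam(path: Path) -> str:
    # Placeholder for the actual AVAM upload/index call.
    raise NotImplementedError
```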

 

And one more fun addition to close with

Lastly, to celebrate the new name, we are expanding the set of visual themes available in the AVAM experience with a new “Azure blue” theme! You can now choose to stay with the “Classic” Video Indexer green, pivot to the "Dark" theme, or go with “Azure blue”, which feels right at home as part of the Azure family of services.

 

New Theme selector setting

 

 

We’re looking forward to your feedback!

In closing, we’d like to invite you to provide feedback on all the recent enhancements, especially those released as public preview. We collect your feedback and adjust the design where needed before releasing these capabilities as generally available. For those of you who are new to our technology, we encourage you to get started today with these helpful resources:
