Starting 2021 With the Latest and Greatest Features of Azure Video Indexer
Published Mar 02 2021 11:04 AM

In Q1 of calendar year 2021, we released a significant set of features for Video Indexer, driven by our ongoing global expansion, customer growth, and additional scenarios based on customer demand and feedback. The second part of this post maps the feature list to use cases, with more details and examples for each feature.


Feature List


  • New region availability:
    • Create paid accounts on US government cloud in the Virginia and Arizona regions
    • Create a Video Indexer paid account in the Switzerland West, Switzerland North, and Central India regions
  • New and Improved Analytics capabilities - 

    • Audio event detection (e.g. explosions, gunshots, crowd reactions) as public preview. The full list of acoustic events is described in the details section and in the February release notes. Within the next two months it will also be available on paid accounts.

    • Observed People detection* - detect standing people spotted in the video and trace their path with bounding boxes as public preview. 

    • Video Indexer supports detection, grouping, and recognition of characters in animated content. An improved version is available on trial and paid accounts as a public preview. Read more here.

  • New low-cost basic audio SKU enabling a subset of the audio-related analytics of Video Indexer. The use case and the exact list of analytics are described in the “Increased accessibility to video and audio content” section. This is achieved through a new upload preset that enables a subset of models, available both in the Video Indexer website and in the upload API.

  • New source languages* for transcription and translation: Turkish, Swedish, Finnish, Danish, Norwegian, multiple Arabic Dialects, Thai and French-Canada.

  • Extended widget customization* capabilities for solution developers. Read more in the “Embed widgets into your own solution” section below. The new extension will be shipped by end of March, 2021.

  • Account management and supportability:
    • New API portal* with support channels
    • Ability to have multiple account owners for a single VI account.
  • Improved User Experience in Video Indexer Website
    • Enable dark theme for Video Indexer Website
    • Enhance the video player experience and the video player widget to support 2x playback speed for audio files
  • Ongoing accessibility improvements

*Features will be delivered by end of March, 2021. 


Details and Examples


Enabling new work safety and public safety use cases

For public safety use cases (among others), we enabled Video Indexer to run on the Azure Government cloud for the first time. Azure Government is a cloud platform built upon the foundational principles of security, privacy and control, compliance, and transparency. Public sector entities receive a physically isolated instance of Azure. You can create paid accounts on the US government cloud in the Virginia and Arizona regions. Read more here.

Site operators and video investigators spend hours and even days manually exploring videos to analyze a road or workplace accident or a bank robbery, or to search for court evidence after an event. With the new preview features, such as observed people tracing and audio event detection, you can quickly analyze your video with no human intervention. Use cases such as efficient accident analysis can be achieved by automatically detecting observed people and acoustic events within your video to understand what happened. For example, a site operator who would like to analyze an explosion that happened in a factory can now take the footage from the CCTV cameras, automatically get the point in the timeline when the explosion occurred, and then go backwards to track the employees’ activities around and before the explosion. That could be extremely helpful for learning from incidents, preventing them, and improving processes within the industry or factory.
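The explosion scenario above boils down to a simple timeline query: find when the event starts, then look up who was on screen in the window leading up to it. Here is a minimal Python sketch, assuming the detected events and observed people have already been extracted into simple (label, start, end) records. The `Interval` type, the labels, and both helper functions are hypothetical illustrations, not part of the Video Indexer API or its output schema.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Hypothetical simplified record for one detected event or person."""
    label: str    # e.g. "Explosion" or "Person 2"
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def first_event_start(events, event_name):
    """Return the start time of the earliest event with this name, or None."""
    starts = [e.start for e in events if e.label == event_name]
    return min(starts) if starts else None


def people_seen_before(people, event_time, window_seconds):
    """Return sorted labels of people visible in the window before event_time."""
    window_start = max(0.0, event_time - window_seconds)
    seen = {
        p.label
        for p in people
        # Keep a person whose appearance overlaps [window_start, event_time].
        if p.start <= event_time and p.end >= window_start
    }
    return sorted(seen)


# Made-up sample data for illustration only.
events = [Interval("Siren", 130.0, 150.0), Interval("Explosion", 95.0, 97.0)]
people = [
    Interval("Person 1", 10.0, 40.0),
    Interval("Person 2", 70.0, 96.0),
    Interval("Person 3", 100.0, 120.0),
]

explosion_time = first_event_start(events, "Explosion")
nearby = people_seen_before(people, explosion_time, window_seconds=30.0)
```

With the sample data, only the person whose appearance overlaps the 30 seconds before the explosion is returned. In a real integration the records would come from the insights produced for the video.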

The observed people tracing model extracts the observed people and displays them with bounding boxes on the video while it plays. The user can select a specific thumbnail (right side) to see the corresponding bounding boxes in the video. We plan to enhance it in the future with the person's path. The first public preview is planned to ship by end of March, 2021.



Screenshot from Video Indexer website: observed people insights with bounding boxes marked as a layer in the player. The path marked for a selected person is a future enhancement.


Audio event detection detects and classifies audio effects in the non-speech segments of the content. The full list of acoustic events includes Gunshot, Glass shatter, Alarm, Siren, Explosion, Dog Bark, Screaming, Laughter, Crowd reactions (cheering, clapping and booing) and Silence. This model is also relevant for accessibility use cases and is described in the accessibility section as well.





Screenshot from Video Indexer website: acoustic events introduced as a new insight, represented by event name and where it occurs in the timeline.


Deep search experience across the video library and within videos

One of the leading use cases for VI is enabling discoverability across media archives and skipping quickly to specific locations within a video based on multiple business insights. With this release, media companies will mainly benefit from the improved version of the animated character recognition models.

Those who use the VI website experience will now have it available in dark mode, which is aligned with other media and video tools in the media market. To enable the dark theme, open your user settings menu and toggle on this feature.


Screenshot from Video Indexer website: enable Dark mode from the user settings menu


Increased accessibility to video and audio content

Another key use case for Video Indexer is enabling accessibility for people with disabilities and across languages through transcription and translation, as well as compliance with accessibility regulations. You can now trigger basic audio insights by selecting the “basic audio only” preset when uploading a video, both in the upload screen and in the upload API. Choosing this option triggers a selected subset of audio-only capabilities, which also reduces your costs. This mode performs speech-to-text transcription and generation of a VTT subtitle/caption file, as well as translation. The output of this mode includes an Insights JSON file containing only the keywords, transcription, and timing information. Automatic language detection, content moderation and speaker diarization are not included in this mode. Note that basic audio analysis is a low-cost offering compared to standard audio analysis. More information is available on our pricing page.
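To illustrate selecting the preset through the upload API, here is a minimal Python sketch that only builds the request URL for an upload call and sends nothing. The base URL, the path layout, and the query parameter names (`accessToken`, `name`, `videoUrl`, `language`, `indexingPreset`), as well as the `BasicAudio` preset value, are assumptions based on the public Video Indexer API and should be verified in the API portal. The `build_upload_url` helper itself is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Video Indexer REST API; verify in the API portal.
API_BASE = "https://api.videoindexer.ai"


def build_upload_url(location, account_id, access_token, video_name,
                     video_url, language="en-US", preset="BasicAudio"):
    """Return the URL for a POST request that uploads and indexes a video.

    All query parameter names and the preset value are assumptions to be
    checked against the upload API reference before use.
    """
    query = urlencode({
        "accessToken": access_token,
        "name": video_name,
        "videoUrl": video_url,      # publicly reachable media file
        "language": language,       # source language of the audio
        "indexingPreset": preset,   # assumed name of the basic audio preset
    })
    return f"{API_BASE}/{location}/Accounts/{account_id}/Videos?{query}"


if __name__ == "__main__":
    # Placeholder values; substitute your own account details.
    print(build_upload_url("trial", "<account-id>", "<access-token>",
                           "town-hall", "https://example.com/town-hall.mp4"))
```

The resulting URL would be sent as an HTTP POST with the HTTP client of your choice. Keeping URL construction separate from the network call makes the preset choice easy to review and test.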


For both deep search and accessibility use cases, within the next couple of days we plan to enable additional languages based on customer demand. When uploading a video or audio file, the list of source languages in both the API and the website will also include Turkish, Swedish, Finnish, Danish, Norwegian, multiple Arabic dialects, Thai and French-Canada. These languages are available for transcription and translation.


Embed widgets into your own solution

If you are a solution developer, you can embed three types of widgets into your apps: Cognitive Insights, Player, and Editor. More information is available here. We enhanced the insights widget customization capabilities so that now you can also -

  • Enable custom styling to match your application's look and feel
  • Set a custom configuration to bring your custom AI (that isn’t generated by Video Indexer) side-by-side with Video Indexer AI, enriching the insights widget with more insights
  • Enrich your insights by integrating data from other sources. For example, you can build a custom model for person detection and tracing, pull data on each person's org structure from Active Directory, and present it in the insights widget
  • Load a JSON file from an external location

We invite you to explore our code sample in the Video Indexer GitHub repository.
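For orientation, embedding a widget in a host page typically comes down to pointing an iframe at an embed URL. The Python sketch below generates such a snippet on the server side. It assumes an embed URL layout of `/embed/<widget>/<accountId>/<videoId>/` under `www.videoindexer.ai` and a `widgets` query option, both of which should be checked against the widget documentation. The helper names are hypothetical, and the sketch does not cover the new customization options listed above.

```python
from html import escape
from urllib.parse import urlencode

# Assumed base URL for embeddable widgets; verify in the widget documentation.
EMBED_BASE = "https://www.videoindexer.ai/embed"


def embed_url(widget, account_id, video_id, **options):
    """Build an embed URL for a widget type such as "insights" or "player"."""
    url = f"{EMBED_BASE}/{widget}/{account_id}/{video_id}/"
    if options:
        # Keep commas literal so list-style options stay readable.
        url += "?" + urlencode(options, safe=",")
    return url


def iframe_tag(src, width=580, height=780):
    """Wrap an embed URL in an iframe element for the host page."""
    return (f'<iframe src="{escape(src, quote=True)}" width="{width}" '
            f'height="{height}" frameborder="0" allowfullscreen></iframe>')


if __name__ == "__main__":
    # Placeholder identifiers; the "widgets" option name is an assumption.
    src = embed_url("insights", "<account-id>", "<video-id>",
                    widgets="people,keywords")
    print(iframe_tag(src))
```

Private videos would additionally need an access token passed to the widget, which is left out here.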


Example of the insights widget with customized styling. The web component shows Video Indexer insights side-by-side with “My custom topics”. Developers can now supplement the output from VI's widgets with additional output from other sources.


Looking to get your Feedback Today! 

In closing, we’d like to invite you to provide feedback on all recent enhancements, especially those released as public preview. We collect your feedback and adjust the design where needed before releasing them as generally available capabilities. For those of you who are new to our technology, we’d encourage you to get started today with these helpful resources:



Thanks for reading ;)


Version history
Last update: Mar 21 2021 08:00 AM