Text Recognition for Video in Microsoft Video Indexer

Video Indexer can recognize text displayed in videos. A common misconception is that AI for video amounts to extracting frames from a video and running a computer vision algorithm on each one. Video processing is much more than applying an image processing algorithm to individual frames. At 30 frames per second, for example, a minute-long video contains 1,800 frames, which produces a lot of data but not many meaningful words. A separate blog post covers how AI for video differs from AI for images.
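To make the frame arithmetic concrete, here is a minimal Python sketch. It is not how Video Indexer works. The helper names `frame_count` and `distinct_words` and the sample caption are hypothetical, and the per-frame strings stand in for whatever an OCR engine might return for each frame.

```python
from collections import Counter


def frame_count(fps: float, seconds: float) -> int:
    """Number of frames in a clip of the given length (hypothetical helper)."""
    return round(fps * seconds)


def distinct_words(per_frame_text):
    """Count, for each distinct word, how many frames it appears in.

    per_frame_text is an iterable of strings, one per frame, standing in
    for per-frame OCR output (an assumption made for illustration).
    """
    counts = Counter()
    for text in per_frame_text:
        # Use a set so a word is counted at most once per frame.
        counts.update(set(text.lower().split()))
    return counts


# Hypothetical input: the same two-word caption is visible in every frame
# of a one-minute clip recorded at 30 frames per second.
frames = ["Breaking News"] * frame_count(30, 60)
counts = distinct_words(frames)

print(len(frames))  # number of per-frame results
print(len(counts))  # number of distinct words across all frames
```

In this toy case, 1,800 per-frame results collapse to just two distinct words, which is the redundancy that naive frame-by-frame processing produces.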

Humans have cognitive abilities that let them fill in hidden parts of text and resolve local defects caused by poor video quality. OCR applied directly to individual frames has no such abilities, so it is not sufficient for automatic text extraction from videos. In Video Indexer, we developed and implemented a dedicated approach to tackle this challenge.
