Add a voice ID to the .vtt automatic captions and transcript file

 Sep 29 2020

With the current work-from-home situation due to COVID-19, most people in my meetings join using their own microphone. Even when people share a microphone, it should be possible to tell which sentences belong to which person from the characteristics and pitch of their voices.
I propose adding a voice ID for each distinct voice that is transcribed, so that every chunk in the .vtt file is tagged with the speaker it belongs to.

In my use case, I was trying to use the VTT file to summarize a feedback session, including who said what, but this was not efficient. I think a voice ID could make it possible to export a more readable version (Person A said this, Person B said that, and so on) instead of the raw VTT format. I could then edit that export into a summary of the discussion, but for this I need to know who said each chunk. As far as I know, the .vtt export does not currently support this.
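
For illustration only: the WebVTT format already defines a voice span (for example "<v Person A>") for marking who speaks a cue, so a voice ID could in principle be carried there. Below is a minimal Python sketch, assuming the cues were tagged that way (the Stream export does not do this today, as far as I know). The sample captions and the function name vtt_to_transcript are hypothetical. It turns voice-tagged cues into "Person A: ..." lines and merges consecutive cues from the same speaker, which is the kind of readable export described above.

```python
import re

# Hypothetical sample: captions where each cue carries a WebVTT voice span.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v Person A>I think the demo went well.

00:00:04.500 --> 00:00:07.000
<v Person B>Agreed, but the slides ran long.

00:00:07.500 --> 00:00:09.000
<v Person B>We should trim them next time.
"""

# Matches a voice span such as "<v Person A>text" or "<v.loud Person A>text</v>".
VOICE_RE = re.compile(r"^<v(?:\.[\w-]+)*\s+([^>]+)>(.*?)(?:</v>)?$", re.DOTALL)


def vtt_to_transcript(vtt_text: str) -> list[str]:
    """Turn voice-tagged cues into 'Speaker: text' lines (hypothetical helper).

    Consecutive cues from the same speaker are merged into one line.
    Cues without a voice span are attributed to 'Unknown'.
    """
    transcript = []  # list of [speaker, text] pairs
    for block in vtt_text.strip().split("\n\n"):
        if "-->" not in block:
            continue  # the WEBVTT header and NOTE/STYLE blocks have no timing line
        lines = block.strip().splitlines()
        # The cue text follows the timing line (an optional cue id may precede it).
        timing_idx = next(i for i, line in enumerate(lines) if "-->" in line)
        text = " ".join(lines[timing_idx + 1:]).strip()
        match = VOICE_RE.match(text)
        if match:
            speaker, words = match.group(1).strip(), match.group(2).strip()
        else:
            speaker, words = "Unknown", text
        if transcript and transcript[-1][0] == speaker:
            transcript[-1][1] += " " + words  # same speaker continues talking
        else:
            transcript.append([speaker, words])
    return [f"{speaker}: {words}" for speaker, words in transcript]


for line in vtt_to_transcript(SAMPLE_VTT):
    print(line)
```

Merging consecutive cues per speaker is a design choice: captions are split into short timed chunks, but a summary reads better when each person's turn is one line.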

I also think a voice ID would benefit deaf viewers in cases where participants have their cameras off and you can't see whose lips are moving. The captions could then more easily show "Person A" next to the transcribed text.

I should add that I was using the recording function in Teams. So the voice ID data may need to be transferred from the Teams recording and later used when Stream auto-generates the captions. Perhaps the Stream team and the Teams team at Microsoft can work out that connection between them.

