Speaker identification in transcripts?

Question

Once a stream is completed and ready for playback, how can we:
1) Download the transcript easily (without a select-all-copy hack or multiple clicks in Settings), and
2) Potentially determine who spoke when?

If (2) is not available from the audio, I could provide a quick prototype on GitHub if it would be useful to others, at least for playback of meetings with three or fewer people actively speaking. Here is one way:
https://stackoverflow.com/questions/20414667/cocktail-party-algorithm-svd-implementation-in-one-line-of-code

Here is what the experience could look like:
1. On the Stream page for the conversation, there is an obvious single-click button in the top right that says "Download transcript" [no extra clicks or hacks required]
2. The transcript is downloaded as a simple text file [not "vtt"; e.g. "txt" so that Windows opens it quickly with something like Notepad]
3. The downloaded text file has lines like this:
[time] <person> spoken text

So for example:
[12:20] Bob: good morning, everyone, welcome to the meeting.
[12:21] Alice: today we will speak about a new tool on Teams

The names above could easily be determined from who is in the meeting, which is metadata that the Teams app already has during a meeting.
Let's make it happen! I'm happy to help.
We don't need a breakthrough algorithm or much machine learning to do this with sufficient accuracy for it to be useful. It's mostly putting metadata together.
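
To make the proposed text format concrete, here is a rough Python sketch of the conversion step. It assumes the downloaded VTT file already carries speaker names as WebVTT voice tags (`<v Bob>...</v>`); I have not confirmed that Stream's files include them, and cues without a tag fall back to "Unknown". The function name `vtt_to_text` is just illustrative.

```python
import re

# Cue timing line, e.g. "00:12:20.000 --> 00:12:24.500" (hours are optional).
TIMING = re.compile(
    r"^\s*((?:\d+:)?\d{2}:\d{2})\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)
# WebVTT voice span start tag, e.g. "<v Bob>". Assumed to be present in the file.
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")
# Any remaining markup tag inside the cue text.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Turn WebVTT cues into lines of the form "[time] Speaker: text"."""
    out = []
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        # A cue may start with an optional identifier line, so find the timing line.
        idx = next((i for i, line in enumerate(lines) if TIMING.match(line)), None)
        if idx is None:
            continue  # not a cue (e.g. the "WEBVTT" header or a NOTE block)
        start = TIMING.match(lines[idx]).group(1)  # start time without milliseconds
        payload = " ".join(line.strip() for line in lines[idx + 1:])
        voice = VOICE.search(payload)
        speaker = voice.group(1).strip() if voice else "Unknown"
        text = TAG.sub("", payload).strip()
        if text:
            out.append(f"[{start}] {speaker}: {text}")
    return "\n".join(out)
```

Reading the downloaded file, passing its contents to `vtt_to_text`, and writing the result to a `.txt` file would give the one-line-per-utterance layout described above. If the file has no voice tags, the speaker would have to come from somewhere else, such as the meeting metadata.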

Here are some potential use-cases:
1. Determine how long each person spoke -- this could help derive which topics in the meeting may have been most important
2. Determine who asked the most questions and who answered the most questions -- this could help with follow-up discussions, e.g. if someone answered most questions about a given topic, they could be emailed for follow-up questions.
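
For use-case 1, here is a minimal sketch of how speaking time per person could be totalled from the cue timings. As before, it assumes each cue carries a WebVTT voice tag (`<v Name>`), which I have not verified for Stream's files; `speaking_time` and `to_seconds` are hypothetical helper names.

```python
import re
from collections import defaultdict

# One cue: timing line (optional settings after it), then one or more text lines.
CUE = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})[^\n]*\n"
    r"((?:[^\n]+(?:\n|$))+)"
)
# WebVTT voice span start tag, e.g. "<v Bob>". Assumed to be present in the file.
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")


def to_seconds(stamp: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def speaking_time(vtt: str) -> dict:
    """Sum cue durations per speaker, in seconds."""
    totals = defaultdict(float)
    for start, end, payload in CUE.findall(vtt.replace("\r\n", "\n")):
        voice = VOICE.search(payload)
        speaker = voice.group(1).strip() if voice else "Unknown"
        totals[speaker] += to_seconds(end) - to_seconds(start)
    return dict(totals)
```

Cue durations only approximate actual speaking time, since a cue can include pauses, but that is probably close enough for a rough ranking of who spoke most.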

jamie10555 · Answer

Quetzalcoatl, this is the kind of thing I am after. At present the standard download file has far too much information in it and needs to be trimmed. When using the web utility tool (https://web.microsoftstream.com/VTTCleaner/CleanVTT.html) you end up with data all clumped together, without timestamps or separation of speakers. The technology is there, but the formatting needs improving. I've resorted to using the web tool, pasting the result into Word, and then running a find-and-replace on '?', which I can then use to insert spaces. I keep a saved copy of the 'raw' transcript VTT file to reference for timings of speech as needed...

abhi007 · Answer

Hi, did you have any success with this?
