Speaker identification in transcripts?

Microsoft

Once a stream is completed and ready for playback, how can we:

1) Download the transcript easily (without a select-all-copy hack or multiple clicks in Settings) and

2) Potentially determine who spoke when?

 

If (2) is not available based on the audio, I could provide a quick prototype on GitHub if it would be useful to others, at least for playback of meetings with at most 3 people actively speaking. Here is one way:

https://stackoverflow.com/questions/20414667/cocktail-party-algorithm-svd-implementation-in-one-line-of-code

 

Here is what the experience could look like:

1. On the Stream page for the recording, there is an obvious single-click button in the top-right corner labeled "Download transcript" [no extra clicks or hacks required]

2. The transcript is downloaded as a simple text file [not "vtt"; e.g. "txt", so that Windows opens it right away with something like Notepad]

3. The downloaded text file has lines like this:

[time] <person>: spoken text

 

So for example:

[12:20] Bob: good morning, everyone welcome to the meeting.
[12:21] Alice: today we will speak about a new tool on Teams
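To make the target format concrete, here is a minimal sketch of converting a "vtt" transcript into lines like the ones above. It assumes the transcript already marks speakers with WebVTT voice spans (`<v Name>...</v>`) and that `[12:20]` means minutes and seconds from the start of the recording; I have not checked whether Stream's files carry those tags. The function name `vtt_to_text` is made up for illustration.

```python
import re

# Matches a WebVTT cue timing line such as "00:12:20.000 --> 00:12:21.000".
# The hours part is optional in WebVTT.
TIMING = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.\d{3}\s+-->")
# Matches a WebVTT voice span such as "<v Bob>good morning</v>".
VOICE = re.compile(r"<v\s+([^>]+)>(.*?)(?:</v>|$)")


def vtt_to_text(vtt):
    """Turn WebVTT cues into '[mm:ss] Speaker: text' lines.

    Assumption: each cue has a single-line payload wrapped in a voice span.
    Cues without a voice span are skipped.
    """
    lines = []
    start = None
    for raw in vtt.splitlines():
        raw = raw.strip()
        timing = TIMING.match(raw)
        if timing:
            hours, minutes, seconds = timing.groups()
            total_minutes = int(hours or 0) * 60 + int(minutes)
            start = f"{total_minutes:02d}:{seconds}"
            continue
        voice = VOICE.match(raw)
        if voice and start is not None:
            speaker, text = voice.groups()
            lines.append(f"[{start}] {speaker.strip()}: {text.strip()}")
    return lines
```

Writing the returned lines to a ".txt" file with `"\n".join(...)` would give the plain-text download described in step 2.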

 

The names above could easily be determined from who is in the meeting, which is metadata that the Teams app already has during a meeting.

Let's make it happen! I'm happy to help. 

We don't need a breakthrough algorithm or much machine learning to do this with enough accuracy to be useful. It's mostly putting metadata together.
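As a rough illustration of "putting metadata together", here is a sketch that labels each transcript cue with the participant whose activity overlaps it most. It assumes the meeting metadata can be exported as (name, start, end) intervals of who was actively speaking; that export format, and the names `overlap` and `label_cues`, are hypothetical and not a known Teams API.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_cues(cues, activity):
    """Attach to each cue the participant whose activity overlaps it most.

    cues:     list of (start, end, text) tuples, times in seconds
    activity: list of (name, start, end) tuples (hypothetical meeting metadata)
    Returns a list of (start, speaker, text) tuples.
    """
    labeled = []
    for start, end, text in cues:
        totals = {}
        for name, a_start, a_end in activity:
            shared = overlap(start, end, a_start, a_end)
            if shared > 0:
                totals[name] = totals.get(name, 0.0) + shared
        # Fall back to "Unknown" when nobody was marked active during the cue.
        speaker = max(totals, key=totals.get) if totals else "Unknown"
        labeled.append((start, speaker, text))
    return labeled
```

When two people talk over each other, this simply picks whoever was active longer during the cue, which should be good enough for small meetings.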

 

Here are some potential use-cases:

1. Determine how long each person spoke -- this could help infer which topics in the meeting may have been most important.

2. Determine who asked the most questions and who answered the most questions -- this could help with follow-up discussions, e.g. if someone answered most questions about a given topic, they could be emailed for follow-up questions.
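Both use cases can be sketched directly on top of the text format proposed above. The code below assumes lines shaped like `[mm:ss] Name: text`, estimates speaking time as the gap until the next line (the last line's length is unknown, so it counts as 0), and uses two crude heuristics: a line ending in "?" is a question, and the next line by a different speaker is its answer. The function name `summarize` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches "[12:20] Bob: good morning" style lines.
LINE = re.compile(r"\[(\d+):(\d{2})\]\s+([^:]+):\s*(.*)")


def summarize(lines):
    """Return {speaker: {"seconds", "questions", "answers"}} from transcript lines."""
    parsed = []
    for line in lines:
        m = LINE.match(line)
        if m:
            minutes, seconds, speaker, text = m.groups()
            parsed.append((int(minutes) * 60 + int(seconds), speaker.strip(), text.strip()))

    stats = defaultdict(lambda: {"seconds": 0, "questions": 0, "answers": 0})
    for i, (start, speaker, text) in enumerate(parsed):
        entry = stats[speaker]
        # Estimate duration as the gap to the next line; the last line's end is unknown.
        if i + 1 < len(parsed):
            entry["seconds"] += max(0, parsed[i + 1][0] - start)
        if text.endswith("?"):
            entry["questions"] += 1
        # Crude heuristic: a reply right after someone else's question counts as an answer.
        if i > 0 and parsed[i - 1][2].endswith("?") and parsed[i - 1][1] != speaker:
            entry["answers"] += 1
    return dict(stats)
```

Sorting the result by "seconds" or "answers" would give a quick list of who to contact for follow-up questions.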
