Group speech recognition


I'm wondering whether there has been much headway in speech-to-text for settings like a room with multiple speakers. The only approaches I know of are individual mics for each person to clearly separate the voices, or directional mic integration with some processing prior to speech recognition. Users like speech-to-text transcripts with attribution, like in Teams, but of course the result is terrible if one side of a conference has multiple people contributing. It usually ends up being a jumble of words (but at least entertaining to read!). Do you know of any solutions or work in that area that might be on the roadmap?

Is it also correct that the Speech containers will run on Ubuntu Linux on ARM? It seems that way from the descriptions.

Thanks,

Bryan

2 Replies
We have experimented with this, and have even released a preview of Conversation Transcription in the Speech service. Take a look and give us feedback: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/conversation-transcription