Forum Discussion
Transcription capability isn't very good
Adrian Hyde, I am sorry to hear that you are unsatisfied with the transcription quality :(
You are correct, the Skype Broadcast solution is utilizing the same core technology -- would you be able to share examples of the discrepancy in output?
We are aware of the issues that we have with background noise/music, and unfortunately there is nothing we can do in the short-term to fix this.
Hey Adarsh Solanki - For Skype Broadcast versus Stream, I can only provide anecdotal evidence that one works better than the other. We have not run the same media through both and compared the results.
The one area where we do see a significant difference, however, is in transcription between Stream and a third party we have typically used in the past for this function (3PlayMedia). We have run several videos through both and find 3PlayMedia much more accurate.
I'm open to suggestions on how we could improve this. Should I take this up with the Azure Media folks? Or are there some settings within Stream we can tweak to see if performance can be improved?
- Adarsh Solanki (Microsoft), Jul 27, 2017
Fortunately, I am the Azure Media Services contact for Speech-to-text :)
I would hesitate to make any assumptions about quality without testing on identical content, as there are many subtle variables that can lead to low-quality transcription. Stream should have transcript quality that is on par with other Microsoft services utilizing speech-to-text. Note that Stream shows the unedited automatic transcript. We are currently building a feature that will allow a user to edit the automatic transcript to fix any errors.
Re: 3PlayMedia, this service utilizes human editing in addition to automatic transcript generation. A fairer comparison would be to take the output of our automatic transcript generation and send it to a human transcript editor to correct the transcript prior to publishing.
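For anyone who wants to run that kind of like-for-like test, one common way to put a number on it is word error rate (WER): the word-level edit distance between a trusted reference transcript and the automatic output, divided by the number of words in the reference. The snippet below is only a minimal sketch under stated assumptions. The function name `word_error_rate` and the sample sentences are hypothetical, it is not part of Stream or Azure Media Services, and it normalizes only by lowercasing and splitting on whitespace (punctuation is left as-is).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; normalization is deliberately
    minimal (lowercase + whitespace split).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample text: the same reference scored against two outputs.
    reference = "the quick brown fox jumps"
    output_a = "the quick brown box jumps"  # one substituted word
    output_b = "the quick fox"              # two dropped words
    print("A:", word_error_rate(reference, output_a))
    print("B:", word_error_rate(reference, output_b))
```

Scoring both services against the same human-verified reference for the same media keeps the comparison on identical content; lower is better, and the value can exceed 1.0 if the output contains many extra words.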
- Deleted, Feb 28, 2018
Hi, are there any plans to improve the auto-generated transcription accuracy? We have tried it a couple of times and it isn't really anywhere close to what is actually being said (even without any background music or noise). This means we would have to spend a lot of time editing the transcription for it to be of any benefit, so we just don't use it.
- Adarsh Solanki (Microsoft), Feb 28, 2018
Hi Trevor -
We are absolutely planning to improve the auto-generated captions as much as we can! Sorry to hear that you are having to go through the trouble of editing your captions to provide value. We hope to decrease our error rate considerably in the coming year as we upgrade the underlying speech infrastructure.