Forum Discussion
Speech Recognition for Alphanumeric
Yeah, this is a common issue with Azure Speech-to-Text, especially when users speak individual letters over the phone. Letters like “P”, “B”, and “D” often get confused because they sound similar, and it’s worse on mobile networks due to audio compression and background noise.
One way to improve accuracy is by using a Custom Speech model in Azure Speech Studio. You can train it with your specific product ID patterns or common phrases your users say, so it learns what to expect.
Also, asking users to say letters using the phonetic alphabet, like “P as in Papa,” really helps. On your end, you can map those phonetic words back to actual characters.
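Here's a rough sketch of that mapping step in Python, in case it helps. It assumes you already have the transcript as a plain string; the names `WORDS`, `FILLERS`, and `decode_spelled` are just made up for this example and are not part of the Azure SDK. When a caller says "D as in Papa", it trusts the phonetic word over the bare letter, since the bare letter is the part most likely to be misheard.

```python
import re

# NATO words plus spoken digits, mapped to the character they stand for.
# Hypothetical table: extend it with whatever words your callers really use.
WORDS = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "yankee": "Y", "zulu": "Z",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}

# Connecting words in phrases like "P as in Papa" or "P for Papa".
FILLERS = {"as", "in", "for", "like"}


def decode_spelled(transcript: str) -> str:
    """Turn a spelled-out transcript into the characters it represents."""
    out = []
    last_was_bare = False  # previous character came from a bare letter
    saw_filler = False     # a filler word appeared since the previous token
    for tok in re.findall(r"[a-z0-9]+", transcript.lower()):
        if tok in FILLERS:
            saw_filler = True
            continue
        if tok in WORDS:
            if last_was_bare and saw_filler:
                # "D as in Papa": the phonetic word overrides the bare letter.
                out[-1] = WORDS[tok]
            else:
                out.append(WORDS[tok])
            last_was_bare = False
        elif tok.isdigit() or len(tok) == 1:
            out.append(tok.upper())
            last_was_bare = len(tok) == 1 and tok.isalpha()
        else:
            # Unknown word: ignore it and keep whatever was decoded so far.
            last_was_bare = False
        saw_filler = False
    return "".join(out)


print(decode_spelled("P as in Papa, A as in Alpha, one two"))  # PA12
print(decode_spelled("D as in Papa"))                          # P
```

Unknown words are simply skipped, so "P as in Peter" still comes out as "P". You'd probably want to log those so you can see which non-NATO words your callers actually use.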
If your product IDs follow a fixed format, it’s also a good idea to apply some post-processing or regex to clean up common recognition errors. For example, if the recognized text isn't a valid ID as-is, try swapping commonly confused letters (like "D" for "P") and accept the correction only when it produces a valid ID.
Lastly, for important inputs, offer a DTMF fallback (pressing keys on the phone) so the user can still complete the task accurately when speech recognition keeps failing. Network quality also makes a real difference: VoIP calls will typically give better accuracy than mobile calls.