Forum Discussion
How can I set an audio input device for Speech SDK v.11 recognition?
I am trying this code in VB.NET and I get a COM error. The SpMMAudioIn type is a COM object stream, and SetInputToAudioStream expects a different type. Code like this is actually in the MSDN, yet it doesn't work. There also appears to be a bug in SAPI 5.4 on Windows Server. Both code snippets are below. Speech in .NET really needs a .SetInputToAudioDeviceID instead of .SetInputToDefaultAudioDevice(). This is in VB.NET, trying to set my audio input using SAPI SDK v11 "Microsoft.Speech":
Dim sre As New SpeechRecognitionEngine
Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
Dim audiosource As ISpeechMMSysAudio
audiosource = New SpMMAudioIn
audiosource.DeviceId = WindowsAudioDeviceID 'set audio input to audio device Id
' audiosource.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono
sre.SetInputToAudioStream(audiosource, fmt) ' <----- Invalid Cast with COM here
And here is what appears to be a serious bug in SAPI 5.4 (works in SAPI 5.1 and in SAPI v11 "Microsoft Speech Object Library" checked in COM references). Note it is the same code as above that works using the Microsoft.Speech namespace:
Dim my_AudioIn As ISpeechMMSysAudio
my_AudioIn = New SpMMAudioIn
my_AudioIn.DeviceId = 0
Trying to set the .DeviceId to anything other than -1 (WAVE_MAPPER) throws a COMException: 0x80045002, which maps to SPERR_ALREADY_INITIALIZED. Setting the SpAudioOUT instead of IN works just fine. So I can set where I want my TTS speech to be output, but I can't set where I want my speech recognition input to come from in my telephony application. This code worked in previous versions of SAPI. I need to use an "inproc" recognizer.
2 Replies
- Vladimir Nikitin, Copper Contributor
Take a look at the articles under the "https://msdn.microsoft.com/en-us/library/dd371428(v=vs.85).aspx" section, especially
"https://msdn.microsoft.com/en-us/library/dd370819(v=vs.85).aspx". Quote:
Starting from Windows Vista, the waveOutOpen and waveInOpen functions always
assign the audio streams that they create to the default session — the process-specific session
that is identified by the session GUID value GUID_NULL.
In your case you have to:
Step 1: Enumerate the capture audio devices and choose one via the waveInOpen function.
Step 2: Use the SetInputToDefaultAudioDevice() method for speech recognition.
That's probably because, while modern Windows usually detects a few endpoint audio devices on modern hardware, it's the user who chooses which one is in use at the moment, somewhere under the "sound etc" Control Panel item (shown on attached image). I guess so, I'm not sure. But you can't rely on the user in your case, can you?
Try to find an easier way (than the CRT API) to set the default audio input device. I hope the CIM/WMI classes have a method for it.
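For the enumeration part of step 1, a minimal VB.NET sketch could look like the one below. It only lists the wave-in devices and their IDs through the Win32 waveInGetNumDevs and waveInGetDevCaps functions in winmm.dll; it does not open a device or change the default one, and it is not tested against Microsoft.Speech. The module name WaveInDeviceList and the managed WAVEINCAPS declaration are my own, written to mirror the native structure.

```vb
Imports System
Imports System.Runtime.InteropServices

Module WaveInDeviceList

    ' Managed mirror of the native WAVEINCAPSW structure (declaration is an
    ' assumption written for this sketch, not taken from the thread).
    <StructLayout(LayoutKind.Sequential, CharSet:=CharSet.Unicode, Pack:=1)>
    Private Structure WAVEINCAPS
        Public wMid As UShort
        Public wPid As UShort
        Public vDriverVersion As UInteger
        ' Product name; the native buffer is 32 characters (MAXPNAMELEN),
        ' so long names come back truncated.
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=32)>
        Public szPname As String
        Public dwFormats As UInteger
        Public wChannels As UShort
        Public wReserved1 As UShort
    End Structure

    ' Returns the number of waveform-audio input devices.
    <DllImport("winmm.dll")>
    Private Function waveInGetNumDevs() As UInteger
    End Function

    ' Fills in the capabilities of one input device; returns 0 on success.
    <DllImport("winmm.dll", EntryPoint:="waveInGetDevCapsW", CharSet:=CharSet.Unicode)>
    Private Function waveInGetDevCaps(ByVal uDeviceID As UIntPtr,
                                      ByRef pwic As WAVEINCAPS,
                                      ByVal cbwic As UInteger) As UInteger
    End Function

    Sub Main()
        Dim count As Integer = CInt(waveInGetNumDevs())
        Dim size As UInteger = CUInt(Marshal.SizeOf(GetType(WAVEINCAPS)))

        For id As Integer = 0 To count - 1
            Dim caps As New WAVEINCAPS()
            Dim result As UInteger = waveInGetDevCaps(New UIntPtr(CUInt(id)), caps, size)
            If result = 0 Then ' 0 = MMSYSERR_NOERROR
                Console.WriteLine("{0}: {1}", id, caps.szPname)
            End If
        Next
    End Sub

End Module
```

The IDs printed here are the same zero-based wave-in device IDs that the question tries to assign to DeviceId, so this can at least confirm which number belongs to which input.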
OK, I'm sure you found a solution months ago. But whatever...)
- Fred Decker, Copper Contributor
Thanks, we abandoned the project because we could never get it to work. I am actually back on it now, and System.Speech, Microsoft.Speech and SAPI (Microsoft Speech Object v11) each have a different issue that makes it impossible for us to get it to work the way we need. One can't change the audio input, another won't let us change the recognizer language, etc. If you have any new information, by all means share it :)