Forum Discussion

Fred Decker
Copper Contributor
Jul 03, 2017

How can I set an audio input device for Speech SDK v.11 recognition?

I am trying this code in VB.NET and I get a COM error. The SpMMAudioIn type is a COM object, while SetInputToAudioStream expects a different type (a System.IO.Stream), so the call fails with an invalid cast. Code like this actually appears in MSDN, yet it doesn't work. There also appears to be a bug in SAPI 5.4 on Windows Server. Both code snippets are below. Speech in .NET really needs a .SetInputToAudioDeviceID() method, not just .SetInputToDefaultAudioDevice(). This is VB.NET trying to set my audio input using SAPI SDK v11 "Microsoft.Speech":

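Imports Microsoft.Speech.Recognition   ' SpeechRecognitionEngine
Imports Microsoft.Speech.AudioFormat   ' SpeechAudioFormatInfo, AudioBitsPerSample, AudioChannel
Imports SpeechLib                      ' COM interop for "Microsoft Speech Object Library": ISpeechMMSysAudio, SpMMAudioIn

Dim sre As New SpeechRecognitionEngine
Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
Dim audiosource As ISpeechMMSysAudio
audiosource = New SpMMAudioIn
audiosource.DeviceId = WindowsAudioDeviceID  ' set audio input to audio device Id
' audiosource.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono
sre.SetInputToAudioStream(audiosource, fmt)  ' <-- InvalidCastException (COM) thrown here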
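
The invalid cast comes from the parameter type: SetInputToAudioStream takes a System.IO.Stream plus a SpeechAudioFormatInfo, and SpMMAudioIn is a SAPI COM object that does not derive from Stream. A minimal sketch of the call with an actual Stream follows, assuming you already have something that yields raw 8 kHz, 16-bit, mono PCM from the wanted device (a capture library, your telephony stack, etc.). The module name, `CreateEngineForStream`, and `pcmSource` are hypothetical, and the silent MemoryStream in `Main` only stands in for live audio.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.Speech.Recognition
Imports Microsoft.Speech.AudioFormat

Module StreamInputSketch
    ' Hands the recognizer a real System.IO.Stream instead of a SAPI COM object.
    ' pcmSource is a placeholder for whatever produces raw PCM from the device you
    ' want; its bytes must match the format declared here (8 kHz, 16-bit, mono).
    Function CreateEngineForStream(pcmSource As Stream) As SpeechRecognitionEngine
        Dim sre As New SpeechRecognitionEngine()
        Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
        sre.SetInputToAudioStream(pcmSource, fmt)
        Return sre
    End Function

    Sub Main()
        ' One second of silence (8000 samples x 2 bytes) stands in for live audio.
        Using silence As New MemoryStream(New Byte(15999) {})
            Using sre As SpeechRecognitionEngine = CreateEngineForStream(silence)
                Console.WriteLine("Input set; load a Grammar and call Recognize or RecognizeAsync next.")
            End Using
        End Using
    End Sub
End Module
```

For live audio, the Stream's Read should block until data is available; as far as I know, a Read that returns 0 is treated as end of input and recognition stops.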

And here is what appears to be a serious bug in SAPI 5.4 (the same code works in SAPI 5.1 and with SAPI v11, i.e. "Microsoft Speech Object Library" checked in the COM references). Note that it is the same DeviceId assignment that succeeds in the Microsoft.Speech snippet above:

Dim my_AudioIn As ISpeechMMSysAudio
my_AudioIn = New SpMMAudioIn
my_AudioIn.DeviceId = 0   ' COMException 0x80045002 thrown here under SAPI 5.4

Trying to set .DeviceId to anything other than -1 (WAVE_MAPPER) throws a COMException: 0x80045002, which maps to SPERR_ALREADY_INITIALIZED. Setting DeviceId on SpMMAudioOut instead of SpMMAudioIn works just fine. So I can choose where my TTS speech is output, but I can't choose where my speech recognition input comes from in my telephony application. This code worked in previous versions of SAPI. I need to use an "inproc" recognizer.
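
To see why 0x80045002 is a SAPI error rather than a generic Win32 one, it helps to split the HRESULT into its standard fields (bit 31 severity, bits 16-26 facility, bits 0-15 code). As far as I can tell from sperror.h, SAPI builds its errors as MAKE_HRESULT(SEVERITY_ERROR, FACILITY_ITF, 0x5000 + n), so code &H5002 is SAPI error number 2, SPERR_ALREADY_INITIALIZED. A minimal VB.NET sketch; the module and function names are mine, not from any SDK:

```vbnet
Imports System

Module HResultSketch
    ' COMException.HResult is a signed Integer; reinterpret its 32 bits as unsigned.
    Function ToUnsigned(hr As Integer) As UInteger
        Return CUInt(hr And &HFFFFFFFFL)
    End Function

    ' Bit 31: severity (1 = failure).
    Function Severity(hr As UInteger) As UInteger
        Return hr >> 31
    End Function

    ' Bits 16-26: facility (4 = FACILITY_ITF, interface-specific codes).
    Function Facility(hr As UInteger) As UInteger
        Return (hr >> 16) And &H7FFUI
    End Function

    ' Bits 0-15: code; SAPI's own error numbers are offset by &H5000.
    Function Code(hr As UInteger) As UInteger
        Return hr And &HFFFFUI
    End Function

    Sub Main()
        ' -2147201022 is how 0x80045002 shows up in COMException.HResult.
        Dim hr As UInteger = ToUnsigned(-2147201022)
        Console.WriteLine("hr       = &H" & hr.ToString("X8"))            ' &H80045002
        Console.WriteLine("severity = " & Severity(hr).ToString())        ' 1
        Console.WriteLine("facility = " & Facility(hr).ToString())        ' 4
        Console.WriteLine("code     = &H" & Code(hr).ToString("X4"))      ' &H5002
    End Sub
End Module
```

In a Catch block you would pass `ex.HResult` from the COMException to `ToUnsigned` instead of the literal.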

2 Replies

    Take a look at the articles under the "https://msdn.microsoft.com/en-us/library/dd371428(v=vs.85).aspx" section, especially at

    "https://msdn.microsoft.com/en-us/library/dd370819(v=vs.85).aspx". Quote:

    Starting from Windows Vista, the waveOutOpen and waveInOpen functions always
    assign the audio streams that they create to the default session, the process-specific session
    that is identified by the session GUID value GUID_NULL.

    In your case, you have to:

    step 1: enumerate the capture audio devices and choose one via the waveInOpen function.

    step 2: use the SetInputToDefaultAudioDevice() method for speech recognition.
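
Step 1 can be sketched in VB.NET with P/Invoke into winmm.dll. This is only a sketch: the module and structure names are mine, it lists the capture devices but does not open them, and it assumes the indexes it prints (0 to waveInGetNumDevs() - 1) are the same device IDs that waveInOpen and SpMMAudioIn.DeviceId accept, with -1 being WAVE_MAPPER. Device names are limited to 31 characters by the 32-character MAXPNAMELEN buffer.

```vbnet
Imports System
Imports System.Runtime.InteropServices

Module WaveInDevices
    ' Managed mirror of WAVEINCAPSW (mmeapi.h); 80 bytes when marshaled.
    ' szPname is a fixed 32-WCHAR buffer (MAXPNAMELEN).
    <StructLayout(LayoutKind.Sequential, CharSet:=CharSet.Unicode)>
    Public Structure WAVEINCAPS
        Public wMid As UShort
        Public wPid As UShort
        Public vDriverVersion As UInteger
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=32)>
        Public szPname As String
        Public dwFormats As UInteger
        Public wChannels As UShort
        Public wReserved1 As UShort
    End Structure

    Private Declare Function waveInGetNumDevs Lib "winmm.dll" () As UInteger

    Private Declare Unicode Function waveInGetDevCaps Lib "winmm.dll" Alias "waveInGetDevCapsW" (
        uDeviceID As UIntPtr, ByRef pwic As WAVEINCAPS, cbwic As UInteger) As UInteger

    Private Const MMSYSERR_NOERROR As UInteger = 0

    Sub Main()
        Dim count As Integer = CInt(waveInGetNumDevs())
        Dim size As UInteger = CUInt(Marshal.SizeOf(GetType(WAVEINCAPS)))
        ' Device IDs run from 0 to count - 1; the loop body is skipped when count is 0.
        For id As Integer = 0 To count - 1
            Dim caps As New WAVEINCAPS()
            If waveInGetDevCaps(New UIntPtr(CUInt(id)), caps, size) = MMSYSERR_NOERROR Then
                Console.WriteLine($"{id}: {caps.szPname} ({caps.wChannels} channel(s))")
            End If
        Next
    End Sub
End Module
```

Note that this only identifies the device; per the quote above, opening it with waveInOpen does not by itself make it the default device that SetInputToDefaultAudioDevice() uses.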

    That's probably because, while modern Windows usually detects several audio endpoint devices on modern hardware, it's the user who chooses which one is in use at the moment, somewhere under the "Sound" control panel item (shown in the attached image). I guess so, but I'm not sure. And you cannot rely on the user in your case, can you?

    Try to find an easier way (than the CRT API) to set the default audio input device. I hope the CIM/WMI classes have a method for it.
    OK, I'm sure you found a solution months ago. But whatever... :)

    • Fred Decker
      Copper Contributor

      Thanks. We abandoned the project because we could never get it to work. I am actually back on it now, and System.Speech, Microsoft.Speech and SAPI (Microsoft Speech Object Library v11) each have a different issue that makes it impossible for us to get it working the way we need: one can't change the audio input, another won't let us change the recognizer language, etc. If you have any new information, by all means share it :)
