Forum Discussion
Communicate49 (Copper Contributor)
Nov 23, 2024
Issue with Speech-to-Text Integration in Azure Communication Services Using C#
Context: We are building a bot using Azure Communication Services (ACS) and Azure Speech Services to handle phone calls. The bot asks questions (via TTS) and captures user responses using speech-to-text (STT).
What We’ve Done:
- Created an ACS instance and acquired an active phone number.
- Set up an event subscription to handle callbacks for incoming calls.
- Integrated Azure Speech Services for STT in C#.
Achievements:
- Successfully connected calls using ACS.
- Played TTS prompts generated from an Excel file.
Challenges:
- User responses are not being captured. Despite setting InitialSilenceTimeout to 10 seconds, the bot skips to the next question after 1–2 seconds without recognizing speech.
- The bot does not reprompt the user even when no response is detected.
Help Needed:
- How can we ensure accurate real-time speech-to-text capture during ACS telephony calls?
- Are there better configurations or alternate approaches for speech recognition in ACS?
Additional Context:
- Following the official ACS C# sample.
- Using Azure Speech Services and ACS SDKs.
Code Snippet (C#):
// Recognize user speech
async Task<string> RecognizeSpeechAsync(CallMedia callConnectionMedia, string callerId, ILogger logger)
{
    // Configure recognition options
    var recognizeOptions = new CallMediaRecognizeSpeechOptions(
        targetParticipant: CommunicationIdentifier.FromRawId(callerId))
    {
        InitialSilenceTimeout = TimeSpan.FromSeconds(10), // Wait up to 10 seconds for the user to start speaking
        EndSilenceTimeout = TimeSpan.FromSeconds(5), // Wait up to 5 seconds of silence before considering the response complete
        OperationContext = "SpeechRecognition"
    };

    try
    {
        // Start speech recognition
        var result = await callConnectionMedia.StartRecognizingAsync(recognizeOptions);

        // Handle recognition success
        if (result is Response<StartRecognizingCallMediaResult>)
        {
            logger.LogInformation($"Result: {result}");
            logger.LogInformation("Recognition started successfully.");

            // Simulate capturing response (replace with actual recognition logic)
            return "User response captured"; // Replace with actual response text from recognition
        }

        logger.LogWarning("Recognition failed or timed out.");
        return string.Empty; // Return empty if recognition fails
    }
    catch (Exception ex)
    {
        logger.LogError($"Error during speech recognition: {ex.Message}");
        return string.Empty;
    }
}
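A possible explanation, offered as a sketch and not a confirmed fix: StartRecognizingAsync only asks the service to begin listening and returns as soon as that request is accepted. The outcome arrives later as a RecognizeCompleted or RecognizeFailed event on the callback endpoint. Because the snippet above returns a placeholder string right after the start call, the flow would move on within a second or two regardless of InitialSilenceTimeout. The block below shows one way to bridge the start call and the later event, plus a simple reprompt loop. The names pending, WaitForRecognitionAsync, CompleteRecognition, and AskWithRepromptAsync are hypothetical helpers, not part of the ACS SDK, and the code assumes each attempt uses a unique OperationContext.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical registry: one pending answer per OperationContext value.
var pending = new ConcurrentDictionary<string, TaskCompletionSource<string>>();

// Create this task BEFORE calling StartRecognizingAsync so the entry is
// registered when the callback event arrives. Registration happens
// synchronously, before the first await. The task yields the recognized
// text, or string.Empty if nothing arrives within the timeout.
async Task<string> WaitForRecognitionAsync(string operationContext, TimeSpan timeout)
{
    var tcs = new TaskCompletionSource<string>(
        TaskCreationOptions.RunContinuationsAsynchronously);
    pending[operationContext] = tcs;

    var finished = await Task.WhenAny(tcs.Task, Task.Delay(timeout));
    pending.TryRemove(operationContext, out _);
    return finished == tcs.Task ? await tcs.Task : string.Empty;
}

// Call this from the callback endpoint (assumption: your handler extracts
// the OperationContext and text from the event). Pass the recognized text
// for RecognizeCompleted, or string.Empty for RecognizeFailed.
// Returns false when nobody is waiting on that context.
bool CompleteRecognition(string operationContext, string text) =>
    pending.TryGetValue(operationContext, out var tcs) && tcs.TrySetResult(text);

// Repeats a question until a non-empty answer arrives or attempts run out.
// askOnce receives the 1-based attempt number so it can play the prompt,
// start recognition, and await the result for that attempt.
async Task<string> AskWithRepromptAsync(Func<int, Task<string>> askOnce, int maxAttempts)
{
    for (var attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var answer = await askOnce(attempt);
        if (!string.IsNullOrWhiteSpace(answer))
        {
            return answer;
        }
    }
    return string.Empty; // caller decides what to do after the final attempt
}
```

In the call flow, askOnce would build a context such as "question-1-attempt-2", create the waiting task, set that context on the recognize options, call StartRecognizingAsync, and then await the waiting task. The wait timeout should be longer than InitialSilenceTimeout plus the expected answer length, so that the service's own RecognizeFailed event is normally what ends a silent attempt.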