Issue with Speech-to-Text Integration in Azure Communication Services Using C#

Context: We are building a bot using Azure Communication Services (ACS) and Azure Speech Services to handle phone calls. The bot asks questions (via TTS) and captures user responses using speech-to-text (STT).

What We’ve Done:

Created an ACS instance and acquired an active phone number.
Set up an event subscription to handle callbacks for incoming calls.
Integrated Azure Speech Services for STT in C#.

Achievements:

Successfully connected calls using ACS.
Played TTS prompts generated from an Excel file.

Challenges:

User responses are not being captured. Although InitialSilenceTimeout is set to 10 seconds, the bot moves on to the next question after 1–2 seconds without recognizing any speech.
The bot does not reprompt the user when no response is detected.

Help Needed:

How can we ensure accurate real-time speech-to-text capture during ACS telephony calls?
Are there better configurations or alternate approaches for speech recognition in ACS?

Additional Context:

Following the callautomation-openai-sample-csharp quickstart: https://github.com/Azure-Samples/communication-services-dotnet-quickstarts/tree/main/callautomation-openai-sample-csharp
Using Azure Speech Services and ACS SDKs.

Code Snippet (C#):

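using Azure;
using Azure.Communication;
using Azure.Communication.CallAutomation;
using Microsoft.Extensions.Logging;

// Recognize user speech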
async Task<string> RecognizeSpeechAsync(CallMedia callConnectionMedia, string callerId, ILogger logger)
{
    // Configure recognition options
    var recognizeOptions = new CallMediaRecognizeSpeechOptions(
        targetParticipant: CommunicationIdentifier.FromRawId(callerId))
    {
        InitialSilenceTimeout = TimeSpan.FromSeconds(10), // Wait up to 10 seconds for the user to start speaking
        EndSilenceTimeout = TimeSpan.FromSeconds(5),     // Wait up to 5 seconds of silence before considering the response complete
        OperationContext = "SpeechRecognition"
    };

    try
    {
        // Start speech recognition
        var result = await callConnectionMedia.StartRecognizingAsync(recognizeOptions);

        // Handle recognition success
        if (result is Response<StartRecognizingCallMediaResult>)
        {
            logger.LogInformation($"Result: {result}");
            logger.LogInformation("Recognition started successfully.");
            // Simulate capturing response (replace with actual recognition logic)
            return "User response captured"; // Replace with actual response text from recognition
        }

        logger.LogWarning("Recognition failed or timed out.");
        return string.Empty; // Return empty if recognition fails
    }
    catch (Exception ex)
    {
        logger.LogError($"Error during speech recognition: {ex.Message}");
        return string.Empty;
    }
}
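
For comparison, StartRecognizingAsync only starts the recognize operation; in the Call Automation SDK the recognized text is delivered later in a RecognizeCompleted callback event, and a timeout or other failure arrives as RecognizeFailed. The following is a minimal sketch of reading those events, not a confirmed fix. The class name RecognizeEventHandling and the method TryGetSpeech are hypothetical names introduced here, and the sketch assumes the callback endpoint already parses incoming events into CallAutomationEventBase objects.

```csharp
using Azure.Communication.CallAutomation;
using Microsoft.Extensions.Logging;

// Hypothetical helper (not part of the SDK or the quickstart).
static class RecognizeEventHandling
{
    // Returns the recognized text, or null when nothing usable was recognized
    // (the caller can treat null as the signal to reprompt).
    public static string? TryGetSpeech(CallAutomationEventBase callEvent, ILogger logger)
    {
        switch (callEvent)
        {
            // Recognition finished and produced non-empty speech text.
            case RecognizeCompleted { RecognizeResult: SpeechResult speech }
                when !string.IsNullOrWhiteSpace(speech.Speech):
                logger.LogInformation("Recognized speech: {Speech}", speech.Speech);
                return speech.Speech;

            // Recognition failed, for example because a silence timeout elapsed.
            case RecognizeFailed failed:
                logger.LogWarning(
                    "Recognition failed (context: {Context}): {Message}",
                    failed.OperationContext,
                    failed.ResultInformation?.Message);
                return null;

            // Any other event, or a completed event without speech text.
            default:
                return null;
        }
    }
}
```

In a callback handler, each event returned by CallAutomationEventParser.ParseMany could be passed to TryGetSpeech. A null result for a RecognizeFailed event is one place where the prompt could be replayed and StartRecognizingAsync called again, which is the reprompt behavior described under Challenges.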
