Integrating Azure OpenAI and Azure Speech Services to Create a Voice-Enabled Chatbot with Python


Artificial intelligence (AI) is changing the way businesses operate, and many organizations are looking for ways to leverage AI to improve their operations and gain a competitive advantage. In this blog post, we’ll explore how to integrate Azure OpenAI service and Azure Speech service to create a chatbot that users can interact with via voice.

 

What is the difference between Azure OpenAI and OpenAI

Before we dive into the integration process, let’s first understand what Azure OpenAI Service is.

Azure OpenAI Service provides customers with access to advanced language AI capabilities through OpenAI’s GPT-4, GPT-3, Codex, and DALL-E models, all with the added security and enterprise support of Azure. Co-developed with OpenAI, Azure OpenAI ensures compatibility and a seamless transition between the two platforms. By using Azure OpenAI, customers can leverage the same models as OpenAI while benefiting from the security features of Microsoft Azure, such as private networking and regional availability. Additionally, Azure OpenAI promotes responsible AI by offering content filtering capabilities.

https://learn.microsoft.com/azure/cognitive-services/openai/overview?WT.mc_id=DT-MVP-5001664#compari...

 

Access to Azure OpenAI Service is limited to approved enterprise customers and partners, including Microsoft MVPs. To gain access, you first need to complete the Request Access to Azure OpenAI Service form. I feel privileged to have access to the service.

 

Models in Azure OpenAI service

The Azure OpenAI service offers users access to a range of different models, each with its own capabilities and price point. The latest models available are the GPT-4 models, which are currently in preview. Existing Azure OpenAI customers can apply for access to these models by completing the form below.

https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_P...

 

The GPT-3 base models, including Davinci, Curie, Babbage, and Ada, are available and vary in capability and speed.

The Codex series of models, which are trained on natural language and code, can power natural language to code use cases.

If your application to access the Azure OpenAI service is approved, then you can create an Azure OpenAI service in your Azure subscription.


 

Azure OpenAI Studio

To get started, go to https://oai.azure.com/ to access Azure OpenAI Studio. Sign in using credentials that have access to your Azure OpenAI resource. You can select the appropriate directory, Azure subscription, and Azure OpenAI resource during or after the sign-in process.

 


 

Develop a Python program that incorporates Azure OpenAI GPT-4 and Azure Speech functionalities

Setting up Azure OpenAI and Azure Speech Services in the Azure portal is quite straightforward. Once created, we can access these services in our code. Let me illustrate this with an example in Python.

 

Installing the necessary Python libraries

If you want to integrate the Azure Speech-to-Text and Text-to-Speech functions as well as Azure OpenAI’s language generation capabilities into your Python project, you will need to install the necessary Python libraries. The first library you will need is azure-cognitiveservices-speech, which provides access to Azure’s Speech-to-Text and Text-to-Speech services. You can install this library using pip, the Python package manager.

 

The second library you will need is openai, which provides access to Azure OpenAI’s language generation API. Again, you can install this library using pip. Once you have these libraries installed, you can use them to create a powerful Python program that can recognize speech, generate language, and convert text to speech.
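
Both packages can be installed in one step with pip install azure-cognitiveservices-speech openai. One caveat: the code in this post calls openai.ChatCompletion, which exists only in the 0.x releases of the openai package (it was removed in 1.0), so pinning with pip install "openai<1" is the safer choice. The following is a minimal sketch for checking what is installed; the helper names installed_version and is_legacy_openai are illustrative and not part of either library.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def is_legacy_openai(version):
    """True if the version string is a 0.x release, which still has openai.ChatCompletion."""
    return version.split(".")[0] == "0"


if __name__ == "__main__":
    for name in ("azure-cognitiveservices-speech", "openai"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
    openai_version = installed_version("openai")
    if openai_version and not is_legacy_openai(openai_version):
        print('openai 1.x detected; the code in this post needs: pip install "openai<1"')
```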

 

Setting Up Azure OpenAI and Speech Services in Python

Let’s craft a Python program and configure the Azure OpenAI API credentials, along with the credentials/configurations for Azure’s Speech-to-Text and Text-to-Speech services.

 

 

import os
import azure.cognitiveservices.speech as speechsdk
import openai

# Set up Azure OpenAI API credentials
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_ENDPOINT")
openai.api_version = "2023-03-15-preview"
openai.api_key = os.getenv("OPENAI_API_KEY")

# Set up Azure Speech-to-Text and Text-to-Speech credentials
speech_key = os.getenv("SPEECH_API_KEY")
service_region = "eastus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
speech_config.speech_synthesis_language = "en-NZ"

# Set up the voice configuration
speech_config.speech_synthesis_voice_name = "en-NZ-MollyNeural"
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

 

 

First, let’s look at the steps involved in setting up the Azure OpenAI service. To use the service, you need to provide the endpoint of the instance created earlier. An Azure OpenAI endpoint has the following format:

https://{your-resource-name}.openai.azure.com/
 

Moreover, it’s crucial to specify the API version when using the ChatGPT (preview) and GPT-4 (preview) models to generate chat message completions. Please note that, at the time of writing, chat completions are available only with api-version=2023-03-15-preview.
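
To see how the endpoint, the deployment name, and the API version fit together, it can help to look at the REST URL that a chat completion request is sent to. The sketch below assembles that URL by hand; the function name chat_completions_url and the example resource and deployment names are made up for illustration.

```python
from urllib.parse import urlencode


def chat_completions_url(endpoint, deployment_name, api_version="2023-03-15-preview"):
    """Build the Azure OpenAI chat completions URL for one model deployment."""
    base = endpoint.rstrip("/")  # tolerate endpoints with or without a trailing slash
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment_name}/chat/completions?{query}"


# Hypothetical resource and deployment names, for illustration only
print(chat_completions_url("https://my-resource.openai.azure.com/", "my-gpt4-deployment"))
```

This is also the URL to use when testing the resource from a tool such as Postman, with the key sent in the api-key header.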

 

To access the Azure OpenAI service from your Python code, the next step is to provide the API key. You can find the key in the Keys and Endpoint panel of your Azure OpenAI service.
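
Because os.getenv returns None for a variable that is not set, a missing key only surfaces later as an authentication error. A small fail-fast loader, sketched below, makes the problem visible at startup. OPENAI_ENDPOINT, OPENAI_API_KEY, and SPEECH_API_KEY are the names used in the code above; OPENAI_DEPLOYMENT_NAME and the load_settings helper are assumptions added for this sketch.

```python
import os

REQUIRED_VARIABLES = (
    "OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
    "SPEECH_API_KEY",
    "OPENAI_DEPLOYMENT_NAME",  # hypothetical: not used in the original code
)


def load_settings(environ=os.environ):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

With this in place, settings = load_settings() can run before any SDK call, and settings["OPENAI_DEPLOYMENT_NAME"] can be passed as the engine argument instead of a hard-coded name.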

 

Likewise, we need to configure the Azure Speech service. In this case, I have chosen the en-NZ-MollyNeural voice, which speaks with the accent of New Zealanders, also known as Kiwis.

The link below provides access to information on the languages and voice support available for the Speech service.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts&...

 

Creating a speech recognizer and starting the recognition

To hold a human-like conversation with a chatbot powered by GPT-4, the first step is to create a speech recognizer that can recognize our voice.

 

# Define the speech-to-text function
def speech_to_text():
    # Set up the audio configuration
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    # Create a speech recognizer and start the recognition
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    print("Say something...")

    result = speech_recognizer.recognize_once_async().get()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    elif result.reason == speechsdk.ResultReason.NoMatch:
        return "Sorry, I didn't catch that."
    elif result.reason == speechsdk.ResultReason.Canceled:
        return "Recognition canceled."

 

In the code above, we define a function named speech_to_text that uses the Microsoft Azure Speech Service SDK to perform speech-to-text conversion. It sets up the audio configuration and creates a speech recognizer object, which is configured with the Speech service’s language and authentication credentials. The function prompts the user to speak and starts recognition asynchronously using the recognize_once_async() method. Once recognition completes, the function checks the “reason” attribute of the “result” object to determine whether the speech was recognized. If it was, the function returns the recognized text; otherwise, it returns an error message.

 

Using Azure OpenAI’s GPT-4 engine to generate text in response to a prompt

After getting input from the user using speech-to-text in the previous step, we can use the input as the prompt in Azure OpenAI’s GPT-4 engine.

 

# Define the Azure OpenAI language generation function
def generate_text(prompt):
    response = openai.ChatCompletion.create(
        engine="chenjd-test",  # replace with the name of your own model deployment
        messages=[
            {"role": "system", "content": "You are an AI assistant that helps people find information."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=800,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )
    return response['choices'][0]['message']['content']

 

The Python code above defines a function called generate_text that uses Azure OpenAI's GPT-4 engine to generate text in response to a prompt.

 

The function takes a prompt as input and uses the openai.ChatCompletion.create() method to generate a response, with parameters like engine, messages, temperature, max_tokens, top_p, frequency_penalty, presence_penalty, and stop.

 

The main input of the openai.ChatCompletion.create() method is the messages parameter, which should be an array of message objects. Each message object must include a “role” (one of “system”, “user”, or “assistant”) and a “content” field that holds the message text. In this case, the content of the user message is the value of prompt.

The following link provides more details.

https://learn.microsoft.com/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-langua...
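
The generate_text function sends only a system message and the latest user message, so the model has no memory of earlier turns. One possible way to keep context, sketched here under the assumption that the history is stored as (user, assistant) text pairs, is to rebuild the messages array on each turn. build_messages is a hypothetical helper, not part of the openai library.

```python
SYSTEM_PROMPT = "You are an AI assistant that helps people find information."


def build_messages(history, prompt, system_prompt=SYSTEM_PROMPT):
    """Assemble the messages array: system prompt, earlier turns, then the new prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages


# One earlier exchange followed by a new question
history = [("What is the capital of New Zealand?", "Wellington.")]
for message in build_messages(history, "How many people live there?"):
    print(message["role"], "->", message["content"])
```

The returned list could be passed as the messages argument of openai.ChatCompletion.create(), with each new reply appended to the history afterwards.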

 

Adding Text-to-Speech Functionality to the Chatbot

Let’s now include another feature that allows the chatbot to vocalize the text produced by the Azure OpenAI service in a human-like manner.

 

# Define the text-to-speech function
def text_to_speech(text):
    try:
        result = speech_synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Text-to-speech conversion successful.")
            return True
        else:
            print(f"Error synthesizing audio: {result}")
            return False
    except Exception as ex:
        print(f"Error synthesizing audio: {ex}")
        return False

 

This Python code defines a function called text_to_speech that takes a text parameter, which is generated by Azure OpenAI, as input. It uses the speech_synthesizer object to asynchronously synthesize the input text and generate speech audio using the en-NZ-MollyNeural voice. The function returns True if synthesis completes and False otherwise.
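
With the three functions defined, what remains is a loop that listens, asks the model, and speaks the reply. The post does not show that loop, so the following is only one possible sketch. The functions are passed in as parameters so the loop can be tried without a microphone; the run_chatbot name, the exit phrases, and the max_turns limit are all assumptions.

```python
# The two fallback strings that speech_to_text returns when recognition fails
RECOGNITION_FAILURES = {"Sorry, I didn't catch that.", "Recognition canceled."}
EXIT_PHRASES = {"goodbye", "exit", "quit"}  # assumed exit words


def run_chatbot(listen, respond, speak, max_turns=20):
    """Run listen -> respond -> speak turns until an exit phrase or max_turns."""
    transcript = []
    for _ in range(max_turns):
        user_text = listen()
        if not user_text or user_text in RECOGNITION_FAILURES:
            continue  # nothing usable was heard, so listen again
        # Recognized text usually ends with punctuation, so strip it before comparing
        if user_text.strip().lower().rstrip(".!?") in EXIT_PHRASES:
            break
        reply = respond(user_text)
        speak(reply)
        transcript.append((user_text, reply))
    return transcript


# Dry run with stand-ins for the microphone, the model, and the speaker
spoken = iter(["Hello there.", "Sorry, I didn't catch that.", "Goodbye."])
log = run_chatbot(lambda: next(spoken), lambda text: f"You said: {text}", print)
```

In the real program, the call would be run_chatbot(speech_to_text, generate_text, text_to_speech).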

 

Follow me: Jiadong Chen

 

 

 

3 Replies

@Jiadong_Chen 

Hi Chen, I forked your solution and tried to run it, replacing the endpoint and key with my environment values, but it doesn't work. It throws an exception: "openai.error.InvalidRequestError: The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again."

To be sure I replaced the right keys and endpoint, I tried calling the OpenAI endpoint from Postman, and it works fine.

Can you help me, please?

@ZaffaSte 

Sorry @Jiadong_Chen, I finally found the problem: to make it work, you have to set the "engine" parameter to your model deployment name (line 56 in the app.py file).

Adding this parameter to the configuration variables in the first lines of the code would probably be a good update.


Hth,
Stefano.