AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications
Question to Travis or Allan:
1. How can no-code developers best utilize the Copilot Voice product and GPT-4 in their applications? Are there specific integrations with no-code platforms (like Make, Bubble, etc.) that you recommend?
2. For a non-programmer, what is the setup process like to get started with GPT-4 in Azure? Are there any beginner-friendly resources or templates available?
3. Could you explain how the new multilingual features work in the real-time API? What steps are involved for a no-code developer to incorporate multiple languages seamlessly?
4. What are some specific examples of how no-code developers can use GPT-4's real-time API for improving customer interaction experiences, such as through chatbots or voice assistants?
- nbrady, Oct 09, 2024
Microsoft
Hey Srini, thanks for the questions. We're the platform team serving these models, so here's our perspective on each:
(1) I'm sure the Copilot and Copilot Studio teams are cooking up something for no-code and low-code developers to take advantage of this new modality.
(2) For a non-developer, we have some helpful documentation to get started: https://aka.ms/oai/docs. Without writing code, you can create a resource through the Azure Portal and then launch the OpenAI Studio from that resource. For now, we'd recommend creating the resource in either the East US 2 or Sweden Central Azure region. Check out the quickstart guide for how to interact with GPT-4o using the Realtime API within the Studio. For more code-first development, you can check out these pre-made samples: https://github.com/azure-samples/aoai-realtime-audio-sdk
(3) Much like the text-based capabilities of LLMs, these models can natively interpret many languages other than English. As always, it's best to test GPT-4o's audio capabilities to make sure the model has the fidelity you need to meet your business requirements (a rough sketch of steering a session toward another language follows after point 4).
(4) This voice capability unlocks a new modality for interacting with applications, where AI becomes the universal interface. I've found inspiration in the scenarios from customers we gave early access, like Bosch and Lyrebird Health, as well as in the examples OpenAI demonstrated in their spring update.
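To make (3) a bit more concrete, here is a rough, unofficial sketch of pointing a Python `websockets` client at a gpt-4o-realtime-preview deployment and asking the session to respond in French. The endpoint shape, `api-version` value, environment variable names, and event payloads are assumptions based on the preview behavior, so treat the quickstart and the samples repo above as the source of truth.

```python
# Unofficial sketch only -- endpoint format, api-version, env var names, and event
# shapes are assumptions; see the quickstart and aoai-realtime-audio-sdk samples.
import asyncio
import json
import os

import websockets  # pip install websockets (v14+ renames extra_headers to additional_headers)

ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://<resource>.openai.azure.com
API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
DEPLOYMENT = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-realtime-preview")


async def main() -> None:
    url = (
        ENDPOINT.replace("https://", "wss://")
        + f"/openai/realtime?api-version=2024-10-01-preview&deployment={DEPLOYMENT}"
    )
    async with websockets.connect(url, extra_headers={"api-key": API_KEY}) as ws:
        # Steer the whole session toward French output, text and audio alike.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "instructions": "Réponds toujours en français.",
            },
        }))
        # Add a user message and ask the model to respond.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Bonjour, peux-tu m'aider ?"}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))
        # Read server events until the response finishes.
        while True:
            event = json.loads(await ws.recv())
            print(event.get("type"))
            if event.get("type") in ("response.done", "error"):
                break


if __name__ == "__main__":
    asyncio.run(main())
```

Audio output arrives as incremental delta events; the samples repo above shows how to decode and play them back.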
- Travis_Wilson_MSFT, Oct 09, 2024
Microsoft
Whether you plan to write code or not, the process to get started with the new gpt-4o-realtime-preview model is fairly straightforward:
(0) If you don't have one yet, create an Azure account via the Azure Portal.
(1) Create an Azure OpenAI Service resource in one of the two preview regions (eastus2 or swedencentral).
(2) Using Azure AI Studio, create a gpt-4o-realtime-preview model deployment in your eastus2 or swedencentral resource.
(3) Use the "Real-time audio" playground (left navigation bar) to check out the new model with a live, browser-based voice-in/voice-out experience.
From there, there are code samples -- including ones that just require setting environment variables and running -- at https://github.com/azure-samples/aoai-realtime-audio-sdk (a minimal connectivity sketch follows below). We don't have much in the way of "build a new experience with no code whatsoever" yet given how new this all is, but we're continually looking for ways to make it easier to integrate this new /realtime feature set and other Azure OpenAI capabilities.
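Not an official sample, but as a quick sanity check before diving into the repo, a minimal Python sketch like the one below just opens the /realtime WebSocket and prints the first server event, which should be `session.created` if the deployment is reachable. The environment variable names and the `api-version` are assumptions; each sample's README documents the configuration it actually expects.

```python
# Unofficial connectivity check -- env var names and api-version are assumptions;
# the samples in aoai-realtime-audio-sdk document the exact configuration they expect.
import asyncio
import json
import os

import websockets  # pip install websockets


async def smoke_test() -> None:
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"].replace("https://", "wss://")
    deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-realtime-preview")
    url = f"{endpoint}/openai/realtime?api-version=2024-10-01-preview&deployment={deployment}"
    headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
    async with websockets.connect(url, extra_headers=headers) as ws:
        # The service should greet a successful connection with a "session.created" event.
        first_event = json.loads(await ws.recv())
        print("server event:", first_event.get("type"))


asyncio.run(smoke_test())
```

If this prints session.created, the resource, deployment, and key are wired up correctly and you can move on to the richer samples.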
- EricStarker, Oct 09, 2024
Former Employee
I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
- SriniTech, Oct 09, 2024
Brass Contributor
Edited into two parts now, rather than one batch.