AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications
Question to Travis or Allan:
1. How can no-code developers best utilize the Copilot Voice product and GPT-4 in their applications? Are there specific integrations with no-code platforms (like Make, Bubble, etc.) that you recommend?
2. For a non-programmer, what is the setup process like to get started with GPT-4 in Azure? Are there any beginner-friendly resources or templates available?
3. Could you explain how the new multilingual features work in the real-time API? What steps are involved for a no-code developer to incorporate multiple languages seamlessly?
4. What are some specific examples of how no-code developers can use GPT-4's real-time API for improving customer interaction experiences, such as through chatbots or voice assistants?
- nbrady, Oct 09, 2024
Microsoft
Hey Srini, thanks for the questions. We're the platform team serving these models, so here's our perspective on each:
(1) I'm sure the Copilot and Copilot Studio teams are cooking up something for no-code and low-code developers to take advantage of this new modality.
(2) For a non-developer, we have some helpful documentation to get started: https://aka.ms/oai/docs. Without writing code, you can create a resource through the Azure Portal and then launch the OpenAI Studio from that resource. For now, we'd recommend creating the resource in either the East US 2 or Sweden Central Azure region. Check out the quickstart guide for how to interact with GPT-4o using the Realtime API within the Studio. For more code-first development, you can check out these pre-made samples: https://github.com/azure-samples/aoai-realtime-audio-sdk
(3) Much like the text-based capabilities of LLMs, these models can natively interpret many languages other than English. As always, it's best to test GPT-4o's audio capabilities to make sure the model has the fidelity you need to meet your business requirements (a rough sketch of steering a session toward another language follows after point 4).
(4) This voice capability unlocks a new modality for interacting with applications, where AI becomes the universal interface. I've found inspiration in the scenarios from customers we gave early access, like Bosch and Lyrebird Health, as well as in the examples OpenAI demonstrated in their spring update.
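To make (3) a bit more concrete, here is a rough, unofficial sketch of pointing a Python `websockets` client at a gpt-4o-realtime-preview deployment and asking the session to respond in French. The endpoint shape, `api-version` value, environment variable names, and event payloads are assumptions based on the preview behavior, so treat the quickstart and the samples repo above as the source of truth.

```python
# Unofficial sketch only -- endpoint format, api-version, env var names, and event
# shapes are assumptions; see the quickstart and aoai-realtime-audio-sdk samples.
import asyncio
import json
import os

import websockets  # pip install websockets (v14+ renames extra_headers to additional_headers)

ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://<resource>.openai.azure.com
API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
DEPLOYMENT = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-realtime-preview")


async def main() -> None:
    url = (
        ENDPOINT.replace("https://", "wss://")
        + f"/openai/realtime?api-version=2024-10-01-preview&deployment={DEPLOYMENT}"
    )
    async with websockets.connect(url, extra_headers={"api-key": API_KEY}) as ws:
        # Steer the whole session toward French output, text and audio alike.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "instructions": "Réponds toujours en français.",
            },
        }))
        # Add a user message and ask the model to respond.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Bonjour, peux-tu m'aider ?"}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))
        # Read server events until the response finishes.
        while True:
            event = json.loads(await ws.recv())
            print(event.get("type"))
            if event.get("type") in ("response.done", "error"):
                break


if __name__ == "__main__":
    asyncio.run(main())
```

Audio output arrives as incremental delta events; the samples repo above shows how to decode and play them back.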
- Travis_Wilson_MSFT, Oct 09, 2024
Microsoft
Whether you plan to write code or not, the process to get started with the new gpt-4o-realtime-preview model is fairly straightforward:
(0) If you don't have one yet, create an Azure account via the Azure Portal.
(1) Create an Azure OpenAI Service resource in one of the two preview regions (eastus2 or swedencentral).
(2) Using Azure AI Studio, create a gpt-4o-realtime-preview model deployment in your eastus2 or swedencentral resource.
(3) Use the "Real-time audio" playground (left navigation bar) to check out the new model with a live, browser-based voice-in/voice-out experience.
From there, there are code samples -- including ones that just require setting environment variables and running -- at https://github.com/azure-samples/aoai-realtime-audio-sdk (a minimal connectivity sketch follows below). We don't have much in the way of "build a new experience with no code whatsoever" yet given how new this all is, but we're continually looking for ways to make it easier to integrate this new /realtime feature set and other Azure OpenAI capabilities.
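Not an official sample, but as a quick sanity check before diving into the repo, a minimal Python sketch like the one below just opens the /realtime WebSocket and prints the first server event, which should be `session.created` if the deployment is reachable. The environment variable names and the `api-version` are assumptions; each sample's README documents the configuration it actually expects.

```python
# Unofficial connectivity check -- env var names and api-version are assumptions;
# the samples in aoai-realtime-audio-sdk document the exact configuration they expect.
import asyncio
import json
import os

import websockets  # pip install websockets


async def smoke_test() -> None:
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"].replace("https://", "wss://")
    deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-realtime-preview")
    url = f"{endpoint}/openai/realtime?api-version=2024-10-01-preview&deployment={deployment}"
    headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
    async with websockets.connect(url, extra_headers=headers) as ws:
        # The service should greet a successful connection with a "session.created" event.
        first_event = json.loads(await ws.recv())
        print("server event:", first_event.get("type"))


asyncio.run(smoke_test())
```

If this prints session.created, the resource, deployment, and key are wired up correctly and you can move on to the richer samples.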
- EricStarker, Oct 09, 2024
Former Employee
I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
- SriniTech, Oct 09, 2024
Brass Contributor
Edited into two parts now, rather than one batch.