AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications

64 Comments

EricStarker
Former Employee
Oct 15, 2024
Check out a summary of the questions and answers from this event here:

Summary
cdrguru
Occasional Reader
Oct 10, 2024
I've been exploring the new GPT-4o-Realtime API with Audio and wanted to share how I've integrated it into an Azure Function for a solution called AInsights. This setup allows me to tag specific parts of conversations—like "perfect prompts" or key "assistant responses"—and later retrieve them using voice commands. It's been incredibly helpful when I need to quickly reference past demos or insights. For example, I might say:
Remember the demo where [person/company] mentioned [specific keyword/phrase]? Can you recall that and provide the follow-up insight?
The problem is that I forget how to type and spell and find it so much quicker to ask my "AI" to do it. I've created a short demo showcasing how I use GPT-4o's voice capabilities to efficiently search my Azure AI Chat history and streamline my workflow. You can watch it here: https://www.youtube.com/watch?v=9D0i-J-KIa0
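A minimal Python sketch of the tag-and-retrieve idea described above. This is not the actual AInsights code: the `TaggedSnippet` and `SnippetStore` names, the in-memory list, the sample data, and the simple keyword match are all assumptions for illustration. It also assumes the voice command has already been transcribed to text.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedSnippet:
    """One tagged piece of a past conversation (hypothetical structure)."""
    speaker: str
    tag: str   # e.g. "perfect prompt" or "assistant response"
    text: str


@dataclass
class SnippetStore:
    """In-memory stand-in for a chat-history store (an assumption, not AInsights)."""
    snippets: list[TaggedSnippet] = field(default_factory=list)

    def add(self, speaker: str, tag: str, text: str) -> None:
        """Record a tagged snippet."""
        self.snippets.append(TaggedSnippet(speaker, tag, text))

    def recall(self, speaker: str, keyword: str) -> list[TaggedSnippet]:
        """Return snippets from `speaker` whose text contains `keyword`, ignoring case."""
        speaker_l, keyword_l = speaker.lower(), keyword.lower()
        return [
            s for s in self.snippets
            if s.speaker.lower() == speaker_l and keyword_l in s.text.lower()
        ]


# Made-up sample data standing in for tagged chat history.
store = SnippetStore()
store.add("Contoso", "perfect prompt", "Summarize last quarter's churn drivers.")
store.add("Contoso", "assistant response", "Churn rose mainly in the SMB segment.")
store.add("Fabrikam", "perfect prompt", "Draft a churn follow-up email.")

# "Remember the demo where Contoso mentioned churn?"
matches = store.recall("contoso", "churn")
for m in matches:
    print(f"[{m.tag}] {m.text}")
```

A real setup would more likely back `recall` with a search index than a list scan, but the shape of the lookup (person or company plus keyword) stays the same.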
EricStarker
Former Employee
Oct 09, 2024
Thanks for joining us for the GPT-4o-realtime API with Audio AMA! We'll be posting a summary of the questions and answers soon. See you next time!
- Hal_Hostetler
  MVP
  Oct 09, 2024
  Thanks, excellent session!!
SriniTech
Brass Contributor
Oct 09, 2024
Thank you for the AMA session. It was a brilliant session. Maybe we could do this again sometime.
- Travis_Wilson_MSFT
  Microsoft
  Oct 09, 2024
  It's been great talking with you! This is for sure just the beginning; we'll have more coming soon.
- LonChen
  Microsoft
  Oct 09, 2024
  Glad to hear this is helpful, Srini. We will do it more often in the future. Stay tuned for our future events.
- Allan_Carranza
  Microsoft
  Oct 09, 2024
  Thanks for joining us! We love opportunities to connect with our wonderful community of developers.
EricStarker
Former Employee
Oct 09, 2024
Just ten minutes to go! Get your questions in!
Allan_Carranza
Microsoft
Oct 09, 2024
In case you missed it, Azure AI Search built a RAG + Voice demo utilizing the GPT-4o Realtime API. Check out the blog post below that includes more details, including a code sample to get started with RAG + Voice!

VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio - Microsoft Community Hub
CaptainAmazing
Copper Contributor
Oct 09, 2024
Will this new feature be included in PowerApps and Power Automate, or will it only be available in Azure OpenAI as an API for coding? Following on from this, is there an API syntax or reference site we can consult for help on how to interface with the API? Are there any specific code or framework requirements for making this part of the project we are working on? Finally, what sort of costs are being planned for this (will it be by subscription, per voice line, or by volume of data)?
Okay, my bad, I just saw the SDK discussion earlier. Any chance we can get a view of some of those examples mentioned?
- Allan_Carranza
  Microsoft
  Oct 09, 2024
  
  .NET sample is here
  
  JavaScript sample is here
  
  Python sample is here
  
  As for pricing, we will be sharing more details very soon! Similar to other Azure OpenAI models, we will start with Pay-as-you-go pricing and introduce others like provisioned, batch, etc. over time. If there are other pricing models that would better help you scale your applications, we always welcome feedback!
jrwarwick
Brass Contributor
Oct 09, 2024
Will there be an OotB hardware option similar to Amazon Echo Dot or Google Home voice assistant nodes? (This is our chance to have a second shot at an awesome Cortana implementation.) Or, if not, will there be enough API and persistence to implement something like that?
- Allan_Carranza
  Microsoft
  Oct 09, 2024
  As the Azure AI Platform team, it is our responsibility to ensure that state-of-the-art technology is available for any developer to integrate into their exciting products and applications. With the improvements the GPT-4o-realtime API provides in speech and audio capabilities, there are endless opportunities to integrate speech capabilities into any product, whether old or new. 😁
  - CaptainAmazing
    Copper Contributor
    Oct 09, 2024
    Even text messages! LOL 😉 Any chance this will be integrated into Dynamics or PABX solutions, or are we just talking about the API today?
riyazlambat
Occasional Reader
Oct 09, 2024
Is this only a text-based event?
- EricStarker
  Former Employee
  Oct 09, 2024
  Yes, this is a text-based event.
CaptainAmazing
Copper Contributor
Oct 09, 2024
On a more serious note, are there any Visual Studio examples or tutorials that work with these real-time audio chat APIs that we can use to get started on our own projects?
- Travis_Wilson_MSFT
  Microsoft
  Oct 09, 2024
  For some basic "getting started" resources, check out https://github.com/azure-samples/aoai-realtime-audio-sdk -- this has an interactive localhost web demo using a standalone TypeScript SDK library, an interactive console demo with tools using the official .NET SDK's latest beta, and some non-interactive, file-based demonstrations using a standalone Python library.
- Allan_Carranza
  Microsoft
  Oct 09, 2024
  We have prepared SDKs and samples to help builders get up and running as quickly as possible -> https://github.com/azure-samples/aoai-realtime-audio-sdk. This repository is actively monitored by our team, and we welcome any suggestions and contributions per these guidelines -> https://github.com/Azure-Samples/aoai-realtime-audio-sdk/blob/main/CONTRIBUTING.md. Our goal is to continuously improve the experience and make getting started with any new model as easy as possible!
