AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications
Event details
Unlock the potential of your applications with the latest GPT-4o-realtime API with Audio, available on Azure as of October 1st, 2024. Join us to explore how this model, integrated into the new Copilot Voice product, will take your AI applications to new heights with natural, multilingual conversations; enhanced customer interactions through faster responses; and streamlined workflows and business operations.
Don’t miss out on the future of AI innovation—register now and be part of the transformation!
An AMA is a live text-based online event similar to an “Ask Me Anything” on Reddit. This AMA gives you the opportunity to connect with Microsoft product experts who will be on hand to answer your questions and listen to feedback. The AMA takes place entirely in the comments below. There is no additional video or audio link as this is text-based.
Feel free to post your questions in the comments below ahead of time if that better fits your schedule or time zone, though questions will not be answered until the live hour.
- SriniTech (Brass Contributor): Thank you for the AMA session. It was a brilliant session. Maybe we could do this again sometime.
- Allan_Carranza (Microsoft): Thanks for joining us - we love opportunities to connect with our wonderful community of developers!
- LonChen (Microsoft): Glad to hear this was helpful, Srini. We will do this more often - stay tuned for our future events.
- Travis_Wilson_MSFT (Microsoft): It's been great talking with you! This is for sure just the beginning; we'll have more coming soon.
- EricStarker (Community Manager): Welcome to the GPT-4o-realtime API with Audio AMA! This live hour gives you the opportunity to ask questions and provide feedback directly to the Azure AI Platform team. Azure AI Platform powers and serves state-of-the-art OpenAI models to Azure developers, including Microsoft's own Copilot products. Please post any questions in a separate, new comment thread. To start, introduce yourself below and tell us where you're logging in from!
Joining us today from the Azure AI Platform team are:
- Allan Carranza, Azure OpenAI Senior Product Manager
- Travis Wilson, Azure OpenAI Principal Software Engineering Manager
- Nick Brady, Azure OpenAI Senior Program Manager
- Long Chen, Azure OpenAI Senior Product Marketing Manager
- Travis_Wilson_MSFT (Microsoft): Hello, everyone! Travis here -- I'm the engineering manager, located in Seattle, whose team works on the OpenAI and Azure OpenAI SDK libraries. Excited to be part of this AMA and thank you for your questions! From the code perspective, I'll plug the repository we set up to help get started with gpt-4o-realtime-preview audio features: https://github.com/azure-samples/aoai-realtime-audio-sdk . This is all a very new feature area and the resources we have are a rapidly evolving work in progress, but we're excited to see how quickly things are growing and the cool things people are already starting to do with the preview of the new /realtime features.
- Allan_Carranza (Microsoft): Hi everybody!! We're excited to be here with the tech community to answer any questions about the new GPT-4o-realtime speech capabilities, including how you can get started with building your applications!
- Allan_Carranza (Microsoft):
In case you missed it, Azure AI Search built a RAG + Voice demo utilizing the GPT-4o Realtime API. Check out the blog post below that includes more details, including a code sample to get started with RAG + Voice!
VoiceRAG: An App Pattern for RAG + Voice Using Azure AI Search and the GPT-4o Realtime API for Audio - Microsoft Community Hub
- SriniTech (Brass Contributor):
Question to Travis or Allan:
1. How can no-code developers best utilize the Copilot Voice product and GPT-4 in their applications? Are there specific integrations with no-code platforms (like Make, Bubble, etc.) that you recommend?
2. For a non-programmer, what is the setup process like to get started with GPT-4 in Azure? Are there any beginner-friendly resources or templates available?
3. Could you explain how the new multilingual features work in the real-time API? What steps are involved for a no-code developer to incorporate multiple languages seamlessly?
4. What are some specific examples of how no-code developers can use GPT-4's real-time API for improving customer interaction experiences, such as through chatbots or voice assistants?
- Travis_Wilson_MSFT (Microsoft): Whether you plan to write code or not, the process to get started with the new gpt-4o-realtime-preview model is fairly straightforward: (0) if you don't have one yet, create an Azure account via the Azure Portal; (1) create an Azure OpenAI Service resource in one of the two preview regions (eastus2 or swedencentral); (2) using Azure AI Studio, create a gpt-4o-realtime-preview model deployment on your eastus2 or swedencentral resource; (3) use the "Real-time audio" playground (left navigation bar) to check out the new model with a live, browser-based voice-in/voice-out experience. From there, there are code samples -- including ones that just require setting environment variables and running -- at https://github.com/azure-samples/aoai-realtime-audio-sdk . We don't have much in the way of "build a new experience with no code whatsoever" yet given how new this all is, but we're continually looking for ways to make it easier to integrate this new /realtime feature set and other Azure OpenAI capabilities.
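For readers taking the code route, the following is a minimal sketch of what step (3)'s voice-in/voice-out session looks like at the wire level: connecting to a gpt-4o-realtime-preview deployment over WebSocket and requesting one response. It assumes the `websockets` Python package and the preview endpoint/api-version shape; the resource name, key, and event fields are illustrative, so treat the samples repository above as the authoritative reference.

```python
# Minimal sketch (not official sample code): open a /realtime WebSocket to an
# Azure OpenAI gpt-4o-realtime-preview deployment and request one response.
# Assumes `pip install websockets` (v12/v13; newer versions rename
# extra_headers to additional_headers) and the 2024-10-01-preview API shape.
import asyncio
import json
import os

import websockets

RESOURCE = os.environ["AZURE_OPENAI_RESOURCE"]    # e.g. "my-aoai-resource"
API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
DEPLOYMENT = "gpt-4o-realtime-preview"            # your deployment name

URL = (
    f"wss://{RESOURCE}.openai.azure.com/openai/realtime"
    f"?api-version=2024-10-01-preview&deployment={DEPLOYMENT}"
)

async def main() -> None:
    async with websockets.connect(URL, extra_headers={"api-key": API_KEY}) as ws:
        # A real voice app would first stream microphone audio via
        # input_audio_buffer.append events; here we just ask for text.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") in ("response.done", "error"):
                break

asyncio.run(main())
```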
- nbrady (Microsoft):
Hey Srini, thanks for the questions. A few notes, since we are the platform team serving these models:
(1) I'm sure the Copilot and Copilot Studio teams are cooking up something for no-code and low-code developers to take advantage of this new modality.
(2) For a non-developer, we have some helpful documentation to get started: https://aka.ms/oai/docs. Without code you can create a resource by using the Azure Portal and launching the OpenAI Studio once a resource has been created. For now, we'd recommend creating the resource in either East US 2 or Sweden Central Azure regions. Check out this quickstart guide for how to interact with GPT-4o using the Realtime API within the Studio. For more code-first development, you can check out these pre-made samples: https://github.com/azure-samples/aoai-realtime-audio-sdk
(3) Much like the text-based capabilities of LLMs, these models can natively interpret many languages other than English. As always, it is best to test GPT-4o's audio capabilities to ensure the model has the fidelity you need to meet your business requirements (see the configuration sketch after this answer).
(4) This voice capability unlocks a new modality for interacting with applications, where AI becomes the universal interface. I've found inspiration in the scenarios from customers we gave early access to, like Bosch and Lyrebird Health, as well as the examples OpenAI demonstrated in their spring update.
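On the multilingual point in (3), language behavior is steered through the session configuration rather than a dedicated language setting. Here is a sketch of a session.update event instructing the model to reply in the caller's language; the field names follow the preview event protocol as we understand it and should be verified against the current API reference. `ws` is an open /realtime WebSocket as in the connection sketch earlier in the thread.

```python
import json

async def configure_multilingual_session(ws) -> None:
    # Sketch: steer language behavior via instructions in session.update.
    # Field names are illustrative of the preview protocol; verify against
    # the current API reference before relying on them.
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "voice": "alloy",
            "instructions": (
                "Detect the language the user speaks and always reply in "
                "that same language."
            ),
            # Optional: transcribe incoming audio so you can log and
            # evaluate fidelity per language.
            "input_audio_transcription": {"model": "whisper-1"},
        },
    }))
```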
- EricStarker (Community Manager): I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
- SriniTech (Brass Contributor): Edited it into two parts now, rather than one batch.
- scastle15 (Copper Contributor): Is there an SDK or set of APIs to have a GPT-4o audio agent join a Teams call? Could multiple agents join a Teams call - e.g., a project manager agent, a QA agent, and a few human developers all in the same call? Is this possible today? If so, which SDKs/APIs would enable this use case?
- Travis_Wilson_MSFT (Microsoft):
We don't yet have higher-level abstractions for Teams specifically, but between OpenAI and Microsoft we've started some client library coverage to expose the new capabilities of the gpt-4o-realtime-preview model and the /realtime API:
- The OpenAI .NET SDK (https://github.com/openai/openai-dotnet), as well as the AOAI companion library (Azure.AI.OpenAI), already has early support for a realtime client, integrated as of 2.1.0-beta.1
- Python has an early standalone client we're iterating on at https://github.com/Azure-Samples/aoai-realtime-audio-sdk/tree/main/python
- JavaScript has an early standalone library (https://github.com/openai/openai-realtime-api-beta, the Node.js + JavaScript reference client for the Realtime API, beta), and we also have one at https://github.com/Azure-Samples/aoai-realtime-audio-sdk/tree/main/javascript
We've already seen developers prototype applications with multiple agents talking to people (and each other!) using the /realtime capabilities and the results are very cool. Very possible with the tools we have today!
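As a rough illustration of the multi-agent point, here is a sketch of two "agents" running as concurrent /realtime sessions in one process, each given its own persona via session.update. It reuses the assumed endpoint shape from the connection sketch above; actually bridging such sessions into a Teams meeting would additionally require Teams media/bot plumbing, which is not shown.

```python
# Sketch: two concurrent /realtime sessions ("agents") in one process.
# Endpoint, headers, and event names are assumptions carried over from the
# earlier connection sketch; Teams integration is out of scope here.
import asyncio
import json
import os

import websockets

URL = (
    f"wss://{os.environ['AZURE_OPENAI_RESOURCE']}.openai.azure.com/openai/realtime"
    "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview"
)
HEADERS = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}

async def run_agent(name: str, persona: str) -> None:
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        await ws.send(json.dumps(
            {"type": "session.update", "session": {"instructions": persona}}
        ))
        async for message in ws:
            print(name, json.loads(message).get("type"))

async def main() -> None:
    await asyncio.gather(
        run_agent("pm", "You are a project manager."),
        run_agent("qa", "You are a QA engineer."),
    )

asyncio.run(main())
```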
- scastle15 (Copper Contributor): Excellent, thanks for the reply, I'll take a look.
- SriniTech (Brass Contributor):
Question to the Azure OpenAI Platform team (Travis or Allan):
5. How does Microsoft ensure data security and privacy when using GPT-4 in business applications, especially for industries like healthcare or finance? Are there compliance certifications built into the Azure deployment?
6. For someone just starting with this technology, what are the cost implications of using the GPT-4 API on Azure? Are there recommendations for managing costs while scaling up usage?
7. Are there any upcoming features for GPT-4 on Azure that could further benefit no-code developers? What should we be looking out for over the next few months?
8. What are some common mistakes non-programmers might make when setting up or using GPT-4 with Azure? Any tips on how to avoid them?
- Travis_Wilson_MSFT (Microsoft): For getting started with no code, Azure AI Studio (https://ai.azure.com/) includes the new /realtime capabilities in its Playground experience (much like OpenAI's) and allows you to interact with the new gpt-4o-realtime-preview model in the browser without needing to write any lines of code -- it's also the best way to create and manage the model deployments on your Azure OpenAI Service resource. Low- and no-code development of /realtime-based experiences is an area we're actively looking into. The new API endpoint and its WebSocket-based capabilities are considerably more complex than the prior REST-based operations like /chat/completions, and making it as approachable and easy to integrate as possible (including with no code at all, where it makes sense) is a major priority for us when it comes to Developer Experience.
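To illustrate the complexity gap Travis describes: a REST /chat/completions call is a single request/response, whereas /realtime is a long-lived, bidirectional stream of JSON events. Here is a sketch of the REST side for contrast; the deployment name and api-version are illustrative, so check the Azure OpenAI reference for current values.

```python
# Sketch: the one-shot request/response shape of REST /chat/completions,
# for contrast with the event-driven /realtime loop shown earlier.
import os

import requests

resource = os.environ["AZURE_OPENAI_RESOURCE"]
url = (
    f"https://{resource}.openai.azure.com/openai/deployments/"
    "gpt-4o/chat/completions?api-version=2024-06-01"
)
resp = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
# The /realtime endpoint instead keeps a WebSocket open and exchanges many
# events (session.update, input_audio_buffer.*, response.*) per conversation,
# which is why higher-level client libraries help so much.
```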
- nbrady (Microsoft):
(5) Yes! Azure OpenAI Service is an enterprise-grade platform hosting the latest models from OpenAI. Check out our data, privacy, and security documentation for more information.
(6) Cost implications are use-case dependent. Items likely to impact your costs include: input/output audio ratio, average length of audio session, number of concurrent connections, deployment type, and throughput requirements. We recommend that developers start with small-scale tests and development and evaluate costs before scaling to production (see the back-of-envelope sketch at the end of this thread).
(7) The "On your data" feature of Azure OpenAI is always a great way to get started quickly, then you can create a web app or a chatbot to share with your team or organization. Check it out!
(8) Common mistakes I've witnessed firsthand typically include:
- not adding enough of your default quota to your deployment
- not spending enough time on refining the system message to specify instructions
- when building retrieval-augmented generation (RAG) applications, not spending enough time on chunking strategies and on investigating the relevancy of the documents in the vector store
I'm sure there are plenty of others you'll uncover as you begin your journey. Best of luck!
- EricStarker (Community Manager): I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
- SriniTech (Brass Contributor): I've broken it down into two parts.
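To make the cost factors from (6) above concrete, here is a back-of-envelope sketch. The per-minute prices are placeholders, not published rates; substitute the actual pay-as-you-go prices for your region from the Azure pricing page.

```python
# Hypothetical cost model for a realtime audio workload. Prices below are
# PLACEHOLDERS, not published rates -- use the Azure pricing page instead.
INPUT_AUDIO_PER_MIN = 0.06    # hypothetical $/min of input audio
OUTPUT_AUDIO_PER_MIN = 0.24   # hypothetical $/min of output audio

sessions_per_day = 1_000
avg_session_min = 4.0
output_ratio = 0.4            # fraction of session time the model is speaking

input_min = sessions_per_day * avg_session_min * (1 - output_ratio)
output_min = sessions_per_day * avg_session_min * output_ratio
daily_cost = input_min * INPUT_AUDIO_PER_MIN + output_min * OUTPUT_AUDIO_PER_MIN
print(f"~${daily_cost:,.2f}/day under these assumptions")
```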
- CaptainAmazing (Copper Contributor):
Will this new feature be included in PowerApps and Power Automate, or will it only be available in Azure OpenAI as an API for coding? Following on from this: is there an API syntax or reference site we can use for help on how to interface with the API? Are there any specific code or framework requirements for this to become part of the project we are working on? Finally, what sort of costs are being planned for this (will it be subscription-based, per voice line, or by volume of data)?
Okay, my bad - I just saw the SDK discussion earlier. Any chance we can get a view of some of those examples mentioned?
- Allan_Carranza (Microsoft):
As for pricing, we will be sharing more details very soon! Similar to other Azure OpenAI models, we will start with Pay-as-you-go pricing and introduce others like provisioned, batch, etc. over time. If there are other pricing models that would better help you scale your applications, we always welcome feedback!
- popoolaiogmailcom (Copper Contributor): It is my pleasure to be part of this great session. I'm eagerly looking forward to it.
- ehgreywoode (Copper Contributor): Awesome!!
- CaptainAmazing (Copper Contributor): Security - how safe is the LLM that sits behind all this awesome technology? For example, if I have a proprietary calculation for a business and I want to use it in a query with ChatGPT or Copilot, will that calculation now be available to others who might need the same one? Also, will Microsoft be looking at a reduced-cost version of Copilot that would work only in Teams or only in PowerPoint? Finally, have there been any improvements to Copilot in Excel for interrogating basic, standard data without tables?
- LonChen (Microsoft): Microsoft puts security above all else. Security comes first when designing any product or service, including LLMs. Security protections are enabled and enforced by default, require no extra effort, and are not optional. When you use Azure OpenAI Service, your prompts (inputs) and completions (outputs), your embeddings, and your training data are all your own data, meaning your data is NOT available to others for any use. As for pricing, please get the latest from your sales representative or from our pricing page. As for your last question, we will relay it to the Copilot team to follow up - or stay tuned for Copilot product updates.