Event banner
AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications
Event details
Unlock the potential of your applications with the latest GPT-4o-realtime API with Audio, available on Azure as of October 1st, 2024. Join us to explore how this model, integrated into the new Copilot Voice product, will take your AI applications to new heights with natural, multilingual conversations, enhanced customer interactions with faster responses, and streamlined workflows and business operations.
Don’t miss out on the future of AI innovation—register now and be part of the transformation!
An AMA is a live text-based online event similar to an “Ask Me Anything” on Reddit. This AMA gives you the opportunity to connect with Microsoft product experts who will be on hand to answer your questions and listen to feedback. The AMA takes place entirely in the comments below. There is no additional video or audio link as this is text-based.
Feel free to post your questions in the comments below ahead of time if that fits your schedule or time zone better, though they will not be answered until the live hour.
64 Comments
- SriniTech Brass Contributor
Question to the Azure OpenAI Platform team (Travis or Allan): 5. How does Microsoft ensure data security and privacy when using GPT-4 in business applications, especially for industries like healthcare or finance? Are there compliance certifications built into the Azure deployment? 6. For someone just starting with this technology, what are the cost implications of using the GPT-4 API on Azure? Are there recommendations for managing costs while scaling up usage? 7. Are there any upcoming features for GPT-4 on Azure that could further benefit no-code developers? What should we be looking out for over the next few months? 8. What are some common mistakes non-programmers might make when setting up or using GPT-4 with Azure? Any tips on how to avoid them?
- nbrady
Microsoft
(5) Yes! Azure OpenAI Service is an enterprise-grade platform hosting the latest models from OpenAI. Check out our data, privacy, and security documentation for more information.
(6) Cost implications are use case dependent. Items that are likely to impact your costs include: Input/output audio ratio, average length of audio session, number of concurrent connections, deployment type, and throughput requirements. We recommend that developers start with small scale tests and development and evaluate costs before scaling to production.
(7) The "On your data" feature of Azure OpenAI is always a great way to get started quickly, then you can create a web app or a chatbot to share with your team or organization. Check it out!
(8) Common mistakes I've witnessed firsthand typically include:
- not adding enough of your default quota to your deployment,
- not spending enough time refining the system message to specify instructions, and
- when building retrieval-augmented generation (RAG) applications, not spending enough time on chunking strategies and investigating the relevancy of the documents in the vector store.
I'm sure there are plenty of others you'll uncover as you begin your journey. Best of luck!
- Travis_Wilson_MSFT
Microsoft
For getting started with no code, Azure AI Studio (https://ai.azure.com/) includes the new /realtime capabilities in its Playground experience (much like OpenAI's) and allows you to interact with the new gpt-4o-realtime-preview model in the browser without needing to write any lines of code -- it's also the best way to create and manage the model deployments on your Azure OpenAI Service resource. Low- and no-code development of /realtime-based experiences is an area we're actively looking into. The new API endpoint and its WebSocket-based capabilities are considerably more complex than the prior REST-based operations like /chat/completions, and making it as approachable and easy to integrate as possible (including with no code at all, where it makes sense) is a major priority for us when it comes to Developer Experience.
- EricStarker Former Employee
I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
- SriniTech Brass Contributor
I've broken it down into two parts.
- CaptainAmazing Copper Contributor
Wow, a voice-enabled AI event using text-based chatting? From Copilot: Why did the text message go to school? To improve its grammar and become a better text! 😄
- Travis_Wilson_MSFT
Microsoft
For what it's worth, I promise that my own typos will be 100% human-produced here 🤣
- ryansusman Copper Contributor
Thank you for scheduling this session. We have been experimenting with some of the sample code provided by Microsoft, and it appears to be functioning well. However, we have observed instances where the model generates music-like sounds, although it is not actual music but has a tune. Additionally, there are occasions when the model changes its voice. Could you provide guidance on how we should approach grounding the outputs?
- Travis_Wilson_MSFT
Microsoft
Oh, I know exactly what you mean; the model can get pretty "creative" sometimes. It was even more entertaining a few weeks ago; one of its favorite pastimes was to start -- no joke -- giggling in the middle of a response. Much of this is getting rapidly improved within the model itself and is driven by continual new deployments. From a consumption perspective, you can use system messages ("instructions" inside of "session.update" with the /realtime API) and few-shot examples (conversation items with example input/output) to help prime the model for better output, just like you would with e.g. chat completions. This applies to even mundane things like retaining the same tone or voice -- responses should (and will) do a better job of not "getting distracted" all on their own, but gentle reminders surprisingly do assist, too.
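To make the "instructions and few-shot examples" approach concrete, here is a minimal sketch of the JSON events a /realtime client might send right after connecting. The event shapes follow the public Realtime API preview ("session.update" and "conversation.item.create"); the instruction text and example content are purely illustrative:

```python
import json

# "session.update" sets system-style instructions for the session,
# including gentle reminders about tone and voice consistency.
session_update = {
    "type": "session.update",
    "session": {
        "instructions": (
            "You are a concise, friendly assistant. "
            "Keep the same calm tone and voice throughout the conversation, "
            "and respond with speech only (no singing or sound effects)."
        ),
    },
}

# A few-shot example: seed the conversation with an item that shows the
# style of exchange you want the model to imitate.
example_item = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What's the weather like?"}],
    },
}

# Over a live WebSocket connection, each event is sent as a JSON text frame:
frames = [json.dumps(session_update), json.dumps(example_item)]
```

The same pattern extends to assistant-role example items if you want to demonstrate the desired output style, just as you would with few-shot prompting in chat completions.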
- scastle15 Copper Contributor
Is there an SDK or set of APIs to have a GPT-4o audio agent join a Teams call? Could multiple agents join a Teams call? E.g., a project manager agent, a QA agent, and a few human developers all in the same Teams call? Is this possible today? If so, which SDKs/APIs would enable this use case?
- Travis_Wilson_MSFT
Microsoft
We don't yet have higher-level abstractions for Teams specifically, but between OpenAI and Microsoft we've started some client library coverage to expose the new capabilities of the gpt-4o-realtime-preview model and the /realtime API:
- The OpenAI .NET SDK (https://github.com/openai/openai-dotnet) (as well as the AOAI companion library, Azure.AI.OpenAI) already has early support for a /realtime client as of 2.1.0-beta.1
- Python has an early standalone client we're iterating on at aoai-realtime-audio-sdk/python at main · Azure-Samples/aoai-realtime-audio-sdk (github.com)
- JavaScript has an early standalone library: openai/openai-realtime-api-beta: Node.js + JavaScript reference client for the Realtime API (beta) (github.com), and we also have one at aoai-realtime-audio-sdk/javascript at main · Azure-Samples/aoai-realtime-audio-sdk (github.com)
We've already seen developers prototype applications with multiple agents talking to people (and each other!) using the /realtime capabilities and the results are very cool. Very possible with the tools we have today!
- scastle15 Copper Contributor
Excellent, thanks for the reply, I'll take a look.
- Mary_Verrone Copper Contributor
Will it be hard for people to adapt to this new and exciting change? What adoption advice would you give to Organizational Change Managers who come up with the strategy to drive adoption?
Oh and...
Hi from Florham Park, NJ! My LinkedIn is http://www.linkedin.com/in/mary-verrone. I'm a Change Manager, focusing on the people side of change projects.
- nbrady
Microsoft
Hey Mary! Thanks for the question. I've found Jared Spataro's "AI at Work" videos to be useful inspiration for the organizational change guidance you're seeking. In the latest one, he calls out a few common challenges:
- employees struggle with the confidence to believe they can successfully master this new AI technology,
- a lack of knowledge around which tasks are best suited to AI, and
- the skills needed to maximize AI's potential.
My takeaways from the video were that leadership encouraging their organization to try AI, along with in-person trainings, were the most impactful. What do you think?
Here was the latest video I'm referring to: https://www.linkedin.com/posts/jaredspa_ai-activity-7249443174991998976-dYGM?utm_source=share&utm_medium=member_desktop
- Travis_Wilson_MSFT
Microsoft
Great question! I worked on voice assistants all the way back in the early Windows Phone days (*before* Cortana!) and it's surprising how much is "old" at the same time it's "new" -- if you're familiar with voice assistant paradigms and ever struggled with the technology just "not being quite there yet" for a lot of useful scenarios, then much of this will feel very familiar. Building a rich, voice-in, voice-out product experience can be very complex to get *perfect,* but it can also be surprisingly quick to get to a minimum-viable-product, "already delivers quite a bit of value" state. I'd highly recommend just playing around with either the playground or the demo applications to get a feel for what kinds of things it makes possible; everyone I've shown even the interactive console demos to seems to ask it to do different things, and everyone walks away with a different idea of what they'd like to build with it. But everyone walks away wanting to make something!
- EricStarker Former Employee
Welcome to the GPT-4o-realtime API with Audio AMA! This live hour gives you the opportunity to ask questions and provide feedback directly to the Azure AI Platform team. Azure AI Platform powers and serves state-of-the-art OpenAI models to Azure developers, including Microsoft's own Copilot products. Please post any questions in a separate, new comment thread. To start, introduce yourself below and tell us where you're logging in from!
Joining us today from the Azure AI Platform team are:
- Allan Carranza, Azure OpenAI Senior Product Manager
- Travis Wilson, Azure OpenAI Principal Software Engineering Manager
- Nick Brady, Azure OpenAI Senior Program Manager
- Long Chen, Azure OpenAI Senior Product Marketing Manager
- Allan_Carranza
Microsoft
Hi everybody!! We're excited to be here with the tech community to answer any questions about the new GPT-4o-realtime speech capabilities, including how you can get started with building your applications!
- Travis_Wilson_MSFT
Microsoft
Hello, everyone! Travis here -- I'm the engineering manager, located in Seattle, whose team works on OpenAI and Azure OpenAI SDK libraries. Excited to be part of this AMA and thank you for your questions! From the code perspective, I'll plug the repository we set up to help get started with gpt-4o-realtime-preview audio features: https://github.com/azure-samples/aoai-realtime-audio-sdk . This is all a very new feature area and the resources we have are a rapidly evolving work in progress, but we're excited to see how quickly things are growing and the cool things people are already starting to do with the preview of the new /realtime features.
- SriniTech Brass Contributor
Question to Travis or Allan: 1. How can no-code developers best utilize the Copilot Voice product and GPT-4 in their applications? Are there specific integrations with no-code platforms (like Make, Bubble, etc.) that you recommend? 2. For a non-programmer, what is the setup process like to get started with GPT-4 in Azure? Are there any beginner-friendly resources or templates available? 3. Could you explain how the new multilingual features work in the real-time API? What steps are involved for a no-code developer to incorporate multiple languages seamlessly? 4. What are some specific examples of how no-code developers can use GPT-4’s real-time API for improving customer interaction experiences, such as through chatbots or voice assistants?
- nbrady
Microsoft
Hey Srini, thanks for the questions. Since we are the platform team serving these models,
(1) I'm sure the Copilot and Copilot Studio teams are cooking up something for no-code and low-code developers to take advantage of this new modality.
(2) For a non-developer, we have some helpful documentation to get started: https://aka.ms/oai/docs. Without code you can create a resource by using the Azure Portal and launching the OpenAI Studio once a resource has been created. For now, we'd recommend creating the resource in either East US 2 or Sweden Central Azure regions. Check out this quickstart guide for how to interact with GPT-4o using the Realtime API within the Studio. For more code-first development, you can check out these pre-made samples: https://github.com/azure-samples/aoai-realtime-audio-sdk
(3) Much like the text-based capabilities of LLMs, these models can natively interpret many languages other than English. As always, it is best to test GPT-4o's audio capabilities to ensure the model has the fidelity you need to meet your business requirements.
(4) This voice capability unlocks a new modality for interacting with applications, where AI becomes the universal interface. I've found inspiration in the scenarios from customers we gave early access, like Bosch and Lyrebird Health, as well as the examples OpenAI demonstrated in their spring update.
- Travis_Wilson_MSFT
Microsoft
Whether you plan to write code or not, the process to get started with the new gpt-4o-realtime-preview model is fairly straightforward: (0) if you don't have one yet, create an Azure account via the Azure Portal; (1) create an Azure OpenAI Service resource in one of the two preview regions (eastus2 or swedencentral); (2) using Azure AI Studio, create a gpt-4o-realtime-preview model deployment on your eastus2 or swedencentral resource; (3) use the "Real-time audio" playground (left navigation bar) to check out the new model with a live, browser-based voice-in/voice-out experience. From there, there are code samples -- including ones that just require setting environment variables and running -- at https://github.com/azure-samples/aoai-realtime-audio-sdk . We don't have much in the way of "build a new experience with no code whatsoever" yet given how new this all is, but we're continually looking for ways to make it easier to integrate this new /realtime feature set and other Azure OpenAI capabilities.
- EricStarker Former Employee
I see you've posted this question twice - it looks like a complete duplicate. Please let us know which of these threads you want us to answer these questions in - I'll delete the other one.
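As a rough sketch of what those environment-variable-driven samples do under the hood once the resource and deployment exist: the /realtime endpoint is WebSocket-based, and its URL is derived from the resource's https:// endpoint. The environment variable names, API version string, and URL shape below are assumptions based on the preview-era samples, so double-check them against the samples repo before relying on them:

```python
import os

# Assumed variable names; the samples repo uses similar ones.
endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://my-resource.openai.azure.com")
api_key = os.environ.get("AZURE_OPENAI_API_KEY", "<your-key>")
deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4o-realtime-preview")

# Derive the wss:// URL from the resource endpoint, passing the deployment
# name and (preview-era) API version as query parameters.
ws_url = (
    endpoint.replace("https://", "wss://").rstrip("/")
    + "/openai/realtime"
    + "?api-version=2024-10-01-preview"
    + f"&deployment={deployment}"
)

# Azure OpenAI key-based auth uses the "api-key" header.
headers = {"api-key": api_key}
```

A WebSocket client (or one of the standalone client libraries mentioned above) would then connect to `ws_url` with those headers and exchange JSON events.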
- SriniTech Brass Contributor
Edited into two parts now, rather than in one batch.
- Good morning from Tucson, AZ.
- shaziaiqbal Copper Contributor
How do I join the live session today?
- marvnl15 Copper Contributor
Same. I have the calendar event, but no event link in there.
- EricStarkerFormer Employee
Hello! The page you are on is where the event is! There is no audio or video or any additional link. It's all text-based, similar to a Reddit AMA, where you post your questions in text and get them answered in text live during the online hour.
Just hang out until 9AM PT, or feel free to post your questions in advance at any time.
- Garby Copper Contributor
Alrighty. I look forward to this.