AMA: GPT-4o Audio model revolutionizes your Copilot and other AI applications
I followed the links in the email to install the Android app. It does not match your claims (it refuses to state which AI model version it's using, but does say that it's NOT GPT-4). The app also isn't really ready for release, IMHO: it truncates the end of spoken sentences (missing the last word), and locks up if you switch from voice to text and then back to voice. The login process has a weird loop that's super-confusing, and voice input has been disabled for login for no sensible reason.
As a heavy ChatGPT user, I find the AI basically useless in comparison. It refuses to answer basic questions with excuses like "I can't help you with business advice", and it doesn't understand how to follow instructions at all (e.g., tell it to stop asking you questions at the end of every response, and it keeps doing it).
Your email says "be part of the transformation!", so I asked it how I can make my value-added services available to users ... and it basically said I cannot.
What exactly does "be part of the transformation" mean? I don't want to add Copilot to my service; I want my service to be available to Copilot users. Is that going to be possible?
- Travis_Wilson_MSFT (Microsoft), Oct 09, 2024
Allan and I are part of the AI Platform team working on the Azure OpenAI Service capabilities (Copilot is a same-company internal customer of the same capabilities, now available to everyone), so we're not the best people to comment on Copilot app specifics. I will beg a bit of patience with any rough edges around the technology, though -- we released this gpt-4o-realtime-preview feature set (the beta /realtime API endpoint) simultaneously with OpenAI just last week, and I can vouch for things continuing to change *very* quickly. I'm still astounded that so many cool experiences were made possible so quickly with the underpinnings changing so rapidly!

As far as a "transformation" goes: flashy wording aside (hey, it got some attention), there really *is* some amazing potential in this kind of voice-in, voice-out interaction paradigm. When voice assistants first became popularized, many people were understandably disappointed with how "on rails" and ultimately limited some of the capabilities necessarily ended up being, given the constraints of the technology: handling truly natural speech (including interruptions, so-called disfluencies like "ums" and "ahs", speaker variations, etc.) was hard, interactions still felt very "walkie-talkie-like" in how transactional and turn-based they were, you still felt like you were choosing from a short menu of things the assistant was good at, and so on.

This new /realtime capability set, built around gpt-4o-realtime-preview, breaks through a lot of those barriers. Several people I've demoed it to have remarked that they couldn't believe it wasn't actually a pre-recorded or live person replying to them, even when they were trying it themselves, given how natural the experience felt. Aside from white-lie flattery, nobody ever *really* said that about voice assistants before.

Now, that isn't to say that everything's absolutely perfect yet -- this is a beta/preview feature area, after all! -- but even trying it out in the playground or demo apps (or seeing it in action inside Copilot, OpenAI's Advanced Voice Mode, etc.) really gives a sense that it isn't an unreasonable exaggeration to call this all "transformative."
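If you want to poke at the beta /realtime endpoint yourself, here's a minimal Python sketch of a text-only round trip over WebSocket. Treat the details as assumptions rather than gospel: the wss:// path, api-version value, api-key header, and JSON event names are taken from the public preview docs as I remember them, and (as noted above) things are changing quickly, so check the current reference before relying on any of it.

```python
# Minimal sketch: a text-only round trip against the beta /realtime endpoint.
# ASSUMPTIONS (not confirmed in this thread): the wss:// path, api-version,
# "api-key" header, and JSON event names follow the public preview docs;
# substitute your own resource and deployment names.
import asyncio
import json
import os

import websockets  # pip install websockets

RESOURCE = os.environ["AZURE_OPENAI_RESOURCE"]    # e.g. "my-aoai-resource"
API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
DEPLOYMENT = "gpt-4o-realtime-preview"            # your deployment name

URL = (
    f"wss://{RESOURCE}.openai.azure.com/openai/realtime"
    f"?api-version=2024-10-01-preview&deployment={DEPLOYMENT}"
)

async def main() -> None:
    # On websockets >= 14 the keyword is additional_headers instead.
    async with websockets.connect(URL, extra_headers={"api-key": API_KEY}) as ws:
        # Ask for a text-only response so the sketch stays audio-free; a real
        # voice app would stream input_audio_buffer.append events instead.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        # The server streams JSON events; print text deltas until the turn ends.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "response.text.delta":
                print(event["delta"], end="", flush=True)
            elif event.get("type") == "response.done":
                print()
                break

asyncio.run(main())
```

The same event stream carries audio when you request the audio modality (base64 chunks in response.audio.delta events, if memory serves); the text-only path just keeps the example short.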