Summary
Who needs text-based prompts when you can simply talk to your favorite AI? Voice interaction is the hot new feature that developers are scrambling to add to their models, with ChatGPT’s Advanced Voice Mode, Copilot’s Natural Voice Interaction, and Gemini Live leading the way.
Chatbots are Growing Fast
It’s been less than two years since the debut of ChatGPT, and we’re already witnessing AI chatbots undergo a fundamental change in the way they communicate with humans. As these models have rapidly evolved and gained multimodal capabilities, they are no longer bound strictly to text-based prompts and replies. Today, they can converse with you as you would with another person and, in Gemini Live’s case, do so in more than 40 languages. Obviously, traditional written prompts still have their place (nobody’s sitting down and dictating thousands of lines of Python code to a chatbot), but voice interactions and conversational AIs are poised to further revolutionize how we interact with the modern world.
OpenAI was the first to bring the technology to market with Advanced Voice Mode, but was quickly followed by Google’s Gemini Live and, more recently, Microsoft’s Copilot Voice. Each system offers its own set of capabilities and constraints. This guide will give you the information and insight you need to choose the best one for your specific needs.
ChatGPT Advanced Voice Mode
ChatGPT’s Advanced Voice Mode (AVM) leverages OpenAI’s latest large language model, GPT-4o, to facilitate more natural, back-and-forth conversations with you, the user. This makes it ideal for tasks that require real-time interaction, such as brainstorming or discussing complex topics. And, since it has GPT-4o under the hood, AVM is capable of competently discussing a wide range of topics, from biochemistry to 14th-century Japanese philosophy. What’s more, it can provide in-depth responses on those topics where other AIs will provide brief summaries. Personally, I find that it offers a strong combination of natural language understanding, adaptability, and personalization, alongside a broad knowledge base.
Gemini Live
While Gemini 1.5 Pro can’t post the same benchmarks as GPT-4o, it does offer a host of capabilities that AVM does not. I cannot overstate this: it’s free to use through either the Google app or the dedicated Gemini iOS and Android apps. There are no region restrictions for it as there are for AVM. The only place you can’t get Gemini Live is on the desktop, though Google is reportedly working on adding that capability in the future. Gemini Live is currently available in five languages beyond English: French, German, Portuguese, Hindi, and Spanish, and will expand to nearly four dozen languages in the coming weeks.
Copilot Voice
Copilot Voice is one of a host of new features that recently debuted alongside the revamped Copilot personal interface, which runs on a custom instance of GPT-4. Like AVM and Live, it enables you to converse naturally with the AI instead of typing out your queries. Like the others, Voice is primarily designed to answer general questions and act as a digital assistant, though because it does operate atop GPT-4, it has access to that model’s expansive training corpus. And unlike Live, Voice is available through the Copilot desktop portal.
Microsoft bills it as “the most intuitive and natural way to brainstorm on the go, ask a quick question or even just vent at the end of a tough day.” Because who needs real friends when you can just yell at your pocket computer on the subway ride home?
It is free to use, unlike AVM, though it is currently limited to conversations in English, and only if you live in Australia, Canada, New Zealand, the United Kingdom, or the United States. Microsoft is working to expand both the feature’s language capabilities and geographic availability in the coming weeks.
Which Voice AI Is Right for You?
If I were a Windows guy, I’d be more likely to use Voice, if only to minimize potential friction points with the rest of the apps I already use. If I ran iOS, well, I’d be patiently waiting for Apple Intelligence to arrive with its AI-enhanced and supremely upgraded Siri. If you, on the other hand, actually need the lake-boiling inference capabilities and performance that ChatGPT provides, and have $20 burning a hole in your pocket, Advanced Voice Mode is probably the way to go.