Like it or not, there’s no turning back: apps and operating systems will steadily gravitate toward voice-first interactions.
Not mandatory, but inevitable
But here’s the thing: none of the points I’m about to make mean that you will be forced to talk to your devices against your will, nor that humanity is mindlessly yapping its way into a future where every shared public space inevitably fills with a cacophony of overly dependent, AI-loving nerds.
The GUI isn’t going away, just as the calculator didn’t go away after the release of Lotus 1-2-3. In fact, even today, you can still buy an abacus if you’d like. Some are actually pretty expensive.
But at this point, it is downright inevitable that both app developers and operating systems will increasingly gravitate towards voice-based interactions.
And there are good reasons for that, the most obvious being accessibility.
By that, I don’t just mean users who can’t physically interact with their devices, although that alone is beyond fantastic. I also mean users who aren’t as tech-savvy as you might be, but who have the same needs, as they try to navigate phones, computers, and platforms that only seem to work effortlessly for everyone else.
And if your knee-jerk reaction is to dismiss these users as lazy, or anything in that general direction, I’m sorry to tell you, but you’re missing the entire promise of modern computing.
Tech advancements are supposed to lower the barrier to entry and help people get to where they want, regardless of how familiar they may be with anything ranging from the Terminal to Safari.
In fact, most of Apple’s existence was predicated on that very premise, even if its leadership occasionally seems to forget it.
Hello computer
All of that said, here’s another big reason why a voice-first approach is inevitable: the actual underlying technology required for that to work is finally getting good.
Yes, every single LLM still makes stupid mistakes, and it is likely that they always will, as long as they’re based on current autoregressive Transformer-based approaches.
But companies, frontier AI labs, and even indie developers are either learning to work around those limitations, or moving to entirely different architectures, some of which show great promise.
Over the past year, voice-based interfaces have made significant progress, with tools such as Wispr Flow and Speechify seeing increasingly rapid adoption.
According to Wispr Flow founder and CEO Tanay Kothari, his users eventually reach a point where voice accounts for roughly 75% of all input across the product. And among mature users, keyboard usage drops to under 5%.
And I’ll eat my hat if they’re not working on proper agentic capabilities to go alongside their dictation tools. In fact, Speechify is already clearly moving in that direction.
Also, let’s not forget the recent tsunami caused by OpenClaw, warts and all, which completely blew the roof off what anyone expected autonomous agents would be able to do anytime soon. In fact, many users are relying on platforms such as ElevenLabs to actually talk out loud with their agents, and in some cases OpenClaw itself has proactively implemented the ElevenLabs API.
Anyone who knows what they’re talking about will tell you how remarkable this is, again, warts and all.
Evolution on that front is speeding up
And here’s how fast things are moving: I started writing this article a while ago, before OpenClaw became what it is today.
Originally, I had written:
“[…] it won’t be long before apps and operating systems lean on autonomous frameworks, where users just say what they want, and the AI handles the meaning, maps out the steps, and executes that action across agent-ready apps on the user’s behalf.”
As it turns out, it really wasn’t.
Originally, I also intended to close out the text by bringing up things like Anthropic’s MCP, as well as Apple’s App Intents, to illustrate how the pieces that would enable voice-ready interfaces were falling into place. I was even going to suggest that we may see news on that front next June, during WWDC.
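For a sense of what that developer-facing piece looks like today, here’s a minimal sketch of an App Intent in Swift. The intent and its parameter are hypothetical names I’m using purely for illustration, but anything an app declares this way becomes an action that Siri, Shortcuts, and presumably any future agentic layer can invoke on the user’s behalf.

```swift
import AppIntents

// Hypothetical example: an intent that opens a note by title.
// "OpenNoteIntent" and "noteTitle" are illustrative names, not a real app's API.
struct OpenNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Note"
    static var description = IntentDescription("Opens a note with the given title.")

    // If this value is missing, the system can prompt the user for it, by voice included.
    @Parameter(title: "Note Title")
    var noteTitle: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real app would look up the note and navigate to it here.
        return .result(dialog: "Opening “\(noteTitle)”…")
    }
}
```

Once an app exposes a handful of actions like this, turning a spoken, multi-step request into a chain of them starts to look more like a routing problem for the assistant than a per-app integration project.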
Now, while I still believe we might see more voice-oriented features, APIs, and affordances come June, even the notion that they will be developer-dependent is starting to look shortsighted or outdated.
I may be misremembering the details, but I believe it’s John Gruber who talks about how somewhere, possibly at Drexel University, they eventually paved the path people carved into the grass because it was shorter than the route the architects had designed.
I sincerely believe that, for many users, voice is that shortest path.
From speaking a request into an iPhone or Mac and getting an advanced Shortcut in return, to tweaking photos, looking up and editing documents, or even requesting multi-step workflows across apps, it’s increasingly obvious that, as the tech finally catches up, the interface most users will find easiest to navigate is no interface at all. Or rather, the one humanity has been refining since the first grunt.
All of that said, I still hate it when people send me voice messages.
