How Consumers Want to Live in a Conversational Voice Economy

Consumers live their lives in a digital-first, connected world — habits formed after three years of a forced shift to digital. And they like it that way.

But that’s today.

In a March 2023 landmark study, PYMNTS asked 2,939 U.S. consumers about their use of voice and its future: in five years’ time, nearly 50% of consumers believe their connected world will be enabled by smart assistants that are merely a spoken word away.

Nearly six in 10 millennials say that too.

Figure 1: When voice will be as smart and reliable as a human

Getting there requires better voice tech, more consumer trust, more innovation, more interoperability, and probably a player or two whose identity is yet to be known to shake things up, forcing new thinking and new business models and new consumer experiences.

What’s certain from this research is that consumers seem ready to move from taps and swipes to smart and reliable voice tech that’s integrated into their everyday routines. This reality isn’t the stuff of science fiction, but the evolution of the voice foundation we have in place today.

Some 168 million U.S. consumers already use that voice foundation to conduct at least six different everyday tasks like getting information, turning TVs and appliances on and off, and ordering food, groceries, or Ubers.

The Commerce Innovation That’s 200,000 Years Old

The voice economy isn’t new. In fact, voice as an enabler to just about any activity, including commerce, is about 200,000 years old. That’s when the human anatomy developed sufficiently to enable sounds and words to be spoken. Fifty thousand or so years later, the first origins of language were documented, enabling knowledge to be shared and become the basis for more meaningful and contextual human interactions.

For millennia, voice was the only way that people interacted, and people and businesses transacted. Payment form factors may have evolved from shells to coins to paper currency, checks and plastic cards, but people mostly walked into a store, asked the shopkeeper to show them something, negotiated the price and then left with their purchase.

Voice was ubiquitous — every business could take an order or respond to a request using one. It was secure — version 1.0 of the human as biometric authentication method. And it was personalized — consumers and merchants and business executives could banter back and forth before settling on precisely what was needed and at what price.

The advent of the digital age, beginning with the launch of the commercial internet in the mid-1990s and continuing to mobile phones and apps in the 2010s, gave people and merchants new ways to engage that were far less dependent on walking into a physical store and talking to a salesperson. Keyboards, taps and swipes could replace voice-to-human interaction, whether in person or via a call center, to make a sale, get information, do banking, manage financial transactions, and make payments.

Throughout the decade of the 2010s, innovators would use data, the cloud and digital payments to expand access to commerce and its many possibilities. And consumers were ready to take whatever digital innovations came their way. By 2021, 90% of households had access to the internet and at least one mobile device.

By 2022, U.S. households had, on average, 22 connected devices: smartphones, TVs and appliances, voice-activated speakers and cars with voice-enabled capabilities. Digital payments and mobile wallets were capable of igniting commerce throughout these digital channels anytime, anywhere and with anyone.

Commerce and everyday life became truly mobile. And the voice economy green shoots began to emerge.

The Connected Economy Takes Shape

Since 2019, these internet-enabled devices and apps have fast-tracked the connected reality of consumers, shifting the focus of innovators from mobile and standalone apps to connected devices and ecosystems that make access to multiple activities simpler and more streamlined.

PYMNTS has been tracking the digital behaviors of U.S. consumers every month since March of 2020. We measure the digital transformation by examining the activities once only conducted in the physical world in three ways: the number of consumers using digital to conduct at least one activity, the number of digital activities that any one consumer engages with and the frequency with which they engage using digital over time.

We find that the digital transformation of the U.S. economy — and that of the world — is undeniable and indelible.

Today, consumers go about their daily routines in a connected world, where 28 of 70 routine activities are now digital to some degree, up 13% from last year. Fourteen percent more occasional activities such as shopping in marketplaces and ridesharing are now weekly; 24% of weekly activities such as music streaming and reading the news online are now daily.

Figure 2: Digital activities consumers engage in on a monthly basis


Network effects — the flywheel — are starting to spin, boosting digital engagement across the continuum of routine activities. We observe that for every 10% increase in shopping online, buying groceries online increases by 7%, using digital channels for health and wellness increases by 6% and banking using digital channels increases by 4%.

Embedded payment possibilities have sparked the creativity of business leaders and entrepreneurs to design new business models, experiences and commerce opportunities. Even as the physical economy opens, physical remains the least satisfying shopping and payment experience for consumers in the U.S. and around the world.

Over the last three years, we have seen a 40% increase in online grocery orders and a 39% increase in online food shares. In 2019, less than 2% of grocery orders were done online — today that stands at 12%. People still shop in grocery stores, but not for the same things they did when that was the only way to purchase their food.

More recently, we’ve seen players as diverse as PayPal, Amazon and Uber make it possible for many similar activities to be connected inside of a single digital ecosystem where payments and identity are inextricably linked. A single app for these everyday experiences is something that nearly half (48%) of consumers say they like and will use — as many do already — because it’s convenient and more secure than having their identity and payments credentials stored all over the web.

Hold that thought.

It’s a blueprint for how the voice economy will likely evolve.

Say Hello to the Consumer’s Voice-Powered Future

But in 2023, commerce is on the cusp of another digital breakthrough, one that will take us back to the future in many ways. One where voice is the enabler of routine interactions between people and businesses and commerce transactions, in real time and securely.

Tomorrow, powerful technology will integrate voice and enable commerce from any connected device (some 75 billion of them, it’s estimated, by 2030, when 5G becomes 6G), positioning voice to be as ubiquitous as it was 200,000 years ago. Computing power will increase rapidly while sitting in the cloud, and smaller, faster chips will enable smaller, embedded devices.

Artificial intelligence will make voice interactions smart, personalized, adaptive and engaging. As in truly engaging — conversational in every sense of the word. Not just reactive to a wake word and a series of prompts, but proactive and intuitive, anticipating actions based on history and context and anticipating what consumers might want to do next, just like an effective, smart, capable human assistant would do.

Without holidays, sick days or vacations.

Integrations with screens — in a car, at home, on a mobile phone, a TV screen or one of the many displays that will surround us — will add a visual dimension to the voice experience.

Voice biometrics will make those experiences secure.

As with the internet and app economy, entrepreneurs and business leaders will build apps and app ecosystems to connect devices with AI-powered experiences. Embedded and tokenized payments and identity credentials will make it secure. The consistency and security of those experiences will build trust.

In that world, voice will displace taps and swipes in many situations, making it faster and easier for consumers to get information, buy something or make a reservation. PYMNTS research finds that nearly half of U.S. consumers believe that in five years’ time, a smart voice assistant will be effective at managing the cascade of changes required when there is a sudden change in their plans. More than half believe it will be helpful in handling issues related to emergencies that happen when they are driving.

Nearly 30% of U.S. consumers say they’d even pay a monthly fee to access a voice assistant that can do that.

Today some 100 million consumers use their voice to talk to assistants built into their mobile phone (39%), an app on their smartphones (25%) and connected devices like speakers (27%) to complete those activities or connect with call centers. A smaller portion of consumers access voice built into their cars (15%) and wearable devices (12%).

Table 1: Different ways consumers have used voice technology in the last 12 months

But barely one in ten of all consumers think that voice is capable of being their smart, everyday sidekick today.

Houston, We Have a Voice Problem

For voice to become the future that many consumers say they want, voice economy enablers must prove it can be trusted — getting consumers over the risk/complexity conundrum that prevents all but the bravest voice pioneers from conducting complex activities where the downside of making a mistake could be disastrous.

After all, it’s not the end of the world if Google or Alexa tells you a bad joke. It could be if someone using a deep fake version of your voice authorizes taking $10,000 from your bank account.

The voice experience also needs to be interoperable and consistent. The absence of both creates a bad user experience and too many stutter-steps between devices, apps and operating systems, which consumers say waste time without the certainty of a good outcome.

Right now, more Apple users than Cupertino would like to admit walk into their homes and talk to Alexa to do a variety of things like make a grocery list, confirm the replenishment order, lock the back door, call Mom or Dad, or get the next day’s weather forecast.

As car OEMs take back control of the cockpit and throw BigTech out, voice inside the vehicle will become a standalone experience, separate from the smart assistant that opens the consumer’s garage door, turns on the lights, builds grocery lists and places an order, checks who’s at the door, orders a pizza for delivery and an Uber for a ride to the airport the next morning. What may sound like a great idea for the car brand may very well end up being a bad experience for a consumer who doesn’t want one more digital environment to manage.

Voice-enabled speakers are only as smart and useful as the apps they are integrated with and the datasets they are trained on. Some voice providers operate only in certain categories, like restaurants, or power voice only for specific brand apps, or work only when a car pulls up to the drive-thru window.

There’s also a lot of confusion among consumers about what a voice-enabled experience is, and therefore who’s really delivering it.

Is using my voice to ask Siri to dial my bank, so that I can talk to a person and get my banking or payments handled, a voice-enabled experience or a hands-free version of a mobile phone call? More than half of consumers who said they used voice to open an account or make a payment really just used their voice to ask Siri to dial their bank, where they spoke with a real person.

The result today is a voice ecosystem, and a voice experience, that is a hodgepodge.

The Voice Economy Playing Field

Who enables the smart, connected-voice experience that 168 million consumers would like to have? That is a work in progress.

Today most consumers think it will be the BigTech player whose handset and operating system are in the palm of their hand, since that is how they use voice today. Apple and Google are almost tied for the top spot at 46% and 41%, with Amazon third at 35%. Interesting, perhaps, is that even those who use Amazon apps and speakers today think Apple and Google will dominate.

Figure 3: Who do consumers trust to deliver voice technology that is as smart and reliable as real people?

That’s largely because consumers today think of voice as a feature they use on their phone.

But that may not be how things end up.

As the mobile economy did, the voice economy will likely come down to a few dominant operating systems with apps that are interoperable across devices.

There could be a new generative AI-powered, voice-focused OS, or there could simply be new versions of iOS and Android. I think there is an opportunity for new operating systems to flourish, such as Alexa, which already has hooks into more than 100,000 connected devices and has been sitting in people’s kitchens and telling them corny jokes for nearly a decade.

Regardless, innovation will happen at the application level, and innovators will compete to create novel experiences for consumers. Voice operating systems and platforms will have embedded identity and payments credentials at the core. They will simplify the complexity of the hodgepodge that is the voice experience today.

But unlike in the mobile economy, voice standards will enable a consistent way for any type of connected device to integrate with any voice operating system, making voice an integrated, ambient part of the environment in which consumers live. Mobile devices will remain important, but over time will become less dominant as the consumer’s voice interface. Any device will be capable of initiating commerce and accessing information wherever a consumer is and at any time.

That’s what makes the modern-day version of the voice economy so thrilling.

And the voice economy winners will be those who can operate cross-platform, cross-device — with commerce or the potential to embed commerce into the operating system a key requirement.

Whether it will be one of the BigTech players that have a strong voice presence today or someone who’s working under the radar remains unclear as things stand now. Lots of things can happen in the next five years.

The sudden emergence of OpenAI and ChatGPT has stunned the world with a powerful generative AI engine and massive adoption unlike anything seen in modern times, serving as a reminder that players who can change the world may be hiding in plain sight until they suddenly aren’t.

In fact, 7% of consumers also believe that the voice innovator is someone whose name we don’t yet know.

What’s Next

Consumers have bought into a future where voice and AI are integrated into their day-to-day, ambient and always on, largely because voice is not an entirely new consumer technology. Amazon’s Echo will be ten years old next year. Siri has been part of the Apple OS since 2011. Hey Google has been around since 2016. Consumers use their voice to order at drive-thrus. Chatbots proliferate across banking, retail and financial services. People have been talking to call centers ever since there was a telephone and a customer service department.

They may find today’s experience a somewhat disappointing jumble at times, but they see its potential for making the connected world in which they live right now even more connected, more convenient and more secure. More conversational.

Voice pioneers will lead the way. The 44 million Americans who already trust voice enough to transfer money or open a new bank account will endure the frictions and push the voice economy ecosystem to get better.

Before you roll your eyes and say, “What do consumers really know?” consider this. PYMNTS research finds that these are the same consumers who predicted COVID would last until 2022 well before epidemiologists did, and who have said repeatedly that inflation will persist until late 2024, well before analysts and the Fed acknowledged that a recession is coming and that 2% inflation targets remain elusive.

Consumers have a track record as reliable predictors of what’s likely to happen and when.

That’s because they’re pushing every facet of the economy to meet them where and how they want to shop, buy, pay, have fun, work, travel, stay well, communicate with others and live in their homes.

Like the voice economy of 200,000 years ago, the modern-day voice economy will be ubiquitous, personalized, easy to use and reliable.

Like every other innovation in the digital age, it is the consumer who will bring us into the future and shape it, increasingly, into one where voice becomes an important part of how they live in a connected, digital-first economy.

As with the internet and the app economy, innovators will lay the tracks and correctly anticipate what consumers want.