Why Mastering the Restaurant Carry-Out Menu Is the Key to Voice AI’s Future

Voice AI is smartening up to fulfill the promise of life without keyboards.

Many of us have marveled at the dumb answers that, say, Siri often gives, but voice AI has reached an inflection point where its natural language understanding (NLU) capabilities are fast catching up to human speech. Listen for it at a restaurant near you soon.

That’s the takeaway — the takeout, if you will — from a conversation between PYMNTS’ Karen Webster and Keyvan Mohajer, founder and CEO at SoundHound, whose Voice AI with Dynamic Interaction™ is set to turn the world on its ear, starting with a restaurant rollout.

“Restaurants to us are like what books were to Amazon,” Mohajer said. “They started with books, now they do everything. We are going to do restaurants and then expand to other types of businesses. We picked restaurants because it matches our key strength, which is our ability to understand very complex conversations.”

That teases the Jan. 10 news that SoundHound has joined the Toast Partner Ecosystem, bringing its advanced voice ordering system to drive-thrus, phones, platforms and kiosks, even answering complex menu queries, handling order changes mid-stream, and more.

Mohajer isn’t shy about the fact that his solution is taking on Siri, Alexa and Google speech recognition. But he says he was there first, and frankly, he’s got the better tech.

“We created it, patented it, and launched it before Amazon did, and before even Google was thinking about it,” he said. “We’ve had this vision for more than a decade, and we worked on it for 10 years. Obviously, they have more marketing power and they acted very quickly and over-invested in it, and now they’re trying to dial back a little bit.”

While Big Tech reviews its investments in useful, ubiquitous but money-losing and often frustrating voice assistants, SoundHound is ready to take your order. The timing is propitious, as restaurants are short-staffed and drive-thru experiences are stuck in the early aughts at best.

Voice of the Consumer

Restaurants were not SoundHound’s prime focus, but Dynamic Interaction™ has been so well-received that “we are making that our number one focus,” Mohajer said. “We can’t keep up with the demand because restaurants don’t have the staff to pick up the phone, but they don’t want to miss out on the orders. We take it over. We never miss a call.”

A truly intuitive voice AI can understand the complex long-tail queries that usually go with food orders: the substitutions, the special requests, even people abbreviating words (“mayo” instead of “mayonnaise”). That is crucial because “every restaurant has their own vocabulary, their own menu,” he said.

As part of the Toast Partner Ecosystem, SoundHound for Restaurants is one of three pillars of the company’s strategy for getting a better, smarter voice AI into more commerce use cases.

“We power devices like cars, TVs, IoT devices, and then we power services in our pillar two, like restaurants. We aim to bring them together, and that’s pillar three. If you’re driving a car that is voice-enabled by our technology you can connect to restaurants that we voice-enabled and order food, and we generate new leads for the restaurants and eventually other businesses.”

The fact that SoundHound owns the conversational intelligence tech, patent and all, stands to revolutionize the drive-thru experience, which could lead to a singularity in how we eat.

And getting restaurant orders right without human intervention is just the beginning.

Commercializing Speech

Noting that speech is the normal mode of communication for humans — we learn to speak at age two; keyboards come later — Mohajer told Webster that voice tech is poised for a breakthrough, not someday but this year, starting with SoundHound for Restaurants.

“More services will be available. You can do more things and there will be more endpoints. More devices will have voice. Eventually the mirror and the window, smart rooms, your car, your refrigerator, your TV, and wearable devices. Everything will have a touchpoint for voice input. That will happen in the coming years, not decades,” he said.

SoundHound already works with a roster of clients including Hyundai, Snap, TV maker VIZIO, LG Electronics, and Mercedes-Benz, to name a few. This spoken-word slam starts with specialized use cases like restaurants, and as consumers gain confidence in its ability to comprehend more advanced queries, it has integration potential throughout the connected economy.

That’s leading up to things like being able to have a voice-enabled device — say a coffee maker — tell you who won the game. “We have this vision of collective AI where every device can have an assistant, but you don’t need to teach everything from scratch,” he said. “If you want your washing machine to [answer] your questions about the weather you just enable the weather. It inherits that from the platform.”

It also beats Big Tech to the punch, which is a favorite topic of his.

“The world needs an independent player because companies like Google and Amazon hijack your brand and users and data,” Mohajer said. “Product creators hate it. In many cases, Amazon will compete with you if your product is successful. They’re not staying in their lane. There’s demand for an independent player, and that’s us.”

What’s next for this vocal proponent of a level playing field? TVs.

“For commerce opportunities that are visual and social, TV is the best channel to do it. The problem with TV is the inputs to the TV are still that decades-old remote control, up and down. If you can fix that with voice input [you] can turn that TV into a kiosk at home.”