
Google’s DeepMind Is Breaking Voice Barriers

Artificial intelligence (AI) keeps getting smarter. While Amazon’s Alexa can already function without your voice, Google now says its DeepMind unit has built a system for machine-generated speech that closes the gap between existing technology and human speech by more than 50 percent.

This means the voice coming out of your mobile phone could soon sound a lot more human. Pretty revolutionary, right?

DeepMind, which Google acquired in 2014, calls the system WaveNet. It is a type of AI known as a “neural network,” which is designed to mimic how the brain functions, and it can reproduce human speech down to the level of individual sound waves. In tests for both American English and Mandarin Chinese, human listeners rated the artificial voice as more natural-sounding than the existing programs it was compared against.

That said, WaveNet still falls short of recordings of actual human speech.

But remember years back when Apple launched that robotic voice with the Macintosh computer? This is far removed from that, or even Siri, but it’s not perfect. DeepMind says today’s most natural-sounding systems are built around large data sets of short recordings of one person speaking, which are then cut into fragments and recombined to form new words and sentences. Other systems generate speech from rules, which can be manipulated more easily, true, but don’t sound as natural. WaveNet instead models the sound wave itself. The result? More natural speech, to an extent. Results seem to be intelligible but are not yet exact.

As for immediate commercial applications? DeepMind says there is still something of a waiting game. The holdup? The computational power the technology requires. As DeepMind researchers have admitted, this “is a clearly challenging task.”

That said, DeepMind’s technology will certainly be something to watch. Amazon, Apple, Microsoft and, of course, Alphabet’s Google have all already invested in voice-based interaction. Plus, Google Play’s international director says that 20 percent of mobile searches are made by voice. Can you hear that?
