
Google’s DeepMind Is Breaking Voice Barriers

Artificial intelligence (AI) keeps getting smarter. While Amazon’s Alexa already doesn’t need your voice to function, Google now says its DeepMind unit has built a system for machine-generated speech that narrows the gap between existing technology and human speech by at least 50 percent.

This means the voice coming out of your mobile phone could soon sound a lot more human. Pretty revolutionary, right?

DeepMind, which Google acquired in 2014, developed WaveNet, a type of AI called a “neural network” that is designed to mimic how the brain functions. With it, DeepMind can mimic human speech down to the level of the individual sound wave. Tests have already shown that, for both American English and Mandarin Chinese, human listeners rated the artificial voice as more natural-sounding than the existing programs it was compared against.

That said, WaveNet still falls short of actual human speech.

But remember years back when Apple launched that robotic voice with the Macintosh computer? This is far removed from that, or even from Siri, but it’s not perfect. DeepMind says the most fluent, natural-sounding voice technology has been built around large data sets of short recordings of one person speaking, whose fragments are then recombined to form new words. The result? More natural speech, to an extent. The output is intelligible but not yet exact. Other systems rely on letter combination rules, which are easier to manipulate, true, but don’t sound as natural.

As for immediate commercial applications? DeepMind says it’s still something of a waiting game. The holdup? The computational power the technology requires, which, as DeepMind researchers have admitted, “is a clearly challenging task.”

That said, DeepMind’s technology will certainly be something to watch. Amazon, Apple, Microsoft and, of course, Alphabet’s Google have all already invested in voice interaction. Plus, Google Play’s international director says that 20 percent of mobile searches are made by voice. Can you hear that?
