The Voice Ecosystems’ Emerging Security Threats

There’s good news and there’s bad news this week in the world of securing voice-activated, artificial intelligence-powered platforms.

First, the bad news: There are two new, clever ways to exploit weaknesses in both the Amazon Echo and Google Home line of products that make it possible for hackers to eavesdrop on users — and to potentially trick them into giving up sensitive personal data like payments credentials.

The good news? The problem was discovered by white hat hackers, apparently before their black-hatted counterparts discovered the weakness.

The techniques are known as “voice squatting” and “voice masquerading” and were discovered and disclosed in a paper by researchers from Indiana University at Bloomington, the University of Virginia and the Chinese Academy of Sciences.

The hacks rely on making malicious skills and adding them to the skills marketplaces that both lines of devices use. Those marketplaces are where legitimate third-party skill developers place their app offers, which are accessible to the user through specific voice prompts.

Once opened, malicious skills can do all kinds of things that users don’t want — like record them (even when they think the app has been switched off) or even initiate legitimate apps in an effort to trick consumers into giving up private information, like card or account numbers.

How to get users to open those malicious skills?

In a word: homophones.

Voice Squatting

Though Alexa and the Google Assistant have made strides in understanding human speech, the voice squatting hack relies on the fact that both systems still have a ways to go when it comes to all the nuances of inflection in spoken English.

Hence, it’s possible to give a malicious app a name that sounds an awful lot like a mainstream and popular app and rely on Alexa’s limited “hearing” ability to pick the wrong app when the user says the command for the legitimate app. According to the paper, once the similar-sounding app is loaded into the skill market, Alexa is prone to open the wrong skill about 50 percent of the time. Google Assistant also had a 50-50 fail rate when it came to distinguishing between the app the consumer was asking for and the attack skill with a similar-sounding name.

In one instance, the researchers created an attack skill called “rap game,” mimicking the popular legitimate “rat game” skill. Users were asking for the “rat game,” but in many cases, Alexa opened the “rap game” skill instead.

In a second iteration, the researchers took advantage of a user’s tendency to be polite to the voice assistant. They created a skill called “rat game please,” which had the predicted effect: Polite Alexa users were directed to an imitation “attack skill” built by the white hat hackers.

This particular chink in Alexa’s security armor is particularly troubling, considering Amazon — in an attempt to not create a generation of rude children who speak only in imperatives — has recently taken to rewarding young users for saying “please” when making a request.

While a fake version of the rat game may seem pretty harmless, some of the fake skills the research team got past the system were downright worrisome.

“One may say, ‘Alexa, open Capital One, please,’ which normally opens the skill Capital One, but can trigger a malicious skill ‘Capital One, Please’ once it is uploaded to the skill market,” the paper stated. Sneaking the skill name “Capital Won” into the market was also a successful way of getting users to unwittingly enter another app entirely.

Luckily, because these “attack skills” were created by white hats, they contained no malicious content. The researchers did not avail themselves of the opportunity to collect customers’ private card data from their fake Capital One skill.

A black hat hacker, on the other hand, would have created the skill for that express purpose.

The Spy on the Counter

“Squatting” was only the beginning of the exploits the team uncovered in their research effort. They also conducted a series of “voice masquerading attacks.”

Voice masquerading occurs once a user has entered a malicious skill and attempts to leave and the app lets them think they did. Except, not really: What the malicious skill has actually done is begin to impersonate the application the user was seeking.

“These skills, once impersonated, could cause serious information leaks to untrusted parties,” the researchers wrote in their paper.

They also used these attack skills to add a secret recording functionality via both platforms “reprompt” feature.

The legitimate reason for this function is to allow a skill to keep running, even when it does not receive a response from the user. To use that function, a skill must issue a notice to users that it is still running via an audio or text file.

The research teams managed to skirt that requirement by creating a long, silent audio file as a reprompt so the user wouldn’t receive any audible warning that the mic on the device was still recording.

Researchers were able to keep their malicious skill recording for 102 seconds on Alexa and 264 seconds on Google.

And those, results, the researchers noted, were the very low end of how long the speakers could record. As long as the user’s voice was within range — whether or not he or she was directly interacting with the skill — it could record indefinately.

Those techniques were similar to ones demonstrated in a previous skill-based attack by researchers from Israeli firm Checkmarx in April.

Patching the Holes

The researchers first directly disclosed their findings to Google and Amazon. Both companies have reaffirmed their commitment to purging bad skills from their marketplaces, and both noted they already have practices in place to prevent them.

“Security is an ongoing focus for our teams, and we constantly test and improve security features for all Google Assistant devices, including voice-activated speakers,” a Google spokesperson said. Google further noted that it does ban some words from skill titles — like “launch” or “ask” — but “please” did not make the initial cut.

That situation will soon be rectified, Google reportedly told Forbes.

“Customer trust is important to us, and we conduct security reviews as part of the skill certification process. We have mitigations in place to detect this type of skill behavior and reject or remove them when identified,” Amazon relayed through a spokesperson.

The research team said it met with Amazon in April after their initial report on the issue was distributed to Amazon in February of this year. They also reportedly presented the company with their technique for preventing vocal squatting attacks by automatically comparing pronunciation similarities between two skills.

The research team, however, is uncertain on what, if any, steps the tech firms have taken to remediate the issue.

“We know that Amazon and Google couldn’t defend against the attacks when we reported our studies to them, and we are not sure whether they can do that now,” said XiaoFeng Wang from Indiana University.