Making Voice Biometrics Harder To Hack

Hackers, crackers and cyber criminals of all makes, models and specialties are more or less ubiquitous in 2018. There are many ways to catalog that reality.

There are the 100 million or so verified cybercrime threats floating around the digital sphere, or the 86 percent of firms worldwide reporting at least one cyberattack in the last 12 months, or the $600 billion that businesses lose annually to cybercriminals, or the 1.4 billion consumer records that have been exposed to the black market in the last year alone.

Take your pick among data points – they all tell the same terrifying story about the ubiquity of cybercrime and cybercriminals.

Or if going by the numbers is a bit dark and depressing, one could also ponder the disturbing headlines coming from the nation’s various hacker conferences this summer.

At Def Con, we learned that an 11-year-old was able to hack a highly accurate replica of Florida’s election website in less than 10 minutes. We also learned that kids were turned loose to hack the sites, according to event organizers, because that task would be insultingly easy for adult hackers.

“These websites are so easy to hack that we couldn’t give them to adult hackers — they’d be laughed off the stage,” said Jake Braun, a former White House liaison for the Department of Homeland Security, in an interview with ABC News.

Or there was the news out of the Black Hat Conference that a research team managed to trick voice recognition software from Microsoft by convincing it a machine voice was human. And it wasn’t just the hack – it was how accessible and essentially easy it was to perform. John Seymour, a Salesforce senior data scientist, and Azeem Aqil, a Salesforce software engineer, were the dynamic duo behind the effort to “break voice authentication with minimal effort.”

“By breaking, we mean gaining access by impersonation. By minimal effort, we mean it shouldn’t require tons of computing — think desktop rather than server farm. It should finish in a reasonable time. And it should require little or no data science expertise,” Aqil told PC Magazine.

The team did manage – though notably, only by using the world’s most generous definition of “minimal effort.” In fact, to trick voice recognition, it seems the pair did an awful lot of work, some of it quite specialized.

But as Brett Beranek, director of security strategy for Nuance, noted in an interview, cybercriminals can be counted on to make more than minimal efforts when they are going after consumer data or access to consumers’ financial accounts. If there is a way to use a combination of data scraping and sound editing to convincingly and consistently fool voice-controlled systems like Cortana, Alexa or the Google Assistant, cybercriminals are going to refine and enhance it.

Because they have every economic reason to do so – analysts estimate that over half (55 percent) of American households will be regularly interacting with voice-activated assistants by the year 2025.

Moreover, PYMNTS’ in-house data developed with Visa in our How We Will Pay study strongly indicates that an increasing number of interactions will be transactional in nature. Digital banking, commerce, bill payments, food orders, transportation – all are emerging consumer touchpoints in the voice ecosystem, and all are serving to make the ecosystem an ever more appealing target for fraudsters.

And while fraudsters will always be part of the cost of doing business online or in real life, we can get more sophisticated in how we spot them. When it comes to using biometrics, it’s not just about looking at what a customer has or what they do, but also about recognizing how a credential is being used, and whether that use matches broadly with prior behavior.

It’s Not What You Say, It’s How You Say It

Voice printing is tricky work, Beranek said, and Nuance should know, as their customers keep about 300 million voiceprints on file for use in multi-factor biometric authentication. But as recent headlines demonstrate, he added, there are ways to spoof voice-printing – with enough time and imagination, and enough recorded sound from the target, and sufficiently good audio editing skills – which fraudsters will likely continue to try to perfect.

But even if one can get the sounds completely correct, consumers don’t just speak – they tend to speak in a certain way, at a certain cadence, using certain phrases.

Sounds can be copied, but speech patterns are harder to emulate, because they are much harder to discretely identify. And, he noted, voice recognition and conversational pattern recognition cut against fraud in a secondary way, as fraudsters have patterns in their speech, too.

“We look for conversation patterns typical of fraudsters,” Beranek explained.

That pattern searching can be done in a voice authentication context, or when a fraudster is attempting impersonation fraud over the phone with a call center employee. Fraudsters, he noted, tend to inject urgency or panic into a conversation with the goal of creating stress. Real customers, on the other hand, are almost never interested in additional drama.

“There is a difference between individuals under actual stress dealing with an actual problem and someone imitating a consumer using data gained on the dark web that is going through the motions of stress to emotionally manipulate an employee,” Beranek pointed out.

Nuance’s software, he noted, is designed to “hear” the difference between stress real and feigned, and to direct the call center employee accordingly.

Focusing on What’s Hard to Spoof

Stealing a password is fairly easy. Stealing a voiceprint or a fingerprint isn’t easy, or necessarily practical at this point – but it’s at least provisionally possible, and that should give any consumer pause.

It’s not a reason to panic, or to lose faith in the biometric authentication methods that are gaining prominence – but it is a reason to be aware that a biometric alone isn’t going to be the silver bullet for security in the digital age. Voice, fingerprint, facial ID, retina scanning – where someone is using a method to secure a transaction, someone else will try to work around it.

But security in combination and in layers tends to be very effective in spotting and repelling risk. Nuance, as its name implies, is mainly interested in the minute detail of behavioral biometrics: how a customer holds their phone, swipes their mousepad, strings phrases together and the like – because those things in aggregate would be very hard for a fraudster to correctly spoof all at once. Layered in with data authentication like IP address matching and bio-authentication tools like voice identification, it doesn’t become impossible for the system to be hacked – it just becomes very unpalatable to try.

But, Beranek noted, the cybercriminal then continues to develop their own increasingly sophisticated techniques of fooling the smart software designed to detect them. The key, particularly as voice-based commerce becomes more mainstream, is to keep building systems that are always getting smarter about seeing “real” consumers, as opposed to very sophisticated “synthetic” versions of them online.

“We’re definitely in an arms race,” Beranek stated.

