When Machines Do The IDV Work Of People

Industries are abuzz with the fast-growing potential of machine learning — a form of artificial intelligence that allows programs to change, adapt and, obviously, learn from new data without requiring additional programming effort and human intervention.

While these intuitive computing capabilities look to innovate and automate repetitive, highly difficult pattern-matching problems across massive volumes of data, the focus of this latest podcast series is the role that machine learning will play in the realm of identity verification and fraud detection.

In this first installment, PYMNTS’ Karen Webster spoke with Sunil Madhu, CEO and president of digital identity verification company Socure, to get a sense of machine learning as a concept and what problems the technology can address in IDV.

Madhu said machine learning automates rote, repetitive and manual data-driven processes. Along the way, the machine can extract information from the data and identify patterns in real time, including patterns that human counterparts may not be able to intuit.

People aren’t exactly known for their ability to process large amounts of data, Webster noted, let alone in real time. This real-time element is especially key in the fraud detection space.

From Manual to Machine

Before machine learning, Madhu said, the ID verification process involved a great deal of manually matching personally identifiable information (PII), like name, address and SSN, against non-reported data sources — utility bills, legal records, credit records, etc.

This process is largely obsolete today.

Security questions based on public information can easily be defeated by simply trolling for that information on the web. It’s all readily available, easily accessible and certainly not enough to verify one’s identity.

“There’s a lot of information that can be culled from what we all leave online,” Madhu said. “If I needed to find your date of birth … to identify myself as you, I could troll your Facebook profile and look for a point in time when people wished you a happy birthday, infer how old you might be and then use that in an attack.”

But there are other types of data sources, ones which can’t simply be copied and repurposed for fraud, that machine learning can detect.

For instance, the social networks people belong to vouch for who they are. The fact that we are connected to other people at scale can reinforce a person’s authenticity, Madhu said, along with types of services people use based on demographics, market and geography.

Combining, matching and inferring meaning from this structured and unstructured data requires more than a single human can accomplish.

Today, Madhu told Webster, credit card applications at banks often run through an automated system to verify PII elements. If the system can’t resolve the data, manual review teams of hundreds of people type the applicants’ names into Facebook or LinkedIn to verify they’re real.

“A machine can do that work much faster and more accurately,” he said, “combining these different bits of data together to make inferences and derive knowledge that a particular person is, in fact, real and who they claim to be.”
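As a rough illustration of what "combining these different bits of data" could look like, the sketch below folds a few weak signals into a single score with a logistic function. The signal names, weights and bias are hypothetical and chosen only for illustration; nothing here describes Socure's actual model, and a production system would learn its weights from labeled outcomes.

```python
import math

# Hypothetical signal weights (assumptions for illustration only).
# A real system would learn these from labeled data, not hard-code them.
WEIGHTS = {
    "pii_matches_records": 2.0,             # 1 if name/address/SSN agree with records
    "has_established_social_profile": 1.5,  # 1 if a long-lived, connected profile exists
    "email_age_years": 0.3,                 # age of the email address in years
}
BIAS = -3.0  # with no supporting signals, the score starts low


def identity_score(signals: dict) -> float:
    """Combine several weak signals into one probability-like score in (0, 1)."""
    total = BIAS + sum(
        weight * float(signals.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-total))


# Two made-up applicants: one with strong supporting signals, one with none.
strong = {"pii_matches_records": 1, "has_established_social_profile": 1, "email_age_years": 8}
weak = {"pii_matches_records": 0, "has_established_social_profile": 0, "email_age_years": 0}
```

Calling `identity_score(strong)` yields a score close to 1 and `identity_score(weak)` a score close to 0. No single signal decides the outcome; the inference comes from the combination.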

Garbage In, Garbage Out

The biggest problem in machine learning is that the programs are both a human creation to begin with and a function of the data they assess.

“Garbage in, garbage out is as true of machine learning as it is of any computer system,” Madhu said. “It relates to the hardest problem in machine learning and data science in general: data engineering.”

The data engineering process is the heavy lifting in data science, he noted.

The machine will first learn the patterns it’s taught and then reinforce them. It may detect new patterns and build on top of older patterns on its own, but if the original training and rules were flawed, inaccurate or biased, the machine output will be as well.

Likewise, if the data are no good, the same can be said for the machine’s insights.

However, Madhu told Webster, we’re approaching a point where, with enough understanding of the specific patterns that distinguish good data from bad data, developers can train machines to ensure that data are trustworthy. This can also help minimize the intrusion of human bias and error into the system.

“We’re getting to a point where we can automate data engineering,” he said, “which traditionally has been the predominant amount of labor that goes into model development.”
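One simple way to picture automated data engineering is a filter that rejects obviously bad records before they reach model training. The sketch below is only a rule-based stand-in, under the assumption that records carry `name`, `date_of_birth` and `ssn` fields (hypothetical field names); the learned checks Madhu describes would go well beyond fixed rules like these.

```python
import re
from datetime import date

# Assumed format for illustration: a US SSN written as 123-45-6789.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def is_clean(record: dict, today: date) -> bool:
    """Return True if a record passes basic sanity checks."""
    name = str(record.get("name") or "").strip()
    dob = record.get("date_of_birth")
    ssn = str(record.get("ssn") or "")
    if not name:
        return False  # a missing name cannot be matched against anything
    if not isinstance(dob, date) or dob > today:
        return False  # a missing or future birth date is garbage
    return bool(SSN_PATTERN.match(ssn))


def filter_training_data(records: list, today: date) -> list:
    """Keep only the records that are fit to train on."""
    return [record for record in records if is_clean(record, today)]
```

Passing a fixed `today` rather than reading the system clock inside the function keeps the check reproducible.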

In the Case of Fraud

Once machines are reinforced and able to weed out meaningless and poor information, it comes time to put them to work.

In applying machine learning to fraud detection and prevention, the objective is to let machines draw on collective intelligence about the types of fraud to which specific individuals, businesses and their peers may have been exposed.

“Ten years ago, fraud tools may have been deployed behind a firewall at a particular enterprise and exposed to just that enterprise’s experience of fraud,” Madhu said. “Today, they typically sit inside the cloud, securely, and they learn across customers at scale.”

Collective intelligence is one of the key differentiators in the way fraud detection tools are designed today. Access to industry data across businesses gives machine learning-based fraud tools greater efficacy, power and insight.
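The idea of collective intelligence can be sketched as a shared index in which fraud seen by one customer becomes visible to every other customer. The class and method names below are hypothetical, and the example deliberately ignores the privacy, security and weighting concerns a real cloud service would have to address.

```python
from collections import defaultdict


class SharedFraudIndex:
    """Toy cross-customer index of identifiers reported as fraudulent."""

    def __init__(self) -> None:
        # Maps an identifier (e.g. an email address) to the set of
        # customers that have reported it.
        self._reports = defaultdict(set)

    def report(self, customer: str, identifier: str) -> None:
        """Record that a customer saw fraud tied to this identifier."""
        self._reports[identifier].add(customer)

    def seen_by(self, identifier: str) -> int:
        """Return how many distinct customers have reported the identifier."""
        return len(self._reports.get(identifier, ()))


index = SharedFraudIndex()
index.report("bank_a", "mallory@example.com")
index.report("bank_b", "mallory@example.com")
index.report("bank_a", "mallory@example.com")  # repeat report, counted once
```

Here a third institution that has never encountered `mallory@example.com` would still learn from `index.seen_by(...)` that two peers have flagged it, which a tool deployed behind a single enterprise's firewall could not tell it.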

In this way, machine learning is an important development in a mission-critical area: having certainty around the authenticity of a person’s identity.

The specifics surrounding this topic, from how to ensure accurate data and the quantitative benefits of machine learning in the space to best practices for feature engineering and what the future holds, will be the subjects of upcoming podcasts in this series.