Consumer Authentication

Using Online, Social Data To Make ‘Thin Files’ Thick

Can the data trapped in “Digital Exhaust” – like online and social media data – be used to validate identity and predict fraud? Socure sure thinks so. And it has the data model to prove it.

What if you could build a model that helps financial institutions of any kind fill the gaps in thin files with online and social data, making those files thick without increasing risk?

Socure did that experiment. Here is how it unfolded.

Step One: Correlate a consumer’s participation in various types of social networks (traditional, blog-based and professional) with their associated fraud risk.

Socure found that while an individual who belongs to no social networks brings a fraud risk of 22.9 percent, that number drops by 5.7 percent when a consumer participates in all three types.
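The phrase "drops by 5.7 percent" can be read two ways, and the article does not say which one Socure means. The short sketch below works out both readings from the 22.9 percent baseline; the function names are hypothetical and exist only for illustration.

```python
def drop_in_points(base_pct: float, drop_points: float) -> float:
    """Reading 1: the risk falls by a number of percentage points."""
    return base_pct - drop_points


def drop_relative(base_pct: float, drop_pct: float) -> float:
    """Reading 2: the risk falls by a percentage of the baseline itself."""
    return base_pct * (1 - drop_pct / 100)


base = 22.9  # fraud risk with no social networks, as quoted in the article

print(round(drop_in_points(base, 5.7), 1))  # 17.2
print(round(drop_relative(base, 5.7), 2))   # 21.59
```

Under the first reading the risk for a consumer on all three network types would be about 17.2 percent; under the second, about 21.6 percent.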


Step Two: Demonstrate the effectiveness of a fraud model that relies upon social data based on a supervised learning approach to identity verification.

In the company’s research, three data sets were constructed to test the validity of online and social media data as a means of authentication:

  • Real Data: The control group, consisting of 10,000 real U.S. consumers identified by name, address, phone number and date of birth (DOB).
  • Synthetic/Fake Data: Another 10,000 identities, all fake and generated automatically with an online tool, simulated what a fraudster could make up. Each synthetic identity consisted of a name, DOB, email (with a valid domain), a random phone number with a valid area code, an address (with valid city, state, country and ZIP code but a random house number and street name), and a random IP address. As in the work of a skilled fraudster, the city, state, country, ZIP code and area code of each synthetic identity align with one another.
  • Stolen (Simulated) Data: To create this third data set, researchers randomized the real data from the first set, keeping each value valid in and of itself but associating it with different people. This simulates a fraudster’s tactic of stealing most parts of an identity but changing some components for misdirection (so that goods, funds or services are delivered to the fraudster rather than to the legitimate consumer).
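One way to picture the "stolen" data set is as a shuffle of attribute values across real records. The sketch below is an assumption about how such randomization could be done, not Socure's actual procedure: the function name, the field names, and the sample records are all hypothetical, and the article does not say which attributes were reassigned.

```python
import random


def simulate_stolen(records, fields, seed=0):
    """Shuffle each listed field independently across the records.

    Every value remains a real, valid value, but after shuffling it is
    usually attached to a different person than the one it came from.
    """
    rng = random.Random(seed)
    stolen = [dict(r) for r in records]  # copy so the real set is untouched
    for field in fields:
        values = [r[field] for r in records]
        rng.shuffle(values)
        for rec, value in zip(stolen, values):
            rec[field] = value
    return stolen


# Hypothetical placeholder records, not real consumer data.
real = [
    {"name": "Person A", "phone": "555-0101", "dob": "1980-01-01"},
    {"name": "Person B", "phone": "555-0102", "dob": "1975-06-15"},
    {"name": "Person C", "phone": "555-0103", "dob": "1990-11-30"},
    {"name": "Person D", "phone": "555-0104", "dob": "1968-03-22"},
]

for rec in simulate_stolen(real, fields=["phone", "dob"]):
    print(rec)
```

Because only the pairing of values changes, each field on its own would still pass a validity check, which is what makes this kind of identity harder to flag than a fully synthetic one.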

Step Three: Generate a series of social and online data variables for each identity, then test whether those variables accurately classify the identity as real, fake or stolen.

Socure then built a predictive model on its ID+ platform from all three data sets.

According to Socure’s research, the predictive model showed a success rate of 98.8 percent.
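The article does not define "success rate." Assuming it means the share of identities that the model assigns to the correct class (real, fake or stolen), it could be computed as in the sketch below. The function names and the toy labels are hypothetical; they are not Socure's data or results.

```python
from collections import Counter

LABELS = ("real", "fake", "stolen")


def success_rate(actual, predicted):
    """Fraction of identities whose predicted label matches the true one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labeled identity is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def per_class_recall(actual, predicted):
    """Share of each true class that the model labeled correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / totals[label]
            for label in LABELS if totals[label]}


# Toy labels for illustration only.
actual = ["real", "real", "fake", "fake", "stolen", "stolen"]
predicted = ["real", "real", "fake", "stolen", "stolen", "stolen"]

print(success_rate(actual, predicted))
print(per_class_recall(actual, predicted))
```

A per-class breakdown is worth reporting alongside the overall figure, since a single headline rate can hide weak performance on one class, such as stolen identities.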


To see in full how supplementing offline data with online and social data can provide evidence of a consumer’s true digital identity (or lack thereof) more effectively than traditional methods, download Socure’s new white paper, “Real, Fake or Stolen: Validating the Use of Alternative Data for Identity Verification.” The paper examines the potential of online and social media data to provide a fuller digital identity picture of otherwise under-documented “thin file” population segments.


