Cracking FIs' 90/10 Data Monetization Problem

Using Synthetic Data In Financial Services

When seven CEOs of the largest banks in the United States gathered on Capitol Hill earlier this month, the topics and questions lobbed at them from members of the House Financial Services Committee spanned all manner of subjects.

The focus of the hearing was, generally speaking, the evolution of financial services in the wake of the financial crisis, and of course the spotlight shone on the myriad ways that things have changed over the past decade. FinTech, cryptos, pay gaps, capital requirements and even the ethics of doing business with firearms firms were all under discussion.

And, of course, data security – or perhaps the lack thereof – was fair game. The executives who testified said they have been focusing on data security, and will continue to do so, as huge breaches have dominated the headlines. Said Bank of America CEO Brian Moynihan, “We are in, effectively, a war on cybersecurity.” Along with JPMorgan CEO Jamie Dimon, he discussed how myriad regulations and bodies governing data and its use can be counterproductive in forging an effective strategy against the bad guys.

Might there be a solution in a compliance and data security approach that can keep consumers’ data safe, even as banks monetize that very data?

The endeavor is a win-win, allowing for mutually beneficial relationships between financial institutions (FIs) and their customers – and to get there, according to Randy Koch, CEO of ARM Insight, embracing synthetic data can make all the difference.

Against the backdrop of the congressional hearings last week, Koch told PYMNTS’ Karen Webster that adopting a synthetic data monetization model may have helped the bank CEOs put the Hill at ease.

A Whiteboard, a Marker and Two Boxes

“If I had been one of the CEOs, I would have gotten up with a marker and a big whiteboard and drawn two boxes,” he told Webster. Detailing the two boxes, he assumed the hypothetical role of a banking CEO taking a bucketed approach to segmenting data activities. In one box, Koch said, which holds roughly 10 percent of data management initiatives taken on by the bank, “here's PII (personally identifiable information).

“We agree it has a risk. I will do everything in the world to make sure it's secure,” he said of the banker’s pledge to consumers. “There’s only a limited amount of data.” Of the other 90 percent of data activities and transactions, Koch noted, the bank might reassure customers, “Don’t worry, we are leveraging anonymous and/or synthetic data. There is no consumer information in there.”

That 10 percent box, where the sensitive data resides along with the “raw” data that shows where and how someone shops at any given time, is also where a targeted discussion of compliance and regulation can take place. Focusing on it can also make for efficient deployment of security, resources and time.

That efficiency comes along with a segmented approach to data itself, which spans three categories: raw, anonymized and synthetic. As has been spotlighted in this space, and as bears repeating, raw data is among the most valuable types of data and the one most often targeted for theft by the bad guys. That’s because it includes credit card data, addresses, names and perhaps even Social Security numbers.

“That’s the 10 percent that everybody is worried about,” Koch noted.

Anonymized data has had the personal information removed, is a bit more general and can be thought of as being descriptive of a transaction, such as “a customer bought a cup of coffee at a major coffee chain.”

Then there’s synthetic data, which can be thought of as data sets that are run through all sorts of mechanisms such as machine learning (ML) or artificial intelligence (AI). As the name implies, synthetic data has been created from whole cloth (i.e., manufactured), and thus cannot be traced back to consumers, the places where they do business or even the FI tied to the transaction.
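To make the three categories concrete, here is a minimal Python sketch of how a record might move from raw to anonymized to synthetic. It is an illustration only, not ARM Insight's method: the `RawTransaction` fields, the `anonymize` and `synthesize` helpers and the sample values are all hypothetical, and the per-category Gaussian sampling is a toy stand-in for the ML-driven generation the article describes. It offers no formal privacy guarantee.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class RawTransaction:
    # Hypothetical raw record: it carries PII, so it belongs in the "10 percent" box.
    name: str
    card_number: str
    merchant_category: str
    amount: float


def anonymize(tx: RawTransaction) -> dict:
    """Drop the PII fields, keeping only a description of the transaction."""
    return {"merchant_category": tx.merchant_category, "amount": tx.amount}


def synthesize(anonymized: list[dict], n: int, seed: int = 0) -> list[dict]:
    """Manufacture n new records from per-category statistics (toy model)."""
    rng = random.Random(seed)

    # Group observed amounts by merchant category.
    by_category: dict[str, list[float]] = {}
    for row in anonymized:
        by_category.setdefault(row["merchant_category"], []).append(row["amount"])

    categories = list(by_category)
    weights = [len(by_category[c]) for c in categories]

    synthetic = []
    for _ in range(n):
        # Pick a category in proportion to how often it was observed.
        category = rng.choices(categories, weights=weights)[0]
        amounts = by_category[category]
        # Draw a fresh amount around the category mean; no source row is copied.
        mean = statistics.mean(amounts)
        spread = statistics.pstdev(amounts)
        amount = max(0.01, rng.gauss(mean, spread))
        synthetic.append({"merchant_category": category, "amount": round(amount, 2)})
    return synthetic


# Made-up sample data for illustration.
raw = [
    RawTransaction("Ann Example", "4111111111111111", "coffee", 4.50),
    RawTransaction("Bob Example", "5500000000000004", "coffee", 5.25),
    RawTransaction("Ann Example", "4111111111111111", "sporting_goods", 89.99),
]
anonymized = [anonymize(tx) for tx in raw]
synthetic = synthesize(anonymized, n=5)
```

In this sketch only the `raw` list would need the tight controls reserved for PII. The synthetic rows keep the broad shape of the data (which categories appear and roughly how much is spent), which is the property that makes such data useful for analysis.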

But then again, the data that is synthesized is relevant and robust enough for FIs to monetize it, and to get a sense of where new opportunities might lie to strengthen or even forge new relationships with consumers.

(Think of those “targeted” mailings and emails that offer new financial products, which could conceivably be recalibrated to be a lot more relevant to, say, a millennial dad with a 14-year-old son who spends a lot of money on sporting goods – and perhaps also needs to be thinking about saving for college.)

Banks, of course, are sitting on mountains of data, and need to think about how they can monetize that information. And with the segmented approach, Koch said, 90 percent of the data that courses through a bank’s internal systems becomes, as he put it, “non-scary … if you do the synthetic model correctly, it mimics the original, raw transaction data.”

The Rewards

Yet, he said, all too often, CEOs do not understand the benefits that accrue from a segmented data approach.

The first reward, Koch noted, is that “you can have tons of your [bank’s] team looking at anonymized and synthetic data,” instead of having, say, dozens of people doing research on PII. He said there is “no value degradation” as banks or other third-party businesses transition away from raw data and toward creating and mining synthetic data for insights. Also, he added, the embrace of synthetic data means that “90 percent of regulations do not apply,” as they are focused on personal information, which is now effectively off the table. That approach, he said, satisfies the European Union's General Data Protection Regulation (GDPR), for example.

As Koch pointed out, the banks – which, of course, have been hammered in recent years amid negative headlines and any number of data breaches – “could answer questions about consumer data security and privacy by saying, ‘well, actually, we don't use consumer data in any form where there's personally identifiable information available to anyone, even inside of our own organization.’”

He said the executives at the Hill hearing could have pointed to a process that completely strips that information from transactional data, which banks need for all different kinds of reasons, like ensuring transactions are safe, helping to fight cyber fraud and even getting the right targeted information to consumers when they are asking questions about mortgages, for example.

Why All Platforms Should Use It – and Where It Is Going

All platforms that handle customer data should use the synthetic data approach, Koch said, and beyond financial services, healthcare is a likely fast follower. He called it a “sister industry” where there is a lot of regulation along with privacy and data concerns.

“At the same time, you want to give the latest statistics on a new MRI machine, and you do not need raw data for that,” he noted. “You could do synthetic data for that and have a much bigger data set,” he added, pointing out that in banking and beyond, the multi-faceted approach is “an opportunity to be helpful to the consumer even while monetizing the data.”


