UK Financial Regulator Sees Synthetic Data as Next Step in Data Sharing  

The U.K. Financial Conduct Authority is calling on stakeholders to provide input about the use of synthetic data to support financial services innovation. In a public consultation launched on March 31, the regulator highlighted the benefits of this type of data and how it could help small firms to better compete with incumbents. 

“Synthetic data is a privacy-preserving technique that could open up more opportunities for data sharing by generating statistically realistic, but ‘artificial’ data, that is readily accessible,” reads the FCA’s report. 

Synthetic data is not ‘real’ data created naturally through real-world events; rather, it is ‘artificial’ data generated using algorithms. The benefit of using synthetic data is that it simulates real data without identifying specific individuals. As long as no real individuals can be identified from the synthetic data, data protection obligations such as those under GDPR do not apply. The FCA explains that synthetic data is created by observing the patterns and statistical properties of real data and using algorithms to replicate these patterns within the synthetic dataset, with the aim of making it a realistic replica of the real data. While the utility and analytical value of synthetic datasets depend on the quality of the model and data used to generate them, these ‘artificial’ datasets can be shared for a wide range of uses.
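The idea of learning statistical properties from real data and then replicating them can be illustrated with a minimal Python sketch. The FCA's report does not prescribe a particular algorithm; this example assumes the simplest possible generator, a multivariate Gaussian fitted to the mean and covariance of the data. The function names (`fit_gaussian`, `sample_synthetic`) and the two-column "income and spend" dataset are hypothetical, and the "real" data here is itself simulated as a stand-in.

```python
import numpy as np


def fit_gaussian(real):
    """Learn the column means and covariance matrix of the real data."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean, cov, n_rows, rng):
    """Draw artificial rows that share the learned statistics."""
    return rng.multivariate_normal(mean, cov, size=n_rows)


rng = np.random.default_rng(0)

# Hypothetical stand-in for a real dataset: annual income and monthly spend.
income = rng.normal(30000, 8000, size=5000)
spend = 0.03 * income + rng.normal(0, 200, size=5000)
real = np.column_stack([income, spend])

# Fit the generator on the real data, then sample a synthetic replica.
mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n_rows=5000, rng=rng)

# The synthetic rows correspond to no individual, yet the relationship
# between the two columns is approximately preserved.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"income/spend correlation, real:      {real_corr:.3f}")
print(f"income/spend correlation, synthetic: {synth_corr:.3f}")
```

A Gaussian fit captures only means and linear correlations, which is why the usefulness of a synthetic dataset depends so heavily on the quality of the generating model; production generators typically use richer models.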

The FCA also acknowledges that synthetic data, while preserving the privacy of individuals, is not entirely free of the risk that it could be de-anonymized through reverse engineering. However, this risk is lower than with anonymized data or other methods that protect individuals’ identities.

For this reason, the U.K. regulator is seeking input from companies, academics, incumbents, startups, tech firms and other regulators about extending the use of synthetic data in financial services. This would allow small firms to use computer-generated data to train their algorithms and develop innovative products before testing them in the market, in effect a sandbox pilot, without compromising anybody’s privacy. Additionally, synthetic data could fill the gap when the data required is rare or doesn’t exist in sufficient quantities for training purposes.

“We would like to conduct an introductory exploration of market attitudes towards synthetic data, and its potential for opening data sharing between firms, regulators and other public bodies,” says the report.

The use of synthetic data would go beyond open banking, which provides access to individual data when consent is granted. According to the FCA, synthetic data would allow machine learning and artificial intelligence technology to develop by accessing large datasets, and it would be available to new entrants, who would otherwise face a significant barrier to entry in amassing this data. The data would also be shared between regulators and the private sector or among public institutions.

The regulator is also testing how far it can go to control access to this type of data. The FCA is proposing three roles, not mutually exclusive, that the regulator could adopt in this new strategy: data generator, central host and coordinator.  

In the first role, the regulator would collaborate with the industry and academia to generate synthetic data in-house, to be shared with the industry. As central host, the regulator would provide an independent hosting platform through which synthetic data can be stored, shared and accessed for product development and testing. Finally, as coordinator, the regulator would facilitate data sharing and collaboration opportunities for synthetic data generation. 

The FCA will accept comments on this proposal until June 22, 2022. 
