Deep Dive: How Unsupervised Machine Learning Could Close The Cloud Services Security Gap


Cloud services like AWS, Dropbox and Google Cloud store terabytes of users’ personal information and payment data, making them prime targets for fraudsters looking to wreak havoc.

A particularly harrowing cloud services-based fraud incident occurred last year, when a former AWS employee gained access to more than 100 million Capital One customers’ details. The hacker exploited a misconfigured firewall to breach the system and covered her tracks so well that Capital One did not notice the issue for three months.

The damage to both the bank — experts estimate the hack cost it more than $100 million — and its customers’ data privacy cannot be overstated, but what made this attack particularly worrisome was that Capital One’s and AWS’ cloud security measures were considered to be fairly robust. The cloud security space needs to completely rethink its options if such strong systems can still be breached. Unsupervised machine learning (ML) could be key to the space’s security overhaul, and many providers are already enjoying its advantages.

Traditional Cloud Security Measures Are Not up to Snuff

Cybersecurity systems that served organizations well in the past are becoming obsolete as cloud technology grows. Many use perimeter defenses, devoting all their efforts to blocking points hackers could use to gain access. The problem with such systems is that there are no further protection layers if fraudsters find even one access point — as the Capital One hacker did. They can run unfettered within that cybersecurity perimeter once they gain entry.

The cloud’s openness makes it impossible for defense teams to build totally unbreachable perimeters. A better method would be a defense-in-depth approach, which uses multiple security checks and a system to monitor users’ interactions while they are within the cloud environment to determine if their activities are legitimate.
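As a rough illustration of the difference between a single perimeter and layered checks, the sketch below runs every action through several independent checks rather than stopping once a user is authenticated. Everything in it is hypothetical: the `Action` fields, the per-user allow-list and the 10 MB read cap are assumptions chosen for the example, not any provider's actual controls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    user: str
    authenticated: bool  # outcome of the perimeter (login) check
    resource: str
    bytes_read: int


# Each layer returns True when the action looks acceptable.
Check = Callable[[Action], bool]

# Hypothetical allow-list of resources each user normally touches.
ALLOWED_RESOURCES = {"alice": {"reports"}, "bob": {"reports", "billing"}}

# Hypothetical cap on how much data one action may read.
MAX_BYTES_PER_ACTION = 10_000_000


def perimeter_check(action: Action) -> bool:
    return action.authenticated


def resource_check(action: Action) -> bool:
    return action.resource in ALLOWED_RESOURCES.get(action.user, set())


def volume_check(action: Action) -> bool:
    return action.bytes_read <= MAX_BYTES_PER_ACTION


LAYERS: List[Check] = [perimeter_check, resource_check, volume_check]


def failed_layers(action: Action) -> List[str]:
    """Return the names of every layer the action fails, in order."""
    return [check.__name__ for check in LAYERS if not check(action)]


if __name__ == "__main__":
    # Authenticated user reaching for a resource outside the allow-list.
    print(failed_layers(Action("alice", True, "billing", 500)))
```

With only `perimeter_check` in place, the example action would pass. The additional layers are what flag it after the user is already inside.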

The problem with this approach is the logistical challenge of constant monitoring. Physical security at a bank is a fitting analogy. It is a much greater undertaking to hire a group of security guards than it is to simply put a lock on the front door, but the latter would be considerably less secure. Banks are often willing to undertake added expenses for the sake of security, and cloud services providers need to follow their lead. Unsupervised ML could well serve this purpose and offer the necessary protections.

A Next-Gen Security Solution to a Next-Gen Problem

ML largely comes in one of two forms: supervised and unsupervised. Supervised ML systems require labeled examples or predetermined outcomes to learn from. A supervised ML system could be given a digital fraud profile and search a database to find transactions that match it, for example. Unsupervised ML does not require set outcomes and instead finds patterns and anomalies in the data itself, making it better equipped to comb through cloud environments’ much larger data sets. It is also capable of finding new and innovative fraud types that have not yet been encountered.
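To make the unsupervised idea concrete, here is a minimal sketch that flags unusual sessions without any fraud labels or predefined profile. It uses a robust outlier score based on the median and the median absolute deviation (MAD), which is just one simple anomaly-detection technique and is not a description of any vendor's engine. The input (requests per session) and the 3.5 threshold are assumptions made for the example.

```python
from statistics import median
from typing import List, Sequence


def anomalous_sessions(request_counts: Sequence[float],
                       threshold: float = 3.5) -> List[int]:
    """Return indices of sessions whose request count is a robust outlier.

    No labels are supplied: "normal" is inferred from the data itself.
    """
    center = median(request_counts)
    deviations = [abs(x - center) for x in request_counts]
    mad = median(deviations)
    if mad == 0:
        # Most values are identical, so this score cannot rank outliers.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, dev in enumerate(deviations)
            if 0.6745 * dev / mad > threshold]


if __name__ == "__main__":
    counts = [12, 15, 11, 14, 13, 12, 16, 240]  # hypothetical sessions
    print(anomalous_sessions(counts))
```

Because the baseline comes from the observed data rather than from a known fraud signature, the same function would flag a session that behaves unusually in a way nobody anticipated.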

AWS and Google Cloud are both leaders in unsupervised ML and use it to make three types of predictions. The simplest of these is binary prediction, which deals with variables that have yes-or-no answers. Binary solutions can examine individual cloud interaction aspects and flag them as fraudulent or legitimate, but they lack nuanced answers about overall fraud probabilities.

A more advanced solution is category prediction, in which ML engines look at entire data sets and categorize them based on their own variables. Fraud detection solutions can leverage category prediction based on past knowledge to determine if cloud interactions are illegitimate on a large scale.

The final and most complex type is value prediction, which makes quantitative predictions about likely outcomes based on the given data. A value prediction ML system can look holistically at cloud interactions and return its best estimate of the likelihood of fraud.
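The three prediction types differ mainly in the shape of the answer they return. The sketch below shows that difference using one hypothetical scoring function: the signal names, weights and cut-offs are invented for illustration and are hand-picked rather than learned, so this is not how AWS or Google Cloud compute their predictions.

```python
def value_prediction(failed_logins: int, new_device: bool,
                     gb_downloaded: float) -> float:
    """Value prediction: a number estimating how likely fraud is (0 to 1)."""
    # Hypothetical hand-picked weights; a real system would learn these.
    score = (0.1 * min(failed_logins, 5)
             + (0.2 if new_device else 0.0)
             + 0.03 * min(gb_downloaded, 10))
    return min(score, 1.0)


def category_prediction(score: float) -> str:
    """Category prediction: one label out of several possible classes."""
    if score < 0.3:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"


def binary_prediction(score: float) -> bool:
    """Binary prediction: a yes-or-no answer with no nuance."""
    return score >= 0.7


if __name__ == "__main__":
    score = value_prediction(failed_logins=4, new_device=True, gb_downloaded=10)
    print(round(score, 2), category_prediction(score), binary_prediction(score))
```

The binary answer discards the most information and the numeric estimate keeps the most, which is why the article treats value prediction as the most nuanced of the three.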

DataVisor is another leader in unsupervised ML for fraud detection. It recently integrated its ML-powered fraud solution into Microsoft’s Azure AI, allowing the latter’s users to augment Azure’s tool set with DataVisor’s unsupervised ML offering. The partnership extends ML to users who would otherwise not have access to it, and DataVisor CEO Yinglian Xie noted that more than 4 billion users currently leverage some form of its unsupervised ML engine.

Cloud services providers looking to beef up their security measures would do well to look to AWS, DataVisor and Google Cloud as examples of how to implement unsupervised ML. Becoming the victim of the next Capital One-style data breach could be the unenviable — but inevitable — outcome if such steps are not taken.