Mapping The Black Box: A Look Inside Machine Learning’s Powers And Quirks

“Machine learning is the answer to the financial inclusion problem.”

So said Douglas Merrill, founder of ZestFinance. It’s a bold statement, considering that the lending space is still trying to articulate what, exactly, the financial inclusion problem is — and everyone else is still trying to figure out what, exactly, “machine learning” means and why it matters to them. Merrill said there are two distinctly wrong ways to look at these topics.

First, many in the lending space may wonder, “What financial inclusion problem?” To some, it may appear that the credit crisis is over — by the numbers, there’s more credit available now than there was in 2007 and 2008, just before the financial crisis hit.

But dig one layer deeper, said Merrill, and it becomes clear that the crisis is far from over. The total amount of credit available now exceeds 2008 levels but is less evenly distributed, squeezing out the bottom tenth percentile of borrowers — a population that once could have been considered adequately banked is now banked inadequately or not at all.

This is a problem Merrill said machine learning can help solve, which dovetails into the second misperception: Many see machine learning as a good unto itself, worth pursuing for its sheer novelty value when, in fact, it has its pros and cons just like any other technology.

Machine learning has its place, and Merrill does believe it could help lift those unbanked consumers into a bankable position, but he said the first step is creating healthy cynicism in those who wish to adopt this trendy tech.

In a recent webinar with Karen Webster, Merrill set out to educate corporates that are considering using machine learning for credit decisioning and underwriting, with the hope of transforming opaque black boxes into something the average Joe and Jane can understand and use to make decisions for their organization.

A Glimpse Inside the Black Box

To illustrate just how complex a machine learning model can be, Merrill gave the example of a model whose job is to predict whether someone is male or female based on input signals.

One of the first input signals he might feed the model could be height. Typically, men tend to be taller than women. However, that input signal alone won’t be enough to determine whether someone is a man or a woman. There are tall women — consider Sandy Allen, at 7 feet, 7 inches — and short men.

So, another input signal is needed. Merrill suggested weight as a second factor. Even compared to women of the same height, men tend to weight more, he said. But adding this input signal creates another issue, which is that children will now register as women due to their weight.

Now, said Merrill, let’s add age as an input signal. With this third signal, the model can control for the presence of children in the signal space. By eliminating children from the sample set, the model can now more accurately predict whether a person is male or female based on that person’s height, weight and age.

Most people wouldn’t think that age could be determinative of gender, but it turns out to be an important factor. Merrill said these subtle relationships between variables can be found in loan underwriting too — factors that seem unrelated can, together, be much more predictive than expected.


It’s irresponsible and unethical to power your lending business with an underwriting model you don’t understand and can’t explain, said Merrill. Developers of machine learning models for financial institutions (FIs) must therefore be able to identify and explain the variables their algorithms are taking into account and to what extent when making decisions, he continued.

That can be an issue for certain types of artificial intelligence (AI) techniques, which are so mathematically complex that they are referred to as “black boxes.” And regulators don’t like black boxes.

Relying on a black box may not matter much in a low-stakes situation, such as a microwave distinguishing that an input of “60” means 60 seconds while an input of “100” means one minute. But when it comes to high-stakes situations like credit decisioning and self-driving cars, regulators care tremendously how automated decisioning systems make judgments — and so should FIs and their customers: FIs because it affects the likelihood that they will be paid back for money lent out and customers because it affects the likelihood that they will be able to gain access to that credit.

Some AI techniques are easier to explain than others. For example, decision trees can show the entire order of operations behind a decision. A good decision tree may include over 100 signals, Merrill said, but even with so many factors, it’s still possible to know exactly what is driving results by simply “walking the tree.”

On the other hand, neural networks are mind-bogglingly complex — no human can discern their reasoning simply by looking at one. Neural networks give yes or no answers on whether to approve a loan applicant with no context or criteria to indicate how it reached the answer it chose.

Mapping “Fair”

Why is it so important to have a view into the black box? One reason is fair lending compliance. Merrill shared an odd machine learning quirk that could have gotten one company into hot water if someone hadn’t spotted it in time.

An auto lender had what it thought was two innocuous signals in its model. One signal was whether the car being purchased had high mileage. Another signal in the model was whether an applicant lived in a particular state — again, a seemingly benign signal. These two signals may appear to have nothing to do with each other. Yet for some reason, when mileage is combined with residence in that state, together those two signals proxied to a protected class — race. In other words, the model was yielding significantly lower approval rates for minority applicants, but without explainable artificial intelligence techniques, nobody could figure out why.

Just because the quirk was unintentional didn’t make its consequences any less real, said Merrill. The bias was unfair to both customers and the lender itself — and could have gotten the FI in deep trouble if tools to explain machine learning algorithms hadn’t spotted it first.

Where to Start

Start anywhere, said Merrill. Use the data you already have before adding more from external data sources. All data can be credit data, he said. For instance, many FIs find they can achieve higher loan approval rates and reduced defaults by including customer relationship data they already have on hand into their credit models.

FIs are still in the early days of using machine learning for automated underwriting, said Merrill. The first step is admitting that conventional credit appraisal methods overlook tens of millions of creditworthy borrowers. Then the question for FIs becomes: How do we take advantage of machine learning-based underwriting without throwing out all of our processes and existing technology infrastructure? Bringing risk, compliance and IT teams into the decision-making process is essential to socializing the idea of using machine learning as an underwriting tool in an organization.

It won’t be an easy transition, Merrill acknowledged, but then again, transitions never are.

“No matter how hard you try, there’s going to be some work involved in moving from today to tomorrow,” said Merrill, but “data is your superpower once you start using it.”