Credit Decisioning: Past, Present, and Future
The Old Days — A Personal Anecdote
I still remember the good old days before the term “data scientist” became popular. In those days, quantitatively-inclined professionals were still called Data Nerds! I was a proud data nerd, and my data skills helped me land my first job after college as a consultant for a mid-sized bank in the midwestern United States.
My first consulting task was pretty straightforward. The bank had a large amount of data on loans that it had issued over the past decade. Our team of data nerds was tasked with creating an algorithm that could outperform the loan agents who reviewed loan applications and made manual decisions. The loan agents used a combination of manual analyses, heuristics, and gut feeling to reach their decisions. The algorithms were based only on data. Nowadays this paradigm is already a cliché: machine learning powered by Big Data versus a human. Can you guess who won? The algorithm won hands down, and within a year, most of the loan agents had been fired.
What’s Not Changing in Credit Decisioning and What Is Changing
The problem statement in credit decisioning is never going to change. Whether we’re using fancy algorithms or simply picking the candidates with the best smile, the basic question remains the same: will this applicant pay us back? Most of the advances over the past fifty years in credit decisioning can be broken down into three categories: (A) Algorithms, (B) Data Quantity, and (C) Alternative Data.
At the risk of bursting a lot of people’s bubbles, I think it’s worth stating very openly that better algorithms are not really the biggest driver of improved decisioning performance. This is true in credit decisioning and it’s true in basically every area of data science. As Peter Norvig, Director of Research at Google, famously wrote:
“Simple models and a lot of data trump more elaborate models based on less data.”
Today, even a beginner data science student can see that when they compare the performance of an advanced neural network and a simple logistic regression, the results are generally not very different. On the other hand, when they train a model on 1,000,000 records versus 1,000 records, the performance differences are much, much larger.
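The data-quantity half of that claim is easy to try for yourself. Here is a minimal sketch that trains the same plain logistic regression on a small and a large sample and scores both on one held-out set. Everything in it is an assumption made for illustration: the applicants are simulated, `TRUE_WEIGHTS` is made up, and the sample sizes (20 versus 2,000) are far smaller than the 1,000 versus 1,000,000 mentioned above so that it runs quickly in pure Python. It does not attempt the neural network comparison.

```python
import math
import random

# Made-up "true" effect of each of 8 applicant variables (illustration only).
TRUE_WEIGHTS = [1.5, -1.0, 0.8, -0.6, 0.5, 0.4, -0.3, 0.2]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_applicants(n, rng):
    """Simulate n applicants as (features, repaid) pairs from a known logistic model."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in TRUE_WEIGHTS]
        p_repay = sigmoid(sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x)))
        rows.append((x, 1 if rng.random() < p_repay else 0))
    return rows


def fit_logistic(rows, epochs=100, lr=0.5):
    """Plain batch gradient descent on the logistic log-loss (no regularization)."""
    k = len(rows[0][0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in rows:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def accuracy(model, rows):
    """Share of applicants whose repaid / defaulted label is predicted correctly."""
    w, b = model
    hits = 0
    for x, y in rows:
        predicted = 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        hits += predicted == y
    return hits / len(rows)


def compare_sample_sizes(seed=0, n_small=20, n_large=2000, n_test=2000):
    """Train the same model on a small and a large sample; score both on one held-out set."""
    rng = random.Random(seed)
    test = make_applicants(n_test, rng)
    small = fit_logistic(make_applicants(n_small, rng))
    large = fit_logistic(make_applicants(n_large, rng))
    return {"small": accuracy(small, test), "large": accuracy(large, test)}


if __name__ == "__main__":
    print(compare_sample_sizes())
```

On most seeds I would expect the model trained on the larger sample to score noticeably higher, but the exact numbers depend on the seed and on the made-up weights, so treat this as a toy and not as evidence about real loan data.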
Twenty years ago, the world of credit decisioning was completely dominated by Logistic Regression. Eventually, some researchers started experimenting with more advanced techniques like Random Forests, Neural Networks, and Bayesian models. In my own research, I have seen that the more advanced techniques do improve performance, but I can understand why many large financial institutions still use simpler techniques like logistic regression. They don’t need to change. Banks already have an unfair advantage: they have more data and can produce better, more explainable models simply by applying simple algorithms to their very, very large data sets.
Data quantity can refer to the number of records (depth) or the number of variables (breadth) in a data set. Many FinTech lenders like to boast about the “20 million data points” in their algorithm. But 20 million data points are not particularly helpful if the data scientist only has 1,000 records to train the model. So why do FinTechs think they have any chance of competing against banks that have much, much larger data sets? If data quantity drives performance, then we should conclude that whichever financial institution has the most data will win the credit decisioning war forever.
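To make the depth versus breadth point concrete, here is a small arithmetic sketch. It assumes that a “data point” means one value of one variable for one record, which is my reading and not something lenders define consistently. The function name `dataset_shape` and the two example shapes are hypothetical.

```python
def dataset_shape(n_records, n_variables):
    """Summarize a data set by depth (records) and breadth (variables)."""
    return {
        "depth": n_records,
        "breadth": n_variables,
        # Assumption: one data point = one variable value for one record.
        "data_points": n_records * n_variables,
        "records_per_variable": n_records / n_variables,
    }


# Two hypothetical data sets that can both claim "20 million data points".
wide = dataset_shape(n_records=1_000, n_variables=20_000)
deep = dataset_shape(n_records=1_000_000, n_variables=20)

if __name__ == "__main__":
    for name, shape in (("wide", wide), ("deep", deep)):
        print(name, shape)
```

Both shapes multiply out to the same headline number, but the wide one has far fewer records than variables, while the deep one has 50,000 records for every variable. Under this reading, only the deep one gives a model much to learn from.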
I don’t believe there will ever be a “Google Monopoly” in credit decisioning. In the last twenty years, we have seen a general trend towards more data openness in banking, which helps create a level playing field where banks of all sizes and FinTechs can compete fairly. I think it’s worth highlighting a few innovations that have made more data available to data scientists focused on credit decisioning.
Lending Club Phenomenon — Lending Club was one of the first famous peer-to-peer lenders. In 2007, when Lending Club started scaling operations, they made their loan performance data public. This was a gold mine for data scientists. I remember exploring this data in 2012 and building some very strong credit models. Lending Club eventually closed its peer-to-peer business and the historical data is no longer available on their web site (but you can still find it on Kaggle). Lending Club opened up a world where anyone can access large amounts of loan performance data. Other peer-to-peer lenders, including Prosper and Bondora, have followed suit, and even the SBA publishes data on its disaster loans.
Plaid Phenomenon — Plaid is a great company. I was a very early adopter of Plaid back in 2013 when Zach Perret and William Hockey were still launching their product. Plaid connects to banking APIs and helps users share their banking data with other companies like FinTechs, data aggregators, etc. Plaid helps level the playing field because it empowers users to share their banking data. This means that a small private lender can see as much about a user’s banking history as the largest bank in the country. Plaid imitators have been popping up throughout the world. In Brazil, we have been early adopters of Belvo and have been extremely impressed with their technology so far.
Open Banking / Invoicing — Open Banking is a system for banks to share user banking data in a standard format generally mandated by the government. The approach and level of adoption of Open Banking differ dramatically across the globe. Brazil is very advanced in its adoption of Open Banking. The largest banks have already embraced a common, government-mandated standard for sharing user data. Brazil is also the world leader in Electronic Invoicing and has the largest database of electronic invoices in the world. Open Banking and Electronic Invoicing are helping to level the data playing field in Brazil and are stimulating the extreme growth of FinTechs in the country.
Traditional credit decisioning has always relied heavily on the credit bureau report. For the majority of the population, credit bureau data is generally sufficient, but there are still large segments of the population that don’t have thorough data at the credit bureaus. In personal credit, young people and immigrants often get overlooked. In business credit, new businesses or businesses that historically did not rely on credit often have trouble proving their creditworthiness. There has been a growing trend toward increasingly creative approaches to finding new data sources that can help determine an applicant’s creditworthiness.
Positive Bureau — One common critique of traditional bureaus is that they focus too narrowly on data reported by lending institutions. Positive credit reporting has emerged as a trend to incorporate alternative payment data from other sources like utility companies. In the U.S., Connect (formerly PRBC) successfully produced an alternative credit score based on payment data from utility companies. In Brazil, we have been impressed by Quod’s ability to leverage alternative credit data to produce innovative credit assessments.
Surveys — Many companies and academics over the past few decades have made broad claims about the use of survey data to create psychographic assessments of prospective borrowers. VisualDNA is a company that has surveyed millions of users to create unique profiles, but their research does not really focus specifically on credit assessment. Innovative Assessments is a more recent player that seems to be having success in supplementing traditional credit scoring with a survey-based assessment of character traits.
Games — Numerous academics have studied the ability to assess a person’s financial conscientiousness through their performance on interactive games. ConfirmU is on a path to commercialize some of this academic research and add another dimension to the traditional credit risk decision. The technology is still in its infancy, but seems to show promise as a valuable alternative data input into traditional credit models.
We’re moving into an age where lending applicants will be able to share ALL of their credit-related data instantly with any lender. I think the scope of applicant information will continue to expand beyond traditional metrics towards more behavior-based methods, which will enable lenders to make better decisions and approve more applicants. As more financial data moves to the blockchain, the reliability and availability of this data will increase even more. These trends will accelerate across the entire globe, expanding the availability of credit and improving the global economy. The future is bright. I’m excited to be a part of making this a reality!
If you enjoyed this, try following me on Medium.