Introducing SCRE: A Credit Rating Estimation Service Built on A&M’s Machine Learning Platform
Corporate Credit Rating Predictions are Important and Often Flawed
The ability to accurately predict corporate credit ratings is essential to numerous business functions for both borrowers and lenders. When predicting debt ratings, analysts often apply linear models, financial ratio analysis or a combination of the two. These approaches usually fall short on several measures.
Linear models are often pulled from out-of-date papers, can require unreasonable amounts of data (e.g., five years of historical financial statements) and generally do not exhibit a reliable level of accuracy. While financial ratio analysis can help to show a company’s credit position, ratios have limited power to predict ratings. For instance, a 2.9x debt/EBITDA ratio falls in the middle 50 percent of companies for every rating from A to B+, so the ratio alone cannot distinguish among those ratings. This is a common, and thoroughly underestimated, issue in financial ratio analysis.
Building a Better Model
We developed our Sample Credit Rating Estimator (SCRE) because we wanted to know if a model based on machine learning (ML) techniques could improve on existing ways of estimating credit ratings. To answer this question, we built a database with over 100,000 quarterly observations of publicly listed, S&P-rated, non-financial companies from 1997 to 2017. For each observation, we assigned over 400 features, such as financial measures and descriptive information about the company. We then trained and tested 23 different ML estimators on the database.
For each train/test cycle, the data was partitioned to prevent observations of the same company from appearing in both the training and testing samples. For example, we ensured that all observations of General Motors were in either the training set or the testing set, but not spread across both. In early stages, we found that our ML estimators were using overlap between the training and testing datasets to “identify” companies when making predictions. This led to artificially inflated results.
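The company-level partitioning described above can be sketched as follows. This is a minimal illustration, not the code used to build SCRE: the function name `split_by_company`, the `company` and `quarter` fields, the 25 percent test fraction and the toy company names other than General Motors are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_company(observations, test_fraction=0.25, seed=0):
    """Split observations so each company lands wholly in train or in test."""
    # Group the quarterly observations by company.
    by_company = defaultdict(list)
    for obs in observations:
        by_company[obs["company"]].append(obs)

    # Shuffle companies (not individual observations) reproducibly.
    companies = sorted(by_company)
    random.Random(seed).shuffle(companies)

    # Reserve a share of the companies for testing (at least one).
    n_test = max(1, round(len(companies) * test_fraction))
    test_companies = set(companies[:n_test])

    train = [o for c in companies if c not in test_companies for o in by_company[c]]
    test = [o for c in companies if c in test_companies for o in by_company[c]]
    return train, test


# Hypothetical toy data: four companies, three quarterly observations each.
observations = [
    {"company": name, "quarter": quarter}
    for name in ["General Motors", "Company B", "Company C", "Company D"]
    for quarter in ["2015Q1", "2015Q2", "2015Q3"]
]
train, test = split_by_company(observations)

train_companies = {o["company"] for o in train}
test_companies = {o["company"] for o in test}
print(train_companies & test_companies)  # empty set: no company is in both samples
```

Because the shuffle operates on company names rather than on rows, an estimator cannot score well on the test sample simply by recognizing a company it saw during training.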
To get the most out of the model, we adopted an iterative approach to development: after each train/test cycle, we compared predicted ratings to actual ratings to look for additional features or alterations to the database. This process allowed us to identify numerous areas for improvement in the data. For instance, we noted that models trained exclusively on pre-2009 data or exclusively on post-2009 data were more accurate than models trained on both periods combined. This suggested that the S&P ratings process changed in 2009 in response to the global financial crisis. As a result, we removed all pre-2009 observations from our database, which increased the accuracy of our predictions.
After refining the database, we selected a “Random Forest Regressor” as the basis for our model. Random Forest estimators rely on a combination of multiple decision tree estimators that use random subsets of training data. In our model, each individual tree predicts a credit rating which is represented by a number (e.g., one for AAA, two for AA+), and the individual predictions are then averaged to determine a final predicted rating.
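The numeric encoding and averaging step can be illustrated with a short sketch. Only the first two codes (one for AAA, two for AA+) come from the description above; the rest of the mapping assumes the standard S&P long-term scale in order, and rounding the average to the nearest notch is an assumption about how a final letter rating could be read off. The tree outputs are made up.

```python
# Assumed ordering of the S&P long-term rating scale, best to worst.
SP_SCALE = [
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
    "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D",
]
RATING_TO_NUMBER = {rating: i for i, rating in enumerate(SP_SCALE, start=1)}
NUMBER_TO_RATING = {i: rating for rating, i in RATING_TO_NUMBER.items()}


def average_tree_predictions(tree_predictions):
    """Average per-tree numeric predictions and map the mean to a rating."""
    mean = sum(tree_predictions) / len(tree_predictions)
    # Round to the nearest notch and clamp to the valid range.
    # Note: Python's round() sends exact .5 ties to the even integer.
    nearest = min(max(round(mean), 1), len(SP_SCALE))
    return mean, NUMBER_TO_RATING[nearest]


# Hypothetical outputs of five individual trees for one company.
tree_predictions = [8, 9, 9, 10, 9]
mean, rating = average_tree_predictions(tree_predictions)
print(mean, rating)  # 9.0 BBB
```

Treating the rating as a number lets the forest express that BBB+ is closer to BBB than to B, which a purely categorical label would not capture.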
Managing, selecting and transforming the data was an important step in developing meaningful results. Once the data was curated and the algorithms had considered all of it, we performed a feature selection and optimization analysis to reduce the data capture requirements associated with the analysis.
Feature selection allowed us to reduce our data capture requirements to two years of high-level historical financials[1] and the company’s sector. As a result, our estimator maintains higher prediction accuracy with less data than commonly used linear models.
After feature selection, 92 percent of predicted ratings were within two notches of the actual rating in our testing data set.
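A "within two notches" figure of this kind can be computed as in the sketch below. It assumes ratings are already encoded as notch numbers (one for AAA, two for AA+ and so on); the function name `share_within_notches` and the sample values are hypothetical and are not SCRE results.

```python
def share_within_notches(predicted, actual, tolerance=2):
    """Return the share of predictions within `tolerance` notches of actual."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Made-up numeric ratings for five companies.
predicted = [9, 6, 12, 14, 3]
actual = [8, 6, 15, 13, 5]

print(share_within_notches(predicted, actual))               # 0.8
print(share_within_notches(predicted, actual, tolerance=0))  # 0.2 (exact matches)
```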
The Benefits of SCRE
A&M’s SCRE service and technologies offer several advantages over traditional corporate credit rating prediction models and techniques:
- Increased accuracy: Moody’s published a model in 2006 that used five years of detailed historical financials and claimed 89 percent accuracy when predicting within two notches. Our model is more accurate by 3 percentage points (92 percent versus 89 percent) and uses only two years of historical financials. When predicting full ratings (i.e., ratings without +/- categorizations), our model was 5 percent better than observed linear models at predicting the exact full rating and 2 percentage points better (99 percent versus 97 percent) at predicting within one full rating.
- Decreased data requirements: SCRE uses significantly less data than other credit rating models. For instance, the Moody’s model noted above requires five years of historical financial data. Altman’s Z-score, Minardi and other models often require market value, which rules out estimates for private companies. The data SCRE requires is available for both public and private companies, and most companies have at least two years of high-level financials.
- Ability to perform portfolio-level analysis: SCRE will allow its users to examine credit ratings across entire portfolios for risk analysis or portfolio monitoring. Given the nature of the technology, these ratings can be delivered in bulk without the need for significant individual analysis.
- Continuous updating over time: Many published models or analyses are developed at a specific point in time but never updated. SCRE will be continuously updated (i.e., retrained, retested and put through feature selection again) with new data, so the model itself will not become stale over time.
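The full-rating comparison mentioned in the first bullet can be illustrated by collapsing each rating to its letter grade before comparing. This is a sketch only: the helper names are hypothetical, and the ordering of full ratings assumes the standard S&P long-term scale.

```python
# Assumed ordering of full ratings (no +/- modifiers), best to worst.
FULL_SCALE = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D"]


def full_rating(rating):
    """Drop a trailing + or - modifier, e.g. 'BBB+' becomes 'BBB'."""
    return rating.rstrip("+-")


def full_rating_distance(predicted, actual):
    """Number of full ratings separating a predicted and an actual rating."""
    return abs(
        FULL_SCALE.index(full_rating(predicted))
        - FULL_SCALE.index(full_rating(actual))
    )


print(full_rating("BBB+"))                 # BBB
print(full_rating_distance("AA+", "AA-"))  # 0: same full rating
print(full_rating_distance("A-", "BBB+"))  # 1: within one full rating
```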
In short, SCRE produces more accurate ratings, more easily and at greater scale, with lower risk of the analysis becoming outdated as time goes on. This represents a fundamental improvement over traditional credit rating prediction models and techniques, and one that will benefit a wide range of businesses and business functions.
[1] High-level financials include: Revenue, EBITDA, EBIT, Interest, Net Income, Assets, Total Debt, Total Liabilities and Cash Flow.