Evaluating Binary Classification Models Using Gains Tables

R
Credit Risk Analytics
Model Evaluation
Published

January 28, 2024

Introduction

In credit risk modeling and binary classification applications, analysts employ gains tables (also known as KS tables) as a fundamental tool for quantifying model performance. This tutorial walks through the construction and interpretation of gains tables using R.

Theoretical Foundation: Understanding Gains Tables

A gains table systematically discretizes the population (typically a validation or test dataset) into groups based on the model’s output predictions (probability scores, log odds, or risk scores). Each group conventionally represents 10% of the total population (deciles), though alternative binning strategies may be employed. The table then reports summary statistics for each group and tracks the cumulative distributions of events (defaults) and non-events, from which the model’s discriminatory power is quantified.
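
To make this concrete before turning to real data, here is a minimal sketch on simulated scores (not the tutorial’s dataset): predictions are cut into deciles and the event rate is tallied per bin, which is the core of every gains table.

# Minimal illustration on simulated scores (not the Lending Club data)
set.seed(1)
score <- rnorm(1000)                     # stand-in for model output
event <- rbinom(1000, 1, plogis(score))  # higher score => more events
decile <- cut(score, quantile(score, probs = seq(0, 1, 0.1)),
              include.lowest = TRUE)

# Event rate per decile -- should rise steadily for a useful model
aggregate(event, by = list(decile = decile), FUN = mean)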

Package Dependencies

# Load required packages
library(dplyr)
library(magrittr)
library(knitr)
library(scales)

Dataset Preparation

This tutorial utilizes a sample from the Lending Club dataset, which contains comprehensive loan information and associated outcomes suitable for credit risk modeling applications.

# Load the sample data
sample <- read.csv("https://bit.ly/42ypcnJ")

# Check dimensions
dim(sample)
[1] 10000   153

Target Definition

The first step is to create a binary target variable for modeling. In this credit risk application, we flag borrowers who defaulted on their loan obligations.

# Check unique loan statuses
unique(sample$loan_status)
[1] "Fully Paid"                                         
[2] "Current"                                            
[3] "Charged Off"                                        
[4] "Late (31-120 days)"                                 
[5] "Late (16-30 days)"                                  
[6] "In Grace Period"                                    
[7] "Does not meet the credit policy. Status:Fully Paid" 
[8] "Does not meet the credit policy. Status:Charged Off"
# Define "bad" loans as those that are charged off
codes <- c("Charged Off", "Does not meet the credit policy. Status:Charged Off")

# Create a binary flag for defaults
sample %<>% mutate(bad_flag = ifelse(loan_status %in% codes, 1, 0))

# Check overall event rates
sample %>% 
  summarise(events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0)) %>% 
  mutate(event_rate = events / (events + non_events))
  events non_events event_rate
1   1162       8838     0.1162

Model Development

Subsequently, we develop a logistic regression model to generate predictions that will serve as the foundation for gains table construction.

# Replace NA values with a default value (-1), treating missingness
# as its own signal rather than dropping incomplete rows
sample[is.na(sample)] <- -1

# Clean the data
sample %<>% 
  # Remove cases where home ownership and payment plan are not reported
  filter(!home_ownership %in% c("", "NONE"),
         pymnt_plan != "") %>% 
  # Convert categorical variables to factors
  mutate(home_ownership = factor(home_ownership), 
         pymnt_plan = factor(pymnt_plan))

# Train-test split (70-30); note that without a fixed seed (set.seed)
# the split is not reproducible
idx <- sample(1:nrow(sample), size = 0.7 * nrow(sample), replace = FALSE)
train <- sample[idx,]
test <- sample[-idx,]
# Build a logistic regression model
mdl <- glm(
  formula = bad_flag ~ 
    loan_amnt + term + mths_since_last_delinq + total_pymnt + 
    home_ownership + acc_now_delinq + 
    inq_last_6mths + delinq_amnt + 
    mths_since_last_record + mths_since_recent_revol_delinq + 
    mths_since_last_major_derog + mths_since_recent_inq + 
    mths_since_recent_bc + num_accts_ever_120_pd,
  family = "binomial", 
  data = train
)

# Generate predictions on the test set (log odds -- predict.glm's
# default "link" scale)
test$pred <- predict(mdl, newdata = test)
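
Because the logistic transform is strictly monotonic, binning on log odds or on probabilities yields identical groups. If probabilities are preferred, they can be requested explicitly; a quick check, assuming the mdl and test objects above:

# Optional: predictions on the probability scale
test$pred_prob <- predict(mdl, newdata = test, type = "response")

# The rank order (and hence any quantile-based bins) is unchanged
all.equal(rank(test$pred), rank(test$pred_prob))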

Gains Table Construction

The following section demonstrates the step-by-step construction of a comprehensive gains table through systematic binning and statistical analysis.

Population Discretization into Bins

# Create deciles based on model predictions
q <- quantile(test$pred, probs = seq(0, 1, length.out = 11))

# Add bins to test dataset
test$bins <- cut(test$pred, breaks = q, include.lowest = TRUE, 
                right = TRUE, ordered_result = TRUE)

# Check the bin levels (note they're in increasing order)
levels(test$bins)
 [1] "[-5.11,-3.34]" "(-3.34,-2.89]" "(-2.89,-2.64]" "(-2.64,-2.41]"
 [5] "(-2.41,-2.23]" "(-2.23,-2.03]" "(-2.03,-1.83]" "(-1.83,-1.6]" 
 [9] "(-1.6,-1.26]"  "(-1.26,2.51]" 

Basic Statistical Measures by Segment

# Create initial gains table with counts
gains_table <- test %>% 
  group_by(bins) %>% 
  summarise(total = n(), 
            events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0))

# Add event rate column
gains_table %<>%
  mutate(event_rate = percent(events / total, 0.1, 100))

# Display the table
kable(gains_table)
bins            total  events  non_events  event_rate
-------------  ------  ------  ----------  ----------
[-5.11,-3.34]     300       3         297        1.0%
(-3.34,-2.89]     300       8         292        2.7%
(-2.89,-2.64]     300      10         290        3.3%
(-2.64,-2.41]     300      14         286        4.7%
(-2.41,-2.23]     300      26         274        8.7%
(-2.23,-2.03]     300      38         262       12.7%
(-2.03,-1.83]     300      46         254       15.3%
(-1.83,-1.6]      300      49         251       16.3%
(-1.6,-1.26]      300      58         242       19.3%
(-1.26,2.51]      300      74         226       24.7%

Cumulative Distribution

# Add population percentage and cumulative distributions
gains_table %<>%
  mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
         
         # Calculate cumulative percentages
         c.events_pct = cumsum(events) / sum(events),
         c.non_events_pct = cumsum(non_events) / sum(non_events))

# Display the updated table
kable(gains_table)
bins            total  events  non_events  event_rate  pop_pct  c.events_pct  c.non_events_pct
-------------  ------  ------  ----------  ----------  -------  ------------  ----------------
[-5.11,-3.34]     300       3         297        1.0%    10.0%     0.0092025         0.1110696
(-3.34,-2.89]     300       8         292        2.7%    10.0%     0.0337423         0.2202693
(-2.89,-2.64]     300      10         290        3.3%    10.0%     0.0644172         0.3287210
(-2.64,-2.41]     300      14         286        4.7%    10.0%     0.1073620         0.4356769
(-2.41,-2.23]     300      26         274        8.7%    10.0%     0.1871166         0.5381451
(-2.23,-2.03]     300      38         262       12.7%    10.0%     0.3036810         0.6361257
(-2.03,-1.83]     300      46         254       15.3%    10.0%     0.4447853         0.7311144
(-1.83,-1.6]      300      49         251       16.3%    10.0%     0.5950920         0.8249813
(-1.6,-1.26]      300      58         242       19.3%    10.0%     0.7730061         0.9154824
(-1.26,2.51]      300      74         226       24.7%    10.0%     1.0000000         1.0000000

Performance Metrics

# Add KS statistic, capture rate, and cumulative event rate
gains_table %<>%
  mutate(
    # KS statistic (difference between cumulative distributions)
    ks = round(abs(c.events_pct - c.non_events_pct), 2), 
    
    # Capture rate (percentage of total events captured)
    cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
    
    # Cumulative event rate
    c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
    
    # Format percentage columns
    c.events_pct = percent(c.events_pct, 0.1, 100),
    c.non_events_pct = percent(c.non_events_pct, 0.1, 100))

# Display the final table
kable(gains_table)
bins            total  events  non_events  event_rate  pop_pct  c.events_pct  c.non_events_pct    ks  cap_rate  c_event_rate
-------------  ------  ------  ----------  ----------  -------  ------------  ----------------  ----  --------  ------------
[-5.11,-3.34]     300       3         297        1.0%    10.0%          0.9%             11.1%  0.10        1%          1.0%
(-3.34,-2.89]     300       8         292        2.7%    10.0%          3.4%             22.0%  0.19        3%          1.8%
(-2.89,-2.64]     300      10         290        3.3%    10.0%          6.4%             32.9%  0.26        6%          2.3%
(-2.64,-2.41]     300      14         286        4.7%    10.0%         10.7%             43.6%  0.33       11%          2.9%
(-2.41,-2.23]     300      26         274        8.7%    10.0%         18.7%             53.8%  0.35       19%          4.1%
(-2.23,-2.03]     300      38         262       12.7%    10.0%         30.4%             63.6%  0.33       30%          5.5%
(-2.03,-1.83]     300      46         254       15.3%    10.0%         44.5%             73.1%  0.29       44%          6.9%
(-1.83,-1.6]      300      49         251       16.3%    10.0%         59.5%             82.5%  0.23       60%          8.1%
(-1.6,-1.26]      300      58         242       19.3%    10.0%         77.3%             91.5%  0.14       77%          9.3%
(-1.26,2.51]      300      74         226       24.7%    10.0%        100.0%            100.0%  0.00      100%         10.9%
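
The maximum ks value in the table (0.35) can be cross-checked against the underlying score distributions. A sketch in base R, assuming the test object above: the exact KS statistic is the largest vertical gap between the empirical CDFs of scores for events and non-events, of which the decile-level figure is a coarsened approximation.

# Empirical CDFs of scores for events and non-events
ecdf_events <- ecdf(test$pred[test$bad_flag == 1])
ecdf_non_events <- ecdf(test$pred[test$bad_flag == 0])

# Maximum gap between the two CDFs over a grid of score values
grid <- seq(min(test$pred), max(test$pred), length.out = 1000)
max(abs(ecdf_events(grid) - ecdf_non_events(grid)))

# stats::ks.test() computes the same statistic (plus a p-value)
ks.test(test$pred[test$bad_flag == 1], test$pred[test$bad_flag == 0])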

Reusable Function

The following implementation wraps the gains table construction in a reusable function that accepts any numeric score vector and binary (0/1) outcome vector:

gains_table <- function(act, pred, increasing = TRUE, nBins = 10) {
  
  # Create bins based on predictions
  q <- quantile(pred, probs = seq(0, 1, length.out = nBins + 1))
  bins <- cut(pred, breaks = q, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)
  
  df <- data.frame(act, pred, bins)
  
  df %>% 
    # Group by bins and calculate statistics
    group_by(bins) %>% 
    summarise(total = n(), 
              events = sum(act == 1), 
              non_events = sum(act == 0)) %>% 
    mutate(event_rate = percent(events / total, 0.1, 100)) %>% 
    
    # Sort the table based on the 'increasing' parameter
    {if (increasing) {
      arrange(., bins)
    } else {
      arrange(., desc(bins))
    }} %>% 
    
    # Add all performance metrics
    mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
           c.events_pct = cumsum(events) / sum(events),
           c.non_events_pct = cumsum(non_events) / sum(non_events), 
           ks = round(abs(c.events_pct - c.non_events_pct), 2), 
           cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
           c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
           c.events_pct = percent(c.events_pct, 0.1, 100),
           c.non_events_pct = percent(c.non_events_pct, 0.1, 100))
}
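
One caveat worth flagging: quantile() can return duplicated break points when many observations share the same score, and cut() fails on duplicate breaks. A defensive variant of the function's binning step, at the cost of fewer (and unequal) bins when ties are heavy:

# Defensive binning (sketch): collapse duplicated break points
q <- unique(quantile(pred, probs = seq(0, 1, length.out = nBins + 1)))
bins <- cut(pred, breaks = q, include.lowest = TRUE, right = TRUE,
            ordered_result = TRUE)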

Function Implementation

# Generate a gains table with bins in descending order
tab <- gains_table(test$bad_flag, test$pred, FALSE, 10)
kable(tab)
bins            total  events  non_events  event_rate  pop_pct  c.events_pct  c.non_events_pct    ks  cap_rate  c_event_rate
-------------  ------  ------  ----------  ----------  -------  ------------  ----------------  ----  --------  ------------
(-1.26,2.51]      300      74         226       24.7%    10.0%         22.7%              8.5%  0.14       23%         24.7%
(-1.6,-1.26]      300      58         242       19.3%    10.0%         40.5%             17.5%  0.23       40%         22.0%
(-1.83,-1.6]      300      49         251       16.3%    10.0%         55.5%             26.9%  0.29       56%         20.1%
(-2.03,-1.83]     300      46         254       15.3%    10.0%         69.6%             36.4%  0.33       70%         18.9%
(-2.23,-2.03]     300      38         262       12.7%    10.0%         81.3%             46.2%  0.35       81%         17.7%
(-2.41,-2.23]     300      26         274        8.7%    10.0%         89.3%             56.4%  0.33       89%         16.2%
(-2.64,-2.41]     300      14         286        4.7%    10.0%         93.6%             67.1%  0.26       94%         14.5%
(-2.89,-2.64]     300      10         290        3.3%    10.0%         96.6%             78.0%  0.19       97%         13.1%
(-3.34,-2.89]     300       8         292        2.7%    10.0%         99.1%             88.9%  0.10       99%         12.0%
[-5.11,-3.34]     300       3         297        1.0%    10.0%        100.0%            100.0%  0.00      100%         10.9%

Interpretation

A properly constructed gains table provides multiple critical insights into model performance characteristics:

  1. Monotonicity Assessment: Event rates should demonstrate consistent increases (or decreases) across bins, confirming the model’s effectiveness in rank-ordering risk levels.

  2. Population Distribution: Consistent bin sizes (ideally ~10% each) indicate appropriate score distribution. Inconsistent sizes suggest score clustering, which may complicate threshold determination.

  3. Kolmogorov-Smirnov (KS) Statistic: The maximum KS value across bins summarizes the model’s discriminatory power (see the snippet after this list). Higher values (approaching 1.0) indicate stronger separation between events and non-events.

  4. Capture Rate: Demonstrates the percentage of total events captured at each threshold, essential for operational decision-making.

  5. Cumulative Event Rate: Indicates the event rate among all cases up to each bin, facilitating approval threshold establishment.
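
As noted in point 3, the headline KS statistic is simply the maximum of the ks column; using the tab object created above:

# Headline KS statistic and the bin where the separation peaks
max(tab$ks)                    # 0.35 for this model
tab$bins[which.max(tab$ks)]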

Applications in Credit Risk Analytics

Gains tables serve multiple critical functions in credit risk management environments:

  1. Threshold Optimization: Identification of appropriate score thresholds for automated approval or rejection decisions.

  2. Tiered Strategy Development: Construction of multi-tier decision strategies (approve, manual review, decline) based on quantified risk levels.

  3. Model Performance Monitoring: Longitudinal tracking of model performance through comparison of actual versus expected distributions.

  4. Comparative Model Evaluation: Systematic comparison of alternative models through KS statistics and capture rate analysis.