Evaluating Binary Classification Models Using Gains Tables

R
Credit Risk Analytics
Model Evaluation
Published

January 28, 2024

Gains tables (also known as KS tables) are a go-to tool for evaluating binary classification models in credit risk. In this post, let’s build one from scratch in R and unpack what each metric tells us.

Theoretical Foundation: Understanding Gains Tables

A gains table systematically discretizes the population (typically a validation or test dataset) into groups based on the model’s output predictions (probability scores, log odds, or risk scores). Each group conventionally represents 10% of the total population (deciles), though alternative binning strategies may be employed. The output presents summary statistics for each group and analyzes the cumulative distributions of events (defaults) and non-events to quantify the model’s discriminatory performance.
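As a toy illustration of the idea (synthetic data, not the Lending Club sample used below), binning scores into quantile groups and tabulating the event rate per group looks like this:

```r
# Toy illustration: bin synthetic scores into quintiles and tabulate event rates
set.seed(1)
score <- rnorm(1000)                         # hypothetical model scores
event <- rbinom(1000, 1, plogis(score - 2))  # events more likely at high scores
bins  <- cut(score, quantile(score, seq(0, 1, 0.2)), include.lowest = TRUE)

# Event rate per bin -- for a discriminating model it rises across bins
tapply(event, bins, mean)
```

The rest of the post builds exactly this structure on real data, plus the cumulative metrics that make the table useful.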

Required Packages

# Load required packages
library(dplyr)
library(magrittr)
library(knitr)
library(scales)

Dataset Preparation

Let’s use a sample from the Lending Club dataset, which has loan characteristics, borrower details, and repayment outcomes — exactly what we need.

# Load the sample data
sample <- read.csv("https://bit.ly/42ypcnJ")

# Check dimensions
dim(sample)
[1] 10000   153

Target Definition

First, let’s create a binary target variable that flags borrowers who defaulted on their loans.

# Check unique loan statuses
unique(sample$loan_status)
[1] "Fully Paid"                                         
[2] "Current"                                            
[3] "Charged Off"                                        
[4] "Late (31-120 days)"                                 
[5] "Late (16-30 days)"                                  
[6] "In Grace Period"                                    
[7] "Does not meet the credit policy. Status:Fully Paid" 
[8] "Does not meet the credit policy. Status:Charged Off"
# Define "bad" loans as those that are charged off
codes <- c("Charged Off", "Does not meet the credit policy. Status:Charged Off")

# Create a binary flag for defaults
sample %<>% mutate(bad_flag = ifelse(loan_status %in% codes, 1, 0))

# Check overall event rates
sample %>% 
  summarise(events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0)) %>% 
  mutate(event_rate = events / (events + non_events))
  events non_events event_rate
1   1162       8838     0.1162

Model Development

Let’s fit a simple logistic regression model to generate predicted scores.

# Replace missing values with a sentinel value (a blunt but common imputation)
sample[is.na(sample)] <- -1

# Clean the data
sample %<>% 
  # Remove cases where home ownership and payment plan are not reported
  filter(!home_ownership %in% c("", "NONE"),
         pymnt_plan != "") %>% 
  # Convert categorical variables to factors
  mutate(home_ownership = factor(home_ownership), 
         pymnt_plan = factor(pymnt_plan))

# Train-test split (70-30)
idx <- sample(1:nrow(sample), size = 0.7 * nrow(sample), replace = FALSE)
train <- sample[idx,]
test <- sample[-idx,]
# Build a logistic regression model
mdl <- glm(
  formula = bad_flag ~ 
    loan_amnt + term + mths_since_last_delinq + total_pymnt + 
    home_ownership + acc_now_delinq + 
    inq_last_6mths + delinq_amnt + 
    mths_since_last_record + mths_since_recent_revol_delinq + 
    mths_since_last_major_derog + mths_since_recent_inq + 
    mths_since_recent_bc + num_accts_ever_120_pd,
  family = "binomial", 
  data = train
)

# Generate predictions on the test set
test$pred <- predict(mdl, newdata = test)
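Note that `predict()` on a `glm` object defaults to `type = "link"`, so `test$pred` holds log odds rather than probabilities. That is fine here, since the transformation is monotonic and binning depends only on rank order, but probabilities are one function call away. A quick check, using the `mdl` and `test` objects from above:

```r
# type = "response" returns probabilities instead of log odds
test$prob <- predict(mdl, newdata = test, type = "response")

# Equivalent conversion: the logistic function applied to the log odds
all.equal(test$prob, plogis(test$pred))
```

Either scale produces the identical gains table, because the bin membership of each observation is unchanged.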

Gains Table Construction

Let’s now build the gains table step by step — binning the population and computing the key metrics in each group.

Population Discretization into Bins

# Create deciles based on model predictions
q <- quantile(test$pred, probs = seq(0, 1, length.out = 11))

# Add bins to test dataset
test$bins <- cut(test$pred, breaks = q, include.lowest = TRUE, 
                right = TRUE, ordered_result = TRUE)

# Check the bin levels (note they're in increasing order)
levels(test$bins)
 [1] "[-5.59,-3.36]" "(-3.36,-2.91]" "(-2.91,-2.62]" "(-2.62,-2.42]"
 [5] "(-2.42,-2.24]" "(-2.24,-2.07]" "(-2.07,-1.87]" "(-1.87,-1.61]"
 [9] "(-1.61,-1.22]" "(-1.22,1.39]" 
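One practical caveat: if the score distribution is heavily tied (many identical predictions, common with tree-based models or sparse features), the quantile breaks can repeat and `cut()` will error with "'breaks' are not unique". A defensive variant of the binning step, shown as a sketch:

```r
# Deduplicate quantile breaks before cutting; with lumpy score
# distributions this may yield fewer than 10 bins
q <- quantile(test$pred, probs = seq(0, 1, length.out = 11))
q <- unique(q)
test$bins <- cut(test$pred, breaks = q, include.lowest = TRUE,
                 right = TRUE, ordered_result = TRUE)
```

With the smooth logistic-regression scores used here the breaks are all distinct, so the simpler version works as-is.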

Basic Statistical Measures by Segment

# Create initial gains table with counts
gains_table <- test %>% 
  group_by(bins) %>% 
  summarise(total = n(), 
            events = sum(bad_flag == 1), 
            non_events = sum(bad_flag == 0))

# Add event rate column
gains_table %<>%
  mutate(event_rate = percent(events / total, 0.1, 100))

# Display the table
kable(gains_table)
bins total events non_events event_rate
[-5.59,-3.36] 300 4 296 1.3%
(-3.36,-2.91] 300 9 291 3.0%
(-2.91,-2.62] 300 16 284 5.3%
(-2.62,-2.42] 300 15 285 5.0%
(-2.42,-2.24] 300 27 273 9.0%
(-2.24,-2.07] 300 34 266 11.3%
(-2.07,-1.87] 300 53 247 17.7%
(-1.87,-1.61] 300 48 252 16.0%
(-1.61,-1.22] 300 60 240 20.0%
(-1.22,1.39] 300 82 218 27.3%

Cumulative Distribution

# Add population percentage and cumulative distributions
gains_table %<>%
  mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
         
         # Calculate cumulative percentages
         c.events_pct = cumsum(events) / sum(events),
         c.non_events_pct = cumsum(non_events) / sum(non_events))

# Display the updated table
kable(gains_table)
bins total events non_events event_rate pop_pct c.events_pct c.non_events_pct
[-5.59,-3.36] 300 4 296 1.3% 10.0% 0.0114943 0.1116139
(-3.36,-2.91] 300 9 291 3.0% 10.0% 0.0373563 0.2213424
(-2.91,-2.62] 300 16 284 5.3% 10.0% 0.0833333 0.3284314
(-2.62,-2.42] 300 15 285 5.0% 10.0% 0.1264368 0.4358974
(-2.42,-2.24] 300 27 273 9.0% 10.0% 0.2040230 0.5388386
(-2.24,-2.07] 300 34 266 11.3% 10.0% 0.3017241 0.6391403
(-2.07,-1.87] 300 53 247 17.7% 10.0% 0.4540230 0.7322775
(-1.87,-1.61] 300 48 252 16.0% 10.0% 0.5919540 0.8273002
(-1.61,-1.22] 300 60 240 20.0% 10.0% 0.7643678 0.9177979
(-1.22,1.39] 300 82 218 27.3% 10.0% 1.0000000 1.0000000

Performance Metrics

# Add KS statistic, capture rate, and cumulative event rate
gains_table %<>%
  mutate(
    # KS statistic (difference between cumulative distributions)
    ks = round(abs(c.events_pct - c.non_events_pct), 2), 
    
    # Capture rate (percentage of total events captured)
    cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
    
    # Cumulative event rate
    c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
    
    # Format percentage columns
    c.events_pct = percent(c.events_pct, 0.1, 100),
    c.non_events_pct = percent(c.non_events_pct, 0.1, 100))

# Display the final table
kable(gains_table)
bins total events non_events event_rate pop_pct c.events_pct c.non_events_pct ks cap_rate c_event_rate
[-5.59,-3.36] 300 4 296 1.3% 10.0% 1.1% 11.2% 0.10 1% 1.3%
(-3.36,-2.91] 300 9 291 3.0% 10.0% 3.7% 22.1% 0.18 4% 2.2%
(-2.91,-2.62] 300 16 284 5.3% 10.0% 8.3% 32.8% 0.25 8% 3.2%
(-2.62,-2.42] 300 15 285 5.0% 10.0% 12.6% 43.6% 0.31 13% 3.7%
(-2.42,-2.24] 300 27 273 9.0% 10.0% 20.4% 53.9% 0.33 20% 4.7%
(-2.24,-2.07] 300 34 266 11.3% 10.0% 30.2% 63.9% 0.34 30% 5.8%
(-2.07,-1.87] 300 53 247 17.7% 10.0% 45.4% 73.2% 0.28 45% 7.5%
(-1.87,-1.61] 300 48 252 16.0% 10.0% 59.2% 82.7% 0.24 59% 8.6%
(-1.61,-1.22] 300 60 240 20.0% 10.0% 76.4% 91.8% 0.15 76% 9.9%
(-1.22,1.39] 300 82 218 27.3% 10.0% 100.0% 100.0% 0.00 100% 11.6%

Reusable Function

The following function encapsulates the full gains table construction, making it reusable for any binary classification model’s predictions:

gains_table <- function(act, pred, increasing = TRUE, nBins = 10) {
  
  # Create bins based on predictions
  q <- quantile(pred, probs = seq(0, 1, length.out = nBins + 1))
  bins <- cut(pred, breaks = q, include.lowest = TRUE, right = TRUE, ordered_result = TRUE)
  
  df <- data.frame(act, pred, bins)
  
  df %>% 
    # Group by bins and calculate statistics
    group_by(bins) %>% 
    summarise(total = n(), 
              events = sum(act == 1), 
              non_events = sum(act == 0)) %>% 
    mutate(event_rate = percent(events / total, 0.1, 100)) %>% 
    
    # Sort the table based on the 'increasing' parameter
    {if (increasing) {
      arrange(., bins)
    } else {
      arrange(., desc(bins))
    }} %>% 
    
    # Add all performance metrics
    mutate(pop_pct = percent(total/sum(total), 0.1, 100), 
           c.events_pct = cumsum(events) / sum(events),
           c.non_events_pct = cumsum(non_events) / sum(non_events), 
           ks = round(abs(c.events_pct - c.non_events_pct), 2), 
           cap_rate = percent(cumsum(events)/sum(events), 1, 100), 
           c_event_rate = percent(cumsum(events)/cumsum(total), 0.1, 100), 
           c.events_pct = percent(c.events_pct, 0.1, 100),
           c.non_events_pct = percent(c.non_events_pct, 0.1, 100))
}

Function Implementation

# Generate a gains table with bins in descending order
tab <- gains_table(test$bad_flag, test$pred, FALSE, 10)
kable(tab)
bins total events non_events event_rate pop_pct c.events_pct c.non_events_pct ks cap_rate c_event_rate
(-1.22,1.39] 300 82 218 27.3% 10.0% 23.6% 8.2% 0.15 24% 27.3%
(-1.61,-1.22] 300 60 240 20.0% 10.0% 40.8% 17.3% 0.24 41% 23.7%
(-1.87,-1.61] 300 48 252 16.0% 10.0% 54.6% 26.8% 0.28 55% 21.1%
(-2.07,-1.87] 300 53 247 17.7% 10.0% 69.8% 36.1% 0.34 70% 20.2%
(-2.24,-2.07] 300 34 266 11.3% 10.0% 79.6% 46.1% 0.33 80% 18.5%
(-2.42,-2.24] 300 27 273 9.0% 10.0% 87.4% 56.4% 0.31 87% 16.9%
(-2.62,-2.42] 300 15 285 5.0% 10.0% 91.7% 67.2% 0.25 92% 15.2%
(-2.91,-2.62] 300 16 284 5.3% 10.0% 96.3% 77.9% 0.18 96% 14.0%
(-3.36,-2.91] 300 9 291 3.0% 10.0% 98.9% 88.8% 0.10 99% 12.7%
[-5.59,-3.36] 300 4 296 1.3% 10.0% 100.0% 100.0% 0.00 100% 11.6%

Interpretation

A properly constructed gains table answers several practical questions about model performance:

  1. Monotonicity Assessment: Event rates should demonstrate consistent increases (or decreases) across bins, confirming the model’s effectiveness in rank-ordering risk levels.

  2. Population Distribution: Consistent bin sizes (ideally ~10% each) indicate appropriate score distribution. Inconsistent sizes suggest score clustering, which may complicate threshold determination.

  3. Kolmogorov-Smirnov (KS) Statistic: The maximum KS value represents the model’s discriminatory power. Higher values (approaching 1.0) indicate superior separation between positive and negative cases.

  4. Capture Rate: Demonstrates the percentage of total events captured at each threshold, essential for operational decision-making.

  5. Cumulative Event Rate: Indicates the event rate among all cases up to each bin, facilitating approval threshold establishment.
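For model comparison, the single number usually reported is the maximum of the KS column (the `ks` column stays numeric in the table built above, so this works directly on the `gains_table` data frame):

```r
# The headline KS statistic is the maximum across bins
max_ks    <- max(gains_table$ks)
peak_bin  <- gains_table$bins[which.max(gains_table$ks)]
max_ks    # 0.34 in the table above
```

The bin where KS peaks is also a natural candidate when choosing a single score cutoff, since that is where the separation between events and non-events is widest.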

Applications in Credit Risk Analytics

Gains tables serve several practical purposes in credit risk management:

  1. Threshold Optimization: Identification of appropriate score thresholds for automated approval or rejection decisions.

  2. Tiered Strategy Development: Construction of multi-tier decision strategies (approve, manual review, decline) based on quantified risk levels.

  3. Model Performance Monitoring: Longitudinal tracking of model performance through comparison of actual versus expected distributions.

  4. Comparative Model Evaluation: Systematic comparison of alternative models through KS statistics and capture rate analysis.