Entry 28: Cumulative gains and lift

Remember back in Entry 16 when I said I wasn’t planning to cover lift? Well, plans change.

The notebooks where I did my code for this entry can be found on my github page:

The Problem

Lift charts were in Applied Predictive Modeling, so I wanted to include them in my review of metrics. As I was researching lift charts, cumulative gains charts and the Kolmogorov-Smirnov statistic also came up.

These metrics look like a good way to evaluate how a model is doing against the baseline no model/no information rate.

The Options

These three charts all order the predictions by their predicted probability of belonging to the positive class in descending order. They are then broken into bins, usually deciles.
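The ordering and binning step can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from my notebooks: the function name `decile_bins` and the random scores are made up for the example.

```python
import numpy as np

def decile_bins(y_score, n_bins=10):
    """Return each observation's bin number (1 = highest predicted probabilities)."""
    y_score = np.asarray(y_score)
    # Indices that sort the scores from highest to lowest.
    order = np.argsort(-y_score, kind="stable")
    # Rank of each observation in that ordering (0 = highest score).
    ranks = np.empty(len(y_score), dtype=int)
    ranks[order] = np.arange(len(y_score))
    # Split the ranks into n_bins groups of (nearly) equal size.
    return ranks * n_bins // len(y_score) + 1

# Hypothetical predicted probabilities for 20 observations (illustration only).
rng = np.random.default_rng(0)
scores = rng.random(20)
bins = decile_bins(scores)
```

With 20 observations and 10 bins, each bin ends up with 2 observations, and the observation with the highest score lands in bin 1.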

Cumulative gains

This plots the cumulative proportion of all observations (remember the observations are ordered by likelihood that they belong in the positive class) against the cumulative proportion of the positive class captured so far. Because it’s cumulative, all data in lower deciles are included in higher deciles. This means that by the 10th decile (i.e., 100% of the data) the proportion of the positive class will be 100%.

Basically, in the first 10% of data (the 0 to 0.1 decile, which includes the observations with the highest probability of being in the positive class) there are X observations (i.e., 10% of all observations). Of those Y are in the positive class, which makes up Z percent of all positive class observations.
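Here is a rough sketch of how the points on a cumulative gains chart could be computed. The function name `cumulative_gains` and the tiny dataset are assumptions made up for illustration (I use 5 bins instead of deciles so the numbers are easy to check by hand).

```python
import numpy as np

def cumulative_gains(y_true, y_score, n_bins=10):
    """Return (share_of_data, share_of_positives) after each bin."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Order the true labels from highest to lowest predicted probability.
    ordered = y_true[np.argsort(-y_score, kind="stable")]
    # Split into n_bins groups of (nearly) equal size, keeping the order.
    chunks = np.array_split(ordered, n_bins)
    sizes = np.array([len(chunk) for chunk in chunks])
    positives = np.array([chunk.sum() for chunk in chunks])
    # Cumulative shares: x-axis and y-axis of the gains chart.
    share_of_data = np.cumsum(sizes) / len(ordered)
    share_of_positives = np.cumsum(positives) / ordered.sum()
    return share_of_data, share_of_positives

# Hypothetical labels and predicted probabilities (illustration only).
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
x, y = cumulative_gains(y_true, y_score, n_bins=5)
print(x)  # [0.2 0.4 0.6 0.8 1. ]
print(y)  # [0.5  0.75 0.75 1.   1.  ]
```

Plotting `y` against `x` (plus the diagonal for the no-model baseline) gives the chart.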

Now for some numbers to make this more concrete.

  • I have a dataset of 100 observations (there are 10 observations in each of my bins)
  • In my full dataset there are 15 positive observations
  • 5 of those positive class observations are in the first decile

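With the above information, we can see that the first decile holds 5 of the 15 positive observations, or about 33% of the positive class. With no model (i.e., the observations in random order), the first 10% of the data would be expected to hold only about 10% of the positive class. As such, the model captured about 23 percentage points more of the positive class in the first decile than if I’d used no model.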
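The arithmetic for the first decile, worked directly from the counts in the list above (100 observations, 15 positives, 5 of them in the first 10 observations). The variable names are just for this sketch.

```python
n_total = 100          # observations in the full dataset
n_positive = 15        # positive observations in the full dataset
n_first_decile = 10    # observations in the first decile
pos_first_decile = 5   # positives that landed in the first decile

# Share of the data covered after the first decile (x-axis of the gains chart).
share_of_data = n_first_decile / n_total
# Share of all positives captured in that decile (y-axis of the gains chart).
gain = pos_first_decile / n_positive
# Expected gain if the observations were in random order (no model).
baseline_gain = share_of_data

print(f"share of data: {share_of_data:.0%}")   # share of data: 10%
print(f"gain:          {gain:.1%}")            # gain:          33.3%
print(f"baseline gain: {baseline_gain:.0%}")   # baseline gain: 10%
```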

Lift chart

The lift chart works pretty much directly on the cumulative gains data. It shows how much better the model performed than if no model had been used. For the example above, the lift for the first decile would be about 3.3 because the model identified roughly 3.3 times as many of the positive class (5 of 15, or about 33%) as random guessing would be expected to in the first 10% of the data (10%).
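A small sketch of the lift calculation, again working straight from the counts in the example (5 of the 15 positives in the first 10 of 100 observations). It shows two equivalent ways of getting the same number; the variable names are made up for the illustration.

```python
n_total, n_positive = 100, 15
n_first_decile, pos_first_decile = 10, 5

# Lift as "share of positives captured" over "share of data covered".
lift_from_gains = (pos_first_decile / n_positive) / (n_first_decile / n_total)
# Same thing as "positive rate in the decile" over "positive rate overall".
lift_from_rates = (pos_first_decile / n_first_decile) / (n_positive / n_total)

print(round(lift_from_gains, 2), round(lift_from_rates, 2))  # 3.33 3.33
```

A lift of 1 means the model did no better than random ordering for that slice of the data.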

Kolmogorov-Smirnov statistic

The Kolmogorov-Smirnov (KS) statistic calculates the difference between the cumulative proportion of the target (positive) class and the cumulative proportion of the non-target (negative) class at each decile. It’s kind of like the residual error, which returns the difference between the predicted and the observed value for regression problems (see Entry 21 for more information on residual error). This difference in value indicates how good the model is at separating the two classes.

As noted in MODEL EVALUATION – CLASSIFICATION MODELS:

A K-S statistic=100 will mean that the model is able to create two mutually exclusive groups with each group having a separate class label of observations. K-S Statistic=0 indicates a very poor model which fails to successfully distinguish between the classes.

For a good model, the maximum K-S statistic should fall in the top three or four deciles as we expect the maximum differentiation […] to happen in the initial deciles only.
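A minimal sketch of the per-bin KS values, on the 0 to 100 scale used in the quote above. The function name `ks_by_bin` and the tiny dataset are assumptions for illustration (5 bins instead of deciles so it can be checked by hand).

```python
import numpy as np

def ks_by_bin(y_true, y_score, n_bins=10):
    """Return 100 * (cumulative % of positives - cumulative % of negatives) per bin."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Order the true labels from highest to lowest predicted probability.
    ordered = y_true[np.argsort(-y_score, kind="stable")]
    chunks = np.array_split(ordered, n_bins)
    pos = np.array([chunk.sum() for chunk in chunks])
    neg = np.array([len(chunk) - chunk.sum() for chunk in chunks])
    # Cumulative share of each class captured after each bin.
    cum_pos = np.cumsum(pos) / pos.sum()
    cum_neg = np.cumsum(neg) / neg.sum()
    return 100 * (cum_pos - cum_neg)

# Hypothetical labels and predicted probabilities (illustration only).
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ks = ks_by_bin(y_true, y_score, n_bins=5)
best_bin = int(np.argmax(ks)) + 1  # 1-based bin with the largest separation
print(np.round(ks, 1))  # [50.  58.3 25.  33.3  0. ]
print(best_bin)         # 2
```

In this toy example the largest separation (about 58) shows up in the second of five bins, and the value always falls back to 0 in the last bin because both classes have been fully captured by then.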

The Proposed Solution

At first I wasn’t sure these metrics would be helpful for my use case. Cumulative gains and lift would be okay at showing the value added by using a model, which would be fine to show upper leadership to quantify the value add of the machine learning team. Meaningful Metrics: Cumulative Gains and Lyft Charts points out that the AUC metrics tend to be a little abstract for non-technical managers.

However, my department already has a legacy system in place, so I’d have to hack the baseline to be the results of the legacy system. Meh. We already have several metrics we track to show value.

Then I got to the KS statistic. For the data at my workplace, a human has to review each observation before a decision is made on that observation. In these situations, we only want to tag a limited number of observations for human review.

It’s like the churn example given in MODEL EVALUATION – CLASSIFICATION MODELS. If a human has to call each positive prediction (the customers the model thinks are likely to leave the company), then you want to put the customers most likely to churn in front of the humans calling the customers. Same for lead generation (lead generation is when there is a list of potential customers that someone calls to see if that potential customer is interested in a product or service). One person is highly unlikely to get through 100,000 leads in a day. They need to focus on the 10 most likely to pan out and turn into a sale.

The KS statistic turns the model’s ability to separate the classes into a hard number that can be viewed at different proportions of the total population. If we have the capacity to look at 20% of the total observations, then we can see how well the model is able to differentiate the classes at that percentage.

The Fail

I’m not exactly sure how I’d want to use this, but it’s good to have in my back pocket.

Okay, so this second point is the opposite of a fail but … this post only took around 2 hours. I finally met my time goal of 1-2 hours on the first try! (Yes, I’m pretty sure this is the first time that’s happened. Yes, I’m aware that this is entry number 28.) I even created three notebooks to go with this that looked at the results on three different datasets.

Most of my posts, especially the ones around classification metrics, have been running way longer, and end up getting broken into multiple posts. Even dividing the time spent by the number of posts I end up with probably doesn’t get me down to 2 hours per post. So, yay! Finally!

Maybe this will be the start of a trend! (I’m totally not holding my breath for that.)

Up Next

Profit and Cost

Resources