A more likely scenario is that the marketing budget does not allow the same level of engagement with every prospect. Consider a company with 1 million names on its prospect list and $300,000 to spend on a marketing campaign that has a cost of one dollar per contact. This company, which we call the Simplifying Assumptions Corporation (or SAC for short), can maximize the number of responses it gets for its $300,000 expenditure by scoring the prospect list with a response model and sending its offer to the prospects with the top 300,000
scores. The effect of this action is illustrated in Figure 4.2.
470643 c04.qxd 3/8/04 11:10 AM Page 98
Chapter 4
ROC CURVES
Models are used to produce scores. When a cutoff score is used to decide which customers to include in a campaign, the customers are, in effect, being classified into two groups: those likely to respond and those not likely to respond. One way of evaluating a classification rule is to examine its error rates. In a binary classification task, the overall misclassification rate has two components: the false positive rate and the false negative rate. Changing the cutoff score changes the proportion of the two types of error. For a response model where a higher score indicates a higher likelihood of response, choosing a high score as the cutoff means fewer false positives (people labeled as responders who do not respond) and more false negatives (people labeled as nonresponders who would respond).
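The trade-off described above can be sketched in a few lines of Python. This is only an illustration: the scores and outcomes are made-up values, and the function name `error_rates` is a hypothetical helper, not something taken from the book.

```python
def error_rates(scores, responded, cutoff):
    """Return (false_positive_rate, false_negative_rate) at a given cutoff.

    A prospect is labeled a responder when its score is >= cutoff.
    """
    false_pos = sum(1 for s, r in zip(scores, responded) if s >= cutoff and not r)
    false_neg = sum(1 for s, r in zip(scores, responded) if s < cutoff and r)
    n_nonresponders = sum(1 for r in responded if not r)
    n_responders = sum(1 for r in responded if r)
    return false_pos / n_nonresponders, false_neg / n_responders


# Made-up model scores and actual outcomes (illustrative assumption only).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
responded = [True, True, False, True, False, False, True, False]

low_cutoff = error_rates(scores, responded, 0.35)
high_cutoff = error_rates(scores, responded, 0.75)
print("cutoff 0.35 -> FP rate, FN rate:", low_cutoff)
print("cutoff 0.75 -> FP rate, FN rate:", high_cutoff)
```

On this toy data, raising the cutoff lowers the false positive rate and raises the false negative rate, which is the pattern the paragraph describes.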
An ROC curve is used to represent the relationship of the false-positive rate to the false-negative rate of a test as the cutoff score varies. The letters ROC
stand for “Receiver Operating Characteristics,” a name that goes back to the curve’s origins in World War II, when it was developed to assess the ability of radar operators to identify correctly whether a blip on the radar screen was an enemy ship or something harmless. Today, ROC curves are more likely to be used by medical researchers to evaluate medical tests. The false positive rate is plotted on the X-axis and one minus the false negative rate is plotted on the Y-axis. The ROC curve in the following figure reflects a test with the error profile represented by the following table:
[Figure: ROC Chart, with both axes running from 0 to 100]
FN     FP
0      100
2      72
4      44
8      30
12     16
22     11
32     6
46     4
60     2
80     1
100    0
Choosing a cutoff for the model score such that there are very few false positives leads to a high rate of false negatives, and vice versa. A good model (or medical test) has some scores that are good at discriminating between outcomes, thereby reducing both kinds of error. When this is true, the ROC
curve bulges toward the upper-left corner. The area under the ROC curve is a measure of the model’s ability to differentiate between two outcomes. This measure is called discrimination. A perfect test has a discrimination of 1, and a useless test for two outcomes has a discrimination of 0.5, since that is the area under the diagonal line that represents no model.
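As a rough illustration of discrimination, the sketch below computes the area under the ROC curve from the FN/FP error profile given in this sidebar, using the trapezoid rule. It assumes the table values are percentages and that the last two FN entries read 80 and 100; the function names are hypothetical.

```python
# (FN, FP) pairs from the sidebar's error profile, assumed to be percentages.
error_profile = [(0, 100), (2, 72), (4, 44), (8, 30), (12, 16), (22, 11),
                 (32, 6), (46, 4), (60, 2), (80, 1), (100, 0)]


def roc_points(profile):
    """Convert (FN%, FP%) pairs to (x, y) = (FP rate, 1 - FN rate), sorted by x."""
    return sorted((fp / 100, 1 - fn / 100) for fn, fp in profile)


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


auc = area_under_curve(roc_points(error_profile))
print(round(auc, 2))  # about 0.91 for this error profile
```

Under these assumptions the test's discrimination is about 0.91, well above the 0.5 of the no-model diagonal.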
ROC curves tend to be less useful for marketing applications than in some other domains. One reason is that the false positive rates are so high and the false negative rates so low that even a large change in the cutoff score does not change the shape of the curve much.
[Chart: Concentration (% of Responders), 0% to 100%, plotted against List Penetration (% of Prospects), 0% to 100%; lines labeled Response Model and No Model, with the gap between them labeled Benefit]
Figure 4.2 A cumulative gains or concentration chart shows the benefit of using a model.
The upper, curved line plots the concentration, the percentage of all responders captured as more and more of the prospects are included in the campaign.
The straight diagonal line is there for comparison. It represents what happens with no model: responders are spread evenly through the list, so the concentration simply equals the penetration.
Mailing to 30 percent of the prospects chosen at random would find 30 percent of the responders. With the model, mailing to the top 30 percent of prospects finds 65 percent of the responders. The ratio of concentration to penetration is the lift. The difference between these two lines is the benefit. Lift was discussed in the previous chapter. Benefit is discussed in a sidebar.
The model pictured here has a lift of 2.17 at the third decile (65 percent divided by 30 percent), meaning that by using the model, SAC will get more than twice as many responders for its expenditure of $300,000 as it would have received by mailing to 30 percent of its one million prospects at random.
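The arithmetic behind lift and benefit at the third decile can be written out as a short sketch. The 30 percent penetration and 65 percent concentration come from the text; the function names are hypothetical.

```python
def lift(concentration, penetration):
    """Lift is the ratio of concentration to penetration."""
    return concentration / penetration


def benefit(concentration, penetration):
    """Benefit is the difference between concentration and penetration."""
    return concentration - penetration


penetration = 0.30    # top 30 percent of the prospect list
concentration = 0.65  # share of all responders found in that 30 percent

third_decile_lift = lift(concentration, penetration)
third_decile_benefit = benefit(concentration, penetration)
print(round(third_decile_lift, 2))     # 2.17
print(round(third_decile_benefit, 2))  # 0.35
```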
Optimizing Campaign Profitability
There is no doubt that doubling the response rate to a campaign is a desirable outcome, but how much is it actually worth? Is the campaign even profitable?
Although lift is a useful way of comparing models, it does not answer these important questions. To address profitability, more information is needed. In particular, calculating profitability requires information on revenues as well as costs. Let’s add a few more details to the SAC example.
The Simplifying Assumptions Corporation sells a single product for a single price. The price of the product is $100. The total cost to SAC to manufacture, warehouse, and distribute the product is $55. As already mentioned, it costs one dollar to reach a prospect. There is now enough information to calculate the value of a response. The gross value of each response is $100. The net value of each response takes into account the costs associated with the response ($55 for the cost of goods and $1 for the contact), leaving net revenue of $44 per response. This information is summarized in Table 4.3.
Table 4.3 Profit/Loss Matrix for the Simplifying Assumptions Corporation
                 RESPONDED
MAILED           Yes      No
Yes              $44      –$1
No               $0       $0
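The profit/loss matrix can be turned into a small calculation. The sketch below uses the $44 and $1 figures from Table 4.3; the 3 percent response rate in the example is a hypothetical value chosen only for illustration, as are the function names.

```python
NET_VALUE_PER_RESPONSE = 44    # dollars, mailed and responded (Table 4.3)
COST_PER_NONRESPONSE = 1       # dollars, mailed and did not respond (Table 4.3)


def campaign_profit(n_mailed, n_responders):
    """Total profit in dollars for a mailing, using the Table 4.3 values."""
    n_nonresponders = n_mailed - n_responders
    return (NET_VALUE_PER_RESPONSE * n_responders
            - COST_PER_NONRESPONSE * n_nonresponders)


def break_even_response_rate():
    """Response rate r at which 44 * r - 1 * (1 - r) equals zero."""
    return COST_PER_NONRESPONSE / (NET_VALUE_PER_RESPONSE + COST_PER_NONRESPONSE)


# Hypothetical example: 300,000 pieces mailed with an assumed 3% response rate.
example_profit = campaign_profit(300_000, 9_000)
print(example_profit)                        # 105000
print(round(break_even_response_rate(), 4))  # 0.0222
```

With these values, a mailing only makes money when more than one prospect in 45 responds, which is why the response rate among the prospects actually mailed matters more than lift alone.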
BENEFIT
Concentration charts, such as the one pictured in Figure 4.2, are usually discussed in terms of lift. Lift measures the relationship of concentration to penetration and is certainly a useful way of comparing the performance of two models at a given depth in the prospect list. However, it fails to capture another concept that seems intuitively important when looking at the chart—namely, how far apart are the lines, and at what penetration are they farthest apart?
Our colleague, the statistician Will Potts, gives the name benefit to the difference between concentration and penetration. Using his nomenclature, the point where this difference is maximized is the point of maximum benefit. Note that the point of maximum benefit does not correspond to the point of highest lift. Lift is always maximized at the left edge of the concentration chart where the concentration is highest and the slope of the curve is steepest.
The point of maximum benefit is a bit more interesting. To explain some of its useful properties, this sidebar makes reference to some things (such as ROC
curves and KS tests) that are not explained in the main body of the book. Each bulleted point is a formal statement about the maximum benefit point on the concentration curve. The formal statements are followed by informal explanations.
◆ The maximum benefit is proportional to the maximum distance between the cumulative distribution functions of the probabilities in each class.
What this means is that the model score that cuts the prospect list at the penetration where the benefit is greatest is also the score that maximizes the Kolmogorov-Smirnov (KS) statistic. The KS test is popular among some statisticians, especially in the financial services industry. It was developed as a test of whether two distributions are different. Splitting the list at the point of maximum benefit results in a “good list” and a “bad list” whose distributions of responders are maximally separate from each other and from the population. In this case, the “good list” has a maximum proportion of responders and the “bad list” has a minimum proportion.
◆ The maximum benefit point on the concentration curve corresponds to the maximum perpendicular distance between the corresponding ROC
curve and the no-model line.
The ROC curve resembles the more familiar concentration or cumulative gains chart, so it is not surprising that there is a relationship between them. As explained in another sidebar, the ROC curve shows the trade-off between two types of misclassification error. The maximum benefit point on the cumulative gains chart corresponds to a point on the ROC curve where the separation between the classes is maximized.
◆ The maximum benefit point corresponds to the decision rule that maximizes the unweighted average of sensitivity and specificity.
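The three statements above can be checked numerically on a small example. The sketch below reuses the FN/FP error profile from the ROC sidebar (assumed to be percentages) and a hypothetical overall response rate of 5 percent, which does not come from the text. It relies on the identity that, for response rate p, penetration equals p × sensitivity + (1 − p) × false positive rate, so benefit equals (1 − p) × (sensitivity − false positive rate).

```python
# (FN, FP) pairs from the ROC sidebar's error profile, assumed to be percentages.
error_profile = [(0, 100), (2, 72), (4, 44), (8, 30), (12, 16), (22, 11),
                 (32, 6), (46, 4), (60, 2), (80, 1), (100, 0)]
RESPONSE_RATE = 0.05  # hypothetical overall response rate, for illustration only


def summarize(fn_pct, fp_pct, response_rate):
    """Return (benefit, ks_gap, mean_sens_spec) for one cutoff."""
    sensitivity = 1 - fn_pct / 100        # share of responders captured
    false_pos_rate = fp_pct / 100         # share of nonresponders captured
    specificity = 1 - false_pos_rate
    penetration = (response_rate * sensitivity
                   + (1 - response_rate) * false_pos_rate)
    benefit = sensitivity - penetration   # concentration minus penetration
    ks_gap = sensitivity - false_pos_rate # gap between the two class distributions
    mean_sens_spec = (sensitivity + specificity) / 2
    return benefit, ks_gap, mean_sens_spec


rows = [summarize(fn, fp, RESPONSE_RATE) for fn, fp in error_profile]
indices = range(len(rows))
best_by_benefit = max(indices, key=lambda i: rows[i][0])
best_by_ks = max(indices, key=lambda i: rows[i][1])
best_by_mean = max(indices, key=lambda i: rows[i][2])
print(error_profile[best_by_benefit], best_by_ks, best_by_mean)
```

On this profile all three criteria pick the same cutoff, the row with FN = 12 and FP = 16, because each quantity is an increasing function of sensitivity minus the false positive rate.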