Another possibility, one that requires more cooperation from other groups, is to set up a controlled experiment comparing the effects of the actions taken based on data mining with the current baseline. Such a controlled experiment is particularly valuable in a company that already has a culture of doing such experiments.
Finally, use the results of modeling (whether from historical testing or an actual experiment) to build a business case for integrating data mining into the business operations on a permanent basis.
Sometimes, the result of the pilot project is insight into customers and the market. In this case, success is determined more subjectively, by providing insight to business people. Although this might seem the easier proof-of-concept project, it is quite challenging to find results in a span of weeks that make a favorable impression on business people with years of experience.
Many data mining proof-of-concept projects are not ambitious because they are designed to assess the technology rather than the results of its application.
It is best when the link between better models and better business results is not hypothetical, but is demonstrated by actual results. Statisticians and analysts may be impressed by theoretical results; senior management is not.
A graph showing the lift in response rates achieved by a new model on a test dataset is impressive; however, new customers gained because of the model are even more impressive.
Measure the Results of the Actions
It is important to measure both the effectiveness of the data mining models themselves and the actual impact on the business of the actions taken as a result of the models’ predictions.
Lift is an appropriate way to measure the effectiveness of the models themselves. Lift measures the change in concentration of records of some particular type (such as responders or defaulters) relative to model scores. Measuring the impact on the business requires more information. If the pilot project builds a response model, keep track of the following costs and benefits:
■■ What is the fixed cost of setting up the campaign and the model that supports it?
■■ What is the cost per recipient of making the offer?
■■ What is the cost per respondent of fulfilling the offer?
■■ What is the value of a positive response?
The last item seems obvious, but is often overlooked. We have seen more than one data mining initiative get bogged down because, although it was shown that data mining could reach more customers, there was no clear model of what a new customer was worth and therefore no clear understanding of the benefits to be derived.
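The four quantities above combine into a simple profit calculation. The following is a minimal sketch, not a method given in the text: the class name, field names, and all dollar figures are hypothetical, and it assumes each response has the same value and fulfillment cost.

```python
from dataclasses import dataclass


@dataclass
class CampaignEconomics:
    """Hypothetical container for the four costs and benefits listed above."""
    fixed_cost: float           # setting up the campaign and its model
    cost_per_recipient: float   # making the offer to one person
    cost_per_respondent: float  # fulfilling the offer for one responder
    value_per_response: float   # what one positive response is worth

    def profit(self, recipients: int, response_rate: float) -> float:
        """Net benefit of contacting `recipients` people at a given response rate."""
        responders = recipients * response_rate
        revenue = responders * self.value_per_response
        cost = (self.fixed_cost
                + recipients * self.cost_per_recipient
                + responders * self.cost_per_respondent)
        return revenue - cost

    def break_even_rate(self, recipients: int) -> float:
        """Response rate at which the campaign exactly pays for itself."""
        margin = self.value_per_response - self.cost_per_respondent
        if margin <= 0:
            raise ValueError("a response must be worth more than it costs to fulfill")
        total_outlay = self.fixed_cost + recipients * self.cost_per_recipient
        return total_outlay / (recipients * margin)


# Illustrative numbers only; none of them come from the text.
econ = CampaignEconomics(fixed_cost=20_000.0, cost_per_recipient=1.0,
                         cost_per_respondent=10.0, value_per_response=60.0)
print(econ.profit(recipients=100_000, response_rate=0.03))
print(econ.break_even_rate(recipients=100_000))
```

Without a figure for `value_per_response`, neither function can be evaluated, which is the practical reason the last question in the list cannot be skipped.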
Although the details of designing a good marketing test are beyond the scope of this book, it is important to control for both the efficacy of the data mining model and the efficacy of the offer or message employed. This can be accomplished by tracking the response of four different groups:
■■ Group A, selected to receive the offer by the data mining model
■■ Group B, selected at random to receive the same offer
■■ Group C, also selected at random, that does not get the offer
■■ Group D, selected by the model to receive the offer, but does not get it
If the model does a good job of finding the right customers, group A will respond at a significantly higher rate than group B. If the offer is effective, group B will respond at a higher rate than group C. Sometimes, a model does a good job of finding responders for an ineffective offer. In such a case, groups A and D have similar response rates. Each pair-wise comparison answers a different question, as shown in Figure 18.1.
[Figure: a two-by-two grid of customer groups. Top row, included in campaign: Group B (Random & Included, randomly selected customers) and Group A (Modeled & Included, high model score customers). Bottom row, excluded from campaign: Group C (Random & Excluded, randomly selected customers) and Group D (Modeled & Excluded, high model score customers). Comparing A with B asks how well the model works for measuring response; comparing B with C asks how well the message works on random customers; comparing A with D asks how well the message works on modeled customers; comparing D with C asks how well the model works for measuring propensity.]
Figure 18.1 Tracking four different groups makes it possible to determine both the effect of the campaign and the effect of the model.
This latter situation, in which the model finds likely responders but the offer itself adds nothing, does occur. One Canadian bank used a model to pick customers who should be targeted with a direct mail campaign to open investment accounts. The people picked by the model did, in fact, open investment accounts at a higher rate than other customers, whether or not they received the promotional material. In this case there is a simple reason. The bank had flooded its customers with messages about investment accounts: advertising, posters in branches, billing inserts, and messages when customers called in and were put on hold. Against this cacophony of messages, the direct mail piece was redundant.
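The pair-wise comparisons among groups A, B, C, and D can be sketched in a few lines. This is an illustration under stated assumptions rather than a test design from the text: the response counts are invented, the helper names are hypothetical, and the pooled two-proportion z statistic is one common way to judge whether two response rates differ significantly.

```python
import math


def two_proportion_z(resp_1: int, size_1: int, resp_2: int, size_2: int) -> float:
    """Pooled z statistic for the hypothesis that two groups share one response rate."""
    p1, p2 = resp_1 / size_1, resp_2 / size_2
    pooled = (resp_1 + resp_2) / (size_1 + size_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / size_1 + 1 / size_2))
    return (p1 - p2) / std_err


# Hypothetical results: group -> (responders, group size).
groups = {
    "A": (450, 10_000),  # picked by the model, received the offer
    "B": (200, 10_000),  # picked at random, received the offer
    "C": (150, 10_000),  # picked at random, no offer
    "D": (430, 10_000),  # picked by the model, no offer
}

# Each comparison answers one of the four questions.
comparisons = {
    "model, measuring response": ("A", "B"),
    "message, on random customers": ("B", "C"),
    "message, on modeled customers": ("A", "D"),
    "model, measuring propensity": ("D", "C"),
}


def compare(first: str, second: str) -> dict:
    """Ratio of response rates and z statistic for one pair of groups."""
    r1, n1 = groups[first]
    r2, n2 = groups[second]
    return {
        "rate_ratio": (r1 / n1) / (r2 / n2),
        "z": two_proportion_z(r1, n1, r2, n2),
    }


for question, (first, second) in comparisons.items():
    result = compare(first, second)
    print(f"{question} ({first} vs {second}): "
          f"ratio={result['rate_ratio']:.2f}, z={result['z']:.2f}")
```

The invented counts mimic the bank example: the ratio of A to B, which is the lift of the model over random selection, is well above 1, while A and D respond at nearly the same rate, so the offer adds little for the customers the model selects.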
Choosing a Data Mining Technique
The choice of which data mining technique or techniques to apply depends on the particular data mining task to be accomplished and on the data available for analysis. Before deciding on a data mining technique, first translate the business problem to be addressed into a series of data mining tasks and understand the nature of the available data in terms of the content and types of the data fields.
Formulate the Business Goal as a Data Mining Task
The first step is to take a business goal such as “improve retention” and turn it into one or more of the data mining tasks from Chapter 1. As a reminder, the six basic tasks addressed by the data mining techniques discussed in this book are:
■■ Classification
■■ Estimation
■■ Prediction
■■ Affinity grouping
■■ Clustering
■■ Profiling and description
One approach to the business goal of improving retention is to identify the subscribers who are likely to cancel, figure out why, and make them some kind of offer that addresses their concerns. For the strategy to be successful, subscribers who are likely to cancel must be identified and assigned to groups according to their presumed reasons for leaving. An appropriate retention offer can then be designed for each group.
Using a model set that contains examples of customers who have canceled along with examples of those who have not, many of the data mining techniques discussed in this book are capable of labeling each customer as more or less likely to churn. The additional requirement to identify separate segments of subscribers at risk and understand what motivates each group to leave suggests the use of decision trees and clever derived variables.
Each leaf of the decision tree has a label, which in this case would be “not likely to churn” or “likely to churn.” Each leaf in the tree also has a different proportion of each value of the target variable; the proportion of churners in a leaf can be used as a churn score. Each leaf also has a set of rules describing who ends up there. With skill and creativity, an analyst may be able to turn these mechanistic rules into comprehensible reasons for leaving that, once understood, can be counteracted.
Decision trees often have more leaves than desired for the purpose of developing special offers and telemarketing scripts. To combine leaves into larger groups, take whole branches of the tree as the groups, rather than single leaves.
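As a rough illustration of leaf scores and of combining leaves into branches, the sketch below works from per-leaf counts. Everything in it is assumed for the example: the split names, the counts, and the idea of identifying a leaf by the sequence of splits leading to it. No particular tree-building tool is implied.

```python
from collections import defaultdict

# Hypothetical leaves: path of splits from the root -> (churners, customers).
leaf_counts = {
    ("tenure<12", "calls>3"): (3, 4),
    ("tenure<12", "calls<=3"): (1, 4),
    ("tenure>=12", "plan=basic"): (1, 4),
    ("tenure>=12", "plan=premium"): (0, 4),
}


def leaf_scores(counts: dict) -> dict:
    """Churn score of each leaf: its proportion of churners."""
    return {path: churners / total for path, (churners, total) in counts.items()}


def branch_scores(counts: dict, depth: int) -> dict:
    """Pool all leaves that share their first `depth` splits into one group."""
    pooled = defaultdict(lambda: [0, 0])  # branch -> [churners, customers]
    for path, (churners, total) in counts.items():
        branch = path[:depth]
        pooled[branch][0] += churners
        pooled[branch][1] += total
    return {branch: churners / total for branch, (churners, total) in pooled.items()}


print(leaf_scores(leaf_counts))
print(branch_scores(leaf_counts, depth=1))
```

Pooling counts, rather than averaging leaf scores, keeps the branch score equal to the actual proportion of churners in the combined group even when the leaves differ in size.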
Note that our preference for decision-tree methods in this case stems from the desire to understand the reasons for attrition and our desire to treat subgroups differently. If the goal were simply to do the best possible job of predicting the subscribers at risk, without worrying about the reasons, we might select a different approach. Different business goals suggest different data mining techniques. If the goal were to estimate next month’s minutes of use for each subscriber, neural networks or regression would be better choices. If the goal were to find naturally occurring customer segments, an undirected clustering technique or profiling and hypothesis testing would be appropriate.
Determine the Relevant Characteristics of the Data
Once the data mining tasks have been identified and used to narrow the range of data mining methods under consideration, the characteristics of the available data can help to refine the selection further. In general terms, the goal is to select the data mining technique that minimizes the number and difficulty of the data transformations that must be performed in order to coax good results from the data.
As discussed in the previous chapter, some amount of data transformation is always part of the data mining process. The raw data may need to be summarized in various ways, data encodings must be rationalized, and so forth.
These kinds of transformations are necessary regardless of the technique chosen. However, some kinds of data pose particular problems for some data mining techniques.