Survival analysis is explained in Chapter 12. The basic idea is to calculate for each customer (or for each group of customers that share the same values for model input variables such as geography, credit class, and acquisition channel) the probability that having made it as far as today, he or she will leave before tomorrow. For any one tenure this hazard, as it is called, is quite small, but it is higher for some tenures than for others. The chance that a customer will survive to reach some more distant future date can be calculated from the intervening hazards.
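The calculation described above can be sketched in a few lines: the probability of surviving past a given tenure is the running product of (1 minus the hazard) over all intervening tenures. The hazard values below are invented for illustration.

```python
# Sketch: deriving survival probabilities from a sequence of hazards.
# The hazard h_t is the probability that a customer who has survived
# to tenure t leaves before tenure t+1. Numbers are hypothetical.

def survival_curve(hazards):
    """Return S(t), the probability of surviving past each tenure,
    as the running product of (1 - h_t)."""
    survival = []
    s = 1.0
    for h in hazards:
        s *= (1.0 - h)
        survival.append(s)
    return survival

monthly_hazards = [0.05, 0.02, 0.02, 0.01, 0.03]  # illustrative monthly hazards
curve = survival_curve(monthly_hazards)
```

Note that even though each individual hazard is small, the survival probability declines steadily as the hazards compound over many tenures.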
Lessons Learned
The data mining techniques described in this book have applications in fields as diverse as biotechnology research and manufacturing process control. This book, however, is written for people who, like the authors, will be applying these techniques to the kinds of business problems that arise in marketing and customer relationship management. In most of the book, the focus on customer-centric applications is implicit in the choice of examples used to illustrate the techniques. In this chapter, that focus is more explicit.
Data mining is used in support of both advertising and direct marketing to identify the right audience, choose the best communications channels, and pick the most appropriate messages. Prospective customers can be compared to a profile of the intended audience and given a fitness score. Should information on individual prospects not be available, the same method can be used
to assign fitness scores to geographic neighborhoods using data of the type available from the U.S. Census Bureau, Statistics Canada, and similar official sources in many countries.
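A fitness score of this kind can be as simple as a scaled distance between a record and the target profile. The sketch below is purely illustrative; the profile variables, their values, and the scaling weights are all invented.

```python
# Hypothetical sketch: scoring a prospect or neighborhood against an
# audience profile. Variables, values, and weights are invented.

target_profile = {"median_income": 65000, "pct_homeowners": 0.70, "median_age": 42}
# Scale each variable by its profile value so no one variable dominates.
weights = {k: 1.0 / v for k, v in target_profile.items()}

def fitness_score(record):
    """Crude fitness: 1 minus the average scaled distance from the profile.
    A perfect match scores 1.0; poor matches approach 0.0."""
    distance = sum(weights[k] * abs(record[k] - target_profile[k])
                   for k in target_profile) / len(target_profile)
    return max(0.0, 1.0 - distance)

neighborhood = {"median_income": 58000, "pct_homeowners": 0.65, "median_age": 39}
score = fitness_score(neighborhood)
```

In practice the profile would be built from many more variables and the weighting derived from data rather than set by hand, but the principle of ranking prospects or neighborhoods by closeness to a target profile is the same.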
A common application of data mining in direct marketing is response modeling. A response model scores prospects on their likelihood to respond to a direct marketing campaign. This information can be used to improve the response rate of a campaign, but is not, by itself, enough to determine campaign profitability. Estimating campaign profitability also requires estimates of the underlying response rate to a future campaign, of the average order size associated with a response, and of the costs of fulfillment and of the campaign itself. A more customer-centric use of response scores is to choose the best campaign for each customer from among a number of competing campaigns. This approach avoids the usual problem of independent, score-based campaigns, which tend to pick the same people every time.
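The profitability arithmetic just described can be made concrete. The sketch below combines the listed estimates (response rate, average order size, and costs) into a single back-of-the-envelope figure; every number is hypothetical.

```python
# Illustrative campaign profitability, combining the estimates the text
# lists: response rate, average order size, and costs. All figures are
# hypothetical, and real estimates carry substantial uncertainty.

def campaign_profit(n_mailed, response_rate, avg_order, margin,
                    cost_per_piece, fulfillment_cost, fixed_cost):
    responders = n_mailed * response_rate
    contribution = responders * avg_order * margin      # gross margin from orders
    costs = (n_mailed * cost_per_piece                  # printing and postage
             + responders * fulfillment_cost            # fulfilling each order
             + fixed_cost)                              # creative, list rental, etc.
    return contribution - costs

profit = campaign_profit(n_mailed=100_000, response_rate=0.02,
                         avg_order=80.0, margin=0.5,
                         cost_per_piece=0.60, fulfillment_cost=5.0,
                         fixed_cost=20_000)
```

With these particular numbers the campaign loses money even at a respectable 2 percent response rate, which is exactly why a response score alone is not enough to determine profitability.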
It is important to distinguish between the ability of a model to recognize people who are interested in a product or service and its ability to recognize people who are moved to make a purchase based on a particular campaign or offer. Differential response analysis offers a way to identify the market segments where a campaign will have the greatest impact. Differential response models seek to maximize the difference in response between a treated group and a control group rather than trying to maximize the response itself.
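One common way to operationalize differential response analysis is the "two-model" approach: estimate response separately for the treated group and the control group, then rank segments by the difference. The segment response rates below are invented to illustrate the point.

```python
# Minimal sketch of differential response (uplift) analysis using
# observed response rates by segment. All rates are hypothetical.

treated_rate = {"A": 0.10, "B": 0.06, "C": 0.04}  # response with the campaign
control_rate = {"A": 0.09, "B": 0.02, "C": 0.04}  # response without it

def uplift(segment):
    """Incremental response attributable to the campaign."""
    return treated_rate[segment] - control_rate[segment]

# Segment B has the largest incremental response, even though segment A
# has the highest raw response rate -- the distinction the text draws.
best_segment = max(treated_rate, key=uplift)
```

A raw response model would target segment A first; the differential model correctly prefers segment B, where the campaign itself makes the most difference. Segment C responds identically with or without the campaign, so mailing it is wasted money.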
Information about current customers can be used to identify likely prospects by finding predictors of desired outcomes in the information that was known about current customers before they became customers. This sort of analysis is valuable for selecting acquisition channels and contact strategies as well as for screening prospect lists. Companies can increase the value of their customer data by beginning to track customers from their first response, even before they become customers, and gathering and storing additional information when customers are acquired.
Once customers have been acquired, the focus shifts to customer relationship management. The data available for active customers is richer than that available for prospects and, because it is behavioral in nature rather than simply geographic and demographic, it is more predictive. Data mining is used to identify additional products and services that should be offered to customers based on their current usage patterns. It can also suggest the best time to make a cross-sell or up-sell offer.
One of the goals of a customer relationship management program is to retain valuable customers. Data mining can help identify which customers are the most valuable and evaluate the risk of voluntary or involuntary churn associated with each customer. Armed with this information, companies can target retention offers at customers who are both valuable and at risk, and take steps to protect themselves from customers who are likely to default.
From a data mining perspective, churn modeling can be approached either as a binary-outcome prediction problem or through survival analysis. There are advantages and disadvantages to both approaches. The binary-outcome approach works well over a short horizon, while the survival analysis approach can be used to make forecasts far into the future and also provides insight into customer loyalty and customer value.
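The two framings can be related directly: a binary churn probability at a fixed horizon is just one point on the survival curve, while the survival framing yields forecasts at any future tenure. The sketch below uses invented monthly hazards to show both views.

```python
# Sketch contrasting the two framings of churn. The binary-outcome view
# asks for P(churn within a fixed horizon); the survival view supports
# forecasts at any tenure. Hazard values are hypothetical.

def churn_within(hazards, horizon):
    """P(churn on or before `horizon`) = 1 - product of (1 - h_t)."""
    s = 1.0
    for h in hazards[:horizon]:
        s *= (1.0 - h)
    return 1.0 - s

monthly_hazards = [0.04, 0.03, 0.03, 0.02, 0.02, 0.02] * 4  # 24 months

p_churn_3mo  = churn_within(monthly_hazards, 3)   # short-horizon, binary view
p_churn_24mo = churn_within(monthly_hazards, 24)  # long-range survival forecast
```

The short-horizon probability is what a binary classifier would be trained to predict; the long-range figure, unavailable to that classifier, is what makes the survival approach useful for estimating customer value.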
Chapter 5
The Lure of Statistics: Data Mining Using Familiar Tools
For statisticians (and economists too), the term “data mining” has long had a pejorative meaning. Instead of finding useful patterns in large volumes of data, data mining has the connotation of searching for data to fit preconceived ideas. This is much like what politicians do around election time—search for data to show the success of their deeds; this is certainly not what we mean by data mining! This chapter is intended to bridge some of the gap between statisticians and data miners.
The two disciplines are very similar. Statisticians and data miners commonly use many of the same techniques, and statistical software vendors now include many of the techniques described in the next eight chapters in their software packages. Statistics developed as a discipline separate from mathematics over the past century and a half to help scientists make sense of observations and to design experiments that yield the reproducible and accurate results we associate with the scientific method. For almost all of this period, the issue was not too much data, but too little. Scientists had to figure out how to understand the world using data collected by hand in notebooks.
These measurements were sometimes recorded incorrectly, rendered illegible by fading and smudged ink, and so on. Early statisticians were practical people who invented techniques to handle whatever problem was at hand. Statisticians today are still practical people, who use modern techniques as well as the tried and true.
What is remarkable and a testament to the founders of modern statistics is that techniques developed on tiny amounts of data have survived and still prove their utility. These techniques have proven their worth not only in the original domains but also in virtually all areas where data is collected, from agriculture to psychology to astronomy and even to business.
Perhaps the greatest statistician of the twentieth century was R. A. Fisher, considered by many to be the father of modern statistics. In the 1920s, before the invention of modern computers, he devised methods for designing and analyzing experiments. For two years, while living on a farm outside London, he collected various measurements of crop yields along with potential explanatory variables—amount of rain and sun and fertilizer, for instance. To understand what has an effect on crop yields, he invented new techniques (such as analysis of variance—ANOVA) and performed perhaps a million calculations on the data he collected. Although twenty-first-century computer chips easily handle many millions of calculations in a second, each of Fisher’s calculations required pulling a lever on a manual calculating machine. Results trickled in slowly over weeks and months, along with sore hands and calluses.
The advent of computing power has clearly simplified some aspects of analysis, although its bigger effect is probably the wealth of data produced. Our goal is no longer to extract every last iota of possible information from each rare datum. Our goal is instead to make sense of quantities of data so large that they are beyond the ability of our brains to comprehend in their raw format.
The purpose of this chapter is to present some key ideas from statistics that have proven to be useful tools for data mining. This is intended to be neither a thorough nor a comprehensive introduction to statistics; rather, it is an introduction to a handful of useful statistical techniques and ideas. These tools are shown by demonstration, rather than through mathematical proof.
The chapter starts with an introduction to what is probably the most important aspect of applied statistics—the skeptical attitude. It then discusses looking at data through a statistician’s eye, introducing important concepts and terminology along the way. Sprinkled through the chapter are examples, especially for confidence intervals and the chi-square test. The final example, using the chi-square test to understand geography and channel, is an unusual application of the ideas presented in the chapter. The chapter ends with a brief discussion of some of the differences between data miners and statisticians—differences in attitude that are more a matter of degree than of substance.