However, this greatly reduces the amount of data contributing to the curve.
TIP Instead of using retention curves, use survival curves. That is, first calculate the hazards and then work back to calculate the survival curve.
The survival curve, on the other hand, looks at as many customers as possible, not just the ones who started exactly n time periods ago. The survival at any given point in time t uses information from all customers. The hazard at time t uses information from all customers whose tenure is greater than or equal to that value (assuming all are in the population at risk). Survival, though, is calculated by combining all the information for hazards from smaller values of t.
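The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular package: the function name `life_table`, its arguments, and the convention that every customer with tenure greater than or equal to t is at risk at t are assumptions made for the example.

```python
from collections import Counter

def life_table(tenures, stopped, max_t):
    """Return (hazards, survival) lists indexed by tenure 0..max_t.

    tenures -- observed tenure of each customer, in whole time periods
    stopped -- True if the customer stopped at that tenure, False if still active
    Assumption: a customer is in the population at risk at every t <= tenure.
    """
    stops_at = Counter(t for t, s in zip(tenures, stopped) if s)
    tenure_counts = Counter(tenures)   # customers whose observed tenure is exactly t
    at_risk = len(tenures)             # everyone is at risk at tenure 0
    hazards, survival = [], []
    surviving = 1.0
    for t in range(max_t + 1):
        survival.append(surviving)     # S(t) combines the hazards for all smaller t
        h = stops_at[t] / at_risk if at_risk else 0.0
        hazards.append(h)
        surviving *= 1.0 - h
        at_risk -= tenure_counts[t]    # stopped or not, tenure-t customers leave the risk set
    return hazards, survival
```

For six made-up customers with tenures 1, 1, 2, 3, 3, 3, of whom the first, third, and fourth have stopped, the hazards at tenures 0 through 3 come out as 0, 1/6, 1/4, and 1/3, and survival as 1, 1, 5/6, and 5/8. Every customer contributes to the early hazards, which is why the resulting curve is more stable than a retention curve.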
Because survival calculations use all the data, the values are more stable than retention calculations. Each point on a retention curve, by contrast, uses only the customers who started at one particular point in time. Also, because a survival curve never slopes upward, calculations of customer half-life and average customer tenure are more accurate. By incorporating more information, survival provides a more accurate, smoother picture of customer retention.
Both hazards and survival provide valuable information about customers. Because survival is cumulative, it gives a good summary value for comparing different groups of customers: How does the 1-year survival compare among different groups? Survival is also used for calculating customer half-life and mean customer tenure, which in turn feed into other calculations, such as customer value.
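As a rough sketch of how these summary values fall out of a survival curve, the two helper functions below take a list of survival values indexed by tenure. The function names are ours, and the convention that mean tenure is truncated at the end of the curve and computed by summing S(1) through S(T) is one reasonable choice among several.

```python
def half_life(survival):
    """First tenure at which survival is 50 percent or lower (None if never reached)."""
    for t, s in enumerate(survival):
        if s <= 0.5:
            return t
    return None

def mean_truncated_tenure(survival):
    """Average tenure, truncated at the last tenure on the curve.

    Assumption: survival[t] is the probability that tenure is at least t,
    so the expected truncated tenure is S(1) + S(2) + ... + S(T).
    """
    return sum(survival[1:])
```

With the illustrative curve 1.0, 0.9, 0.7, 0.5, 0.4, the half-life is 3 time periods and the mean tenure truncated at 4 periods is 2.5 periods.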
Because survival is cumulative, it is difficult to see patterns at a particular point in time. Hazards make the specific causes much more apparent. When discussing some real-world hazards, it was possible to identify events during the customer life cycle that were drivers of hazards. Survival curves do not highlight such events as clearly as hazards do.
The question of how to compare hazards for different groups of customers may also arise. It does not make sense to compare average hazards over a period of time; mathematically, an “average hazard” is not a meaningful quantity. The right approach is to turn the hazards into survival and compare the values on the survival curves.
The description of hazards and survival presented so far differs a bit from how the subject is treated in statistics. The sidebar “A Note about Survival Analysis and Statistics” explains the differences further.
470643 c12.qxd 3/8/04 11:17 AM Page 408
408 Chapter 12
A NOTE ABOUT SURVIVAL ANALYSIS AND STATISTICS
The discussion of survival analysis in this chapter assumes that time is discrete.
In particular, things happen on particular days, and the particular time of day is not important. This is not only reasonable for the problems addressed by data mining, but it is also more intuitive and simplifies the mathematics.
In statistics, though, survival analysis makes the opposite assumption, that time is continuous. Instead of hazard probabilities, statisticians work with hazard rates, which are turned into survival curves by using exponentiation and integration. One difference between a rate and a probability is that the rate can exceed 1, whereas a probability never does. Also, a rate seems less intuitive for many survival problems encountered with customers.
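For readers who like to see the two views side by side, the relationships can be written as follows. The notation is ours rather than the chapter's: h(t) is the discrete hazard probability at tenure t, λ(u) is a continuous hazard rate, and S(t) is survival.

```latex
% Discrete time (this chapter): survival is the running product of (1 - hazard)
S(t) = \prod_{u=0}^{t-1} \bigl(1 - h(u)\bigr), \qquad S(0) = 1

% Continuous time (statistics): exponentiate the negated integral of the hazard rate
S(t) = \exp\!\left(-\int_{0}^{t} \lambda(u)\,du\right)
```

A constant rate of λ = 2 per year is perfectly legal in the continuous form, even though no probability could equal 2; it corresponds to a 1-year survival of e^{-2}, or about 13.5 percent.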
The method for calculating hazards in this chapter is called the life table method, and it works well with discrete time data. A very similar method, called Kaplan-Meier, is used for continuous time data. The two techniques produce almost exactly the same results when events occur at discrete times.
An important part of statistical survival analysis is the estimation of hazards using parameterized regression, which tries to find the best functional form for the hazards. This is an alternative to the approach used in this chapter, which calculates the hazards directly from the data.
The parameterized approach has the important advantage that it can more easily include covariates in the process. Later in this chapter, there is an example based on such a parameterized model. Unfortunately, the hazard function rarely follows a form that would be familiar to nonstatisticians. The hazards do such a good job of describing the customer life cycle that it would be shocking if a simple function captured that rich complexity.
We strongly encourage interested readers who have a mathematical or statistical background to investigate the area further.
Proportional Hazards
Sir David Cox is one of the most cited statisticians of the past century; his work comprises numerous books and over 250 articles. He has received many awards including a knighthood bestowed on him by Queen Elizabeth in 1985.
Much of his research centered on understanding hazard functions, and his work has been particularly important in the world of medical research.
His seminal paper was about determining the effect of initial factors (time-zero covariates) on hazards. By assuming that these initial factors have a uniform proportional effect on hazards, he was able to figure out how to measure this effect for different factors. The purpose of this section is to introduce proportional hazards and to suggest how they are useful for understanding customers. This section starts with some examples of why proportional hazards are useful. It then describes an alternative approach before returning to the Cox model itself.
Examples of Proportional Hazards
Consider the following statement about one risk from smoking: The risk of leukemia for smokers is 1.53 times greater than for nonsmokers. This result is a classic example of proportional hazards. At the time of the study, the researchers knew whether someone was or was not a smoker (actually, there was a third group of former smokers, but our purpose here is to illustrate an example).
Whether or not someone is a smoker is an example of an initial condition.
Since there are only two factors to consider, it is possible to just look at the hazard curves and to derive some sort of average for the overall risk.
Figure 12.11 provides an illustration from the world of marketing. It shows two sets of hazard probabilities, one for customers who joined from a telephone solicitation and the other from direct mail. Once again, how someone became a customer is an example of an initial condition. The hazards for the telemarketing customers are higher; looking at the chart, we might say telemarketing customers are a bit less than twice as risky as direct mail customers.
Cox proportional hazard regression provides a way to quantify this.
The two just-mentioned examples use categorical variables as the risk factor.
Consider another statement about the risk of tobacco: The risk of colorectal cancer increases 6.7 percent per pack-year smoked. This statement differs from the previous one, because it now depends on a continuous variable. Using proportional hazards, it is possible to determine the contribution of both categorical and continuous covariates.
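Both statements can be read as instances of one formula. The sketch below uses the standard proportional hazards form, with symbols of our choosing: h_0(t) is a baseline hazard shared by everyone, x is the covariate, and β is its coefficient. It assumes that "1.53 times" is a hazard ratio and that the 6.7 percent increase compounds with each additional pack-year; neither assumption is spelled out in the statements themselves.

```latex
% Proportional hazards: a covariate x scales a shared baseline hazard h_0(t)
h(t \mid x) = h_0(t)\, e^{\beta x}
\quad\Longrightarrow\quad
\frac{h(t \mid x + 1)}{h(t \mid x)} = e^{\beta}

% Categorical covariate (smoker x = 1, nonsmoker x = 0):
e^{\beta} = 1.53

% Continuous covariate (x = pack-years), with the increase compounding:
e^{\beta} = 1.067, \qquad \frac{h(t \mid 10)}{h(t \mid 0)} = 1.067^{10} \approx 1.91
```

Under these assumptions, someone with 10 pack-years has a bit less than twice the baseline risk. The ratio does not depend on t, which is exactly what "proportional" means here.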
[Figure 12.11 chart: hazard (0% to 20%) plotted against tenure in weeks (0 to 70), with separate curves for Telemarketing and Direct Mail.]
Figure 12.11 These two hazard functions suggest that the risk of attrition is about one and a half times as great for customers acquired through telemarketing versus direct mail.
Stratification: Measuring Initial Effects on Survival
Figure 12.11 showed hazard probabilities for two different groups of customers, one that started via outbound telemarketing campaigns and the other via direct mail campaigns. These two curves clearly show differences between these channels. It is possible to generate a survival curve for these hazards and quantify the difference, using 1-year survival, median survival, or average truncated tenure. This approach to measuring differences among different groups defined by initial conditions is called stratification because each group is analyzed independently from other groups. This produces good visualizations and accurate survival values. It is also quite easy, since statistical packages such as SAS and SPSS have options that make it easy to stratify data for this purpose.
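For readers without access to such packages, stratification is simple enough to sketch directly. The Python below is an illustration under assumptions of our own: customers arrive as (group, tenure, stopped) tuples, tenure is measured in whole time periods, and a customer is at risk at every tenure up to and including the observed one. The function names are hypothetical.

```python
from collections import defaultdict

def survival_curve(records, max_t):
    """records: (tenure, stopped) pairs. Returns [S(0), S(1), ..., S(max_t)]."""
    records = list(records)
    curve, surviving = [], 1.0
    for t in range(max_t + 1):
        curve.append(surviving)
        at_risk = sum(1 for tenure, _ in records if tenure >= t)
        stops = sum(1 for tenure, stopped in records if stopped and tenure == t)
        if at_risk:
            surviving *= 1.0 - stops / at_risk   # apply the hazard at tenure t
    return curve

def stratified_survival(customers, max_t):
    """customers: (group, tenure, stopped) tuples. One survival curve per group."""
    strata = defaultdict(list)
    for group, tenure, stopped in customers:
        strata[group].append((tenure, stopped))
    # Each group is analyzed independently of the others
    return {group: survival_curve(recs, max_t) for group, recs in strata.items()}

# Made-up data for illustration only
customers = [
    ("telemarketing", 1, True), ("telemarketing", 2, True),
    ("telemarketing", 3, False), ("telemarketing", 3, False),
    ("direct mail", 2, True), ("direct mail", 3, False),
    ("direct mail", 3, False), ("direct mail", 3, False),
]
curves = stratified_survival(customers, max_t=3)
```

In this toy data, survival at tenure 3 is 50 percent for the telemarketing group and 75 percent for the direct mail group. Comparing the curves at a fixed tenure, such as 1 year, gives the kind of single-number comparison described above.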
Stratification solves the problem of understanding initial effects assuming that two conditions are true. First, the initial effect needs to be a categorical variable. Since the data is being broken into separate groups, some variable, such as channel or product or region, needs to be chosen for this purpose. Of course, it is always possible to use binning to break a continuous variable into discrete chunks.
The second is that each group needs to be fairly big. When starting with lots and lots of customers and only using one variable that takes on a handful of values, such as channel, this is not a problem. However, there may be multiple variables of interest, such as:
■■ Acquisition channel
■■ Original promotion
■■ Geography
Once more than one dimension is included, the number of categories grows very quickly. This means that the data gets spread thinly, making the hazards less and less reliable.
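A little arithmetic shows how quickly the data thins out. The counts below are invented for illustration (they do not come from any real customer base), and the function name is ours.

```python
from math import prod

def average_per_cell(n_customers, levels_per_variable):
    """Return (number of strata, average customers per stratum).

    levels_per_variable -- how many distinct values each stratifying variable takes
    """
    cells = prod(levels_per_variable)   # every combination of values is its own stratum
    return cells, n_customers / cells

# Hypothetical: 1,000,000 customers, 5 channels, 20 promotions, 50 regions
cells, per_cell = average_per_cell(1_000_000, [5, 20, 50])
```

Here a million customers are spread across 5,000 strata, an average of only 200 customers each. That average is then divided further across tenures when calculating hazards, and many real strata would be far smaller than the average.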
Cox Proportional Hazards
In 1972, Sir David Cox recognized this problem and he proposed a method of analysis, now known as Cox proportional hazards regression, which overcomes these limitations. His brilliant insight was to find a way to focus on the original conditions and not on the hazards themselves. The question is: What effect do the initial conditions have on hazards? His approach to answering this question is quite interesting.