A Real-World Example
Figure 12.6 shows a real-world example of a hazard function, for a company that sells a subscription-based service (the exact service is unimportant). This hazard function measures the probability of a customer stopping a given number of weeks after signing on.
There are several interesting characteristics about the curve. First, it starts high. These early stops come from customers who sign on but cannot be started for some technical reason, such as their credit card not being approved. In some cases, customers did not realize that they had signed on, a problem that the authors encounter most often with outbound telemarketing campaigns.
Next, there is an M-shaped feature, with peaks at about 9 and 11 weeks. The first of these peaks, at about 2 months, occurs because of nonpayment. Customers who never pay a bill, or who cancel their credit card charges, are stopped for nonpayment after about 2 months. Since a significant number of customers leave at this time, the hazard probability spikes up.
[Chart: weekly hazard, 0% to 7%, plotted against tenure (weeks after start), 0 to 76]
Figure 12.6 A subscription business has customer hazard probabilities that look like this.
470643 c12.qxd 3/8/04 11:17 AM Page 399
Hazard Functions and Survival Analysis in Marketing 399
The second peak in the "M" coincides with the end of the initial promotion that offers introductory pricing. This promotion typically lasts for about 3 months, and then customers have to start paying full price. Many decide that they no longer really want the service. It is quite possible that many of these customers reappear to take advantage of other promotions, an interesting possibility not germane to this discussion of hazards but relevant to the business.
After the first 3 months, the hazard function has no more really high peaks.
There is a small cycle of peaks, about every 4 or 5 weeks. This corresponds to the monthly billing cycle. Customers are more likely to stop just after they receive a bill.
The chart also shows a gentle decline in the hazard rate. This decline is a good thing, since it means that the longer a customer stays around, the less likely the customer is to leave. Another way of saying this is that customers become more loyal the longer they stay with the company.
Censoring
So far, this introduction to hazards has glossed over one of the most important concepts in survival analysis: censoring. Recall the definition of a hazard probability: the number of stops at a given time t divided by the population at that time. Clearly, if a customer has stopped before time t, then that customer is not included in the population count. This is the most basic example of censoring.
Customers who have stopped are not included in calculations after they stop.
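Written as a formula, the definition above reads as follows. This is only a restatement of the prose: the denominator is the population at time t, which already excludes customers who stopped before t.

```latex
h(t) = \frac{\text{number of customers who stop at time } t}{\text{number of customers in the population at time } t}
```

For example, if one of seven customers in the population stops at t = 1, then h(1) = 1/7, or about 14 percent.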
There is another example of censoring, although it is a bit subtler. Consider customers whose tenure is t but who are currently active. These customers are not included in the population for the hazard for tenure t, because they might still stop before t+1: here today, gone tomorrow. These customers are dropped from the calculation for that particular hazard, although they are included in the calculations of hazards for smaller values of t. Censoring, that is, dropping some customers from some of the hazard calculations, proves to be a very powerful technique, important to much of survival analysis.
Let’s look at this with a picture. Figure 12.7 shows a set of customers and what happens at the beginning and end of their relationship. In particular, the end is shown with a small circle that is either open or closed. When the circle is open, the customer has already left and their exact tenure is known since the stop date is known.
A closed circle means that the customer has survived to the analysis date, so the stop date is not yet known. This customer—or in particular, this customer’s tenure—is censored. The tenure is at least the current tenure, but most likely larger. How much larger is unknown, because that customer’s exact stop date has not yet happened.
400 Chapter 12
[Chart: customer relationships plotted along a time axis]
Figure 12.7 In this group of customers who all start at different times, some customers are censored because they are still active.
Let’s walk through the hazard calculation for these customers, paying particular attention to the role of censoring. When looking at customer data for hazard calculations, both the tenure and the censoring flag are needed. For the customers in Figure 12.7, Table 12.2 shows this data.
It is instructive to see what is happening during each time period. At any point in time, a customer might be in one of three states: ACTIVE, meaning that the relationship is still ongoing; STOPPED, meaning that the customer stopped during that time interval; or CENSORED, meaning that the customer is not included in the calculation. Table 12.3 shows what happens to the customers during each time period.
Table 12.2 Tenure Data for Several Customers
CUSTOMER  CENSORED  TENURE
1         Y         5
2         N         4
3         N         3
4         Y         3
5         N         2
6         Y         1
7         N         1
Table 12.3 Tracking Customers over Several Time Periods
CUSTOMER  CENSORED  LIFETIME  TIME 0    TIME 1    TIME 2    TIME 3    TIME 4    TIME 5
1         Y         5         ACTIVE    ACTIVE    ACTIVE    ACTIVE    ACTIVE    ACTIVE
2         N         4         ACTIVE    ACTIVE    ACTIVE    ACTIVE    STOPPED   CENSORED
3         N         3         ACTIVE    ACTIVE    ACTIVE    STOPPED   CENSORED  CENSORED
4         Y         3         ACTIVE    ACTIVE    ACTIVE    ACTIVE    CENSORED  CENSORED
5         N         2         ACTIVE    ACTIVE    STOPPED   CENSORED  CENSORED  CENSORED
6         Y         1         ACTIVE    ACTIVE    CENSORED  CENSORED  CENSORED  CENSORED
7         N         1         ACTIVE    STOPPED   CENSORED  CENSORED  CENSORED  CENSORED
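The states in Table 12.3 follow mechanically from the two columns of Table 12.2. The Python sketch below derives them. The names CUSTOMERS, state_at, and state_table are hypothetical, chosen for illustration, and the rule they encode is an assumption read off the table: a customer is ACTIVE before its tenure; at its tenure it is ACTIVE if censored and STOPPED if not; afterward it is CENSORED.

```python
# Table 12.2 as (customer, censored, tenure) tuples.
CUSTOMERS = [
    (1, True, 5),
    (2, False, 4),
    (3, False, 3),
    (4, True, 3),
    (5, False, 2),
    (6, True, 1),
    (7, False, 1),
]


def state_at(censored: bool, tenure: int, t: int) -> str:
    """State of one customer at time t (assumed convention of Table 12.3)."""
    if t < tenure:
        return "ACTIVE"  # the relationship is still ongoing
    if t == tenure:
        # A censored customer survived to its tenure; a stopped one stops now.
        return "ACTIVE" if censored else "STOPPED"
    return "CENSORED"  # after its tenure, the customer drops out of the calculation


def state_table(customers, max_time: int) -> dict:
    """Map each customer id to its list of states for times 0..max_time."""
    return {
        cust: [state_at(censored, tenure, t) for t in range(max_time + 1)]
        for cust, censored, tenure in customers
    }


for cust, states in state_table(CUSTOMERS, 5).items():
    print(cust, " ".join(f"{s:<9}" for s in states))
```

Each printed row corresponds to one row of Table 12.3. Encoding the rule in a single function keeps the two kinds of censoring (after a stop, and after the last observed time of an active customer) in one place.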
Table 12.4 From Times to Hazards
          TIME 0  TIME 1  TIME 2  TIME 3  TIME 4  TIME 5
ACTIVE    7       6       4       3       1       1
STOPPED   0       1       1       1       1       0
CENSORED  0       0       2       3       5       6
HAZARD    0%      14%     20%     25%     50%     0%
Notice in Table 12.4 that the censoring takes place one time unit later than the lifetime. That is, Customer #1 survived to Time 5; what happens after that is unknown. The hazard at a given time is the number of customers who are STOPPED divided by the total number of customers who are either ACTIVE or STOPPED.
The hazard for TIME 1 is 14 percent, since one out of seven customers stops at this time. All seven customers survived to TIME 1 and all could have stopped. Of these, only one did. At TIME 2, five customers remain: Customer #7 has already stopped, and Customer #6 has been censored. Of these five, one stops, for a hazard of 20 percent. And so on. This example has shown how to calculate hazard functions, taking into account the fact that some (hopefully many) customers have not yet stopped.
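Counting the states at each time and dividing STOPPED by ACTIVE plus STOPPED reproduces the hazards just worked through. The sketch below repeats a hypothetical state_at helper (an assumed encoding of the Table 12.3 convention) so that it runs on its own; hazard_table is likewise an illustrative name, not an established API.

```python
from collections import Counter

# Table 12.2 as (customer, censored, tenure) tuples.
CUSTOMERS = [
    (1, True, 5), (2, False, 4), (3, False, 3), (4, True, 3),
    (5, False, 2), (6, True, 1), (7, False, 1),
]


def state_at(censored: bool, tenure: int, t: int) -> str:
    """ACTIVE before tenure; ACTIVE/STOPPED at tenure; CENSORED afterward."""
    if t < tenure:
        return "ACTIVE"
    if t == tenure:
        return "ACTIVE" if censored else "STOPPED"
    return "CENSORED"


def hazard_table(customers, max_time: int):
    """Return (time, active, stopped, censored, hazard) for times 0..max_time."""
    rows = []
    for t in range(max_time + 1):
        counts = Counter(state_at(c, tenure, t) for _, c, tenure in customers)
        active, stopped = counts["ACTIVE"], counts["STOPPED"]
        at_risk = active + stopped  # CENSORED customers are left out
        hazard = stopped / at_risk if at_risk else 0.0
        rows.append((t, active, stopped, counts["CENSORED"], hazard))
    return rows


for t, active, stopped, censored, hazard in hazard_table(CUSTOMERS, 5):
    print(f"TIME {t}: ACTIVE={active} STOPPED={stopped} "
          f"CENSORED={censored} HAZARD={hazard:.0%}")
```

The TIME 1 line reads "TIME 1: ACTIVE=6 STOPPED=1 CENSORED=0 HAZARD=14%". Because every customer is in exactly one state at each time, ACTIVE + STOPPED + CENSORED equals seven in every column; at TIME 5 that means six customers (all but Customer #1) are CENSORED.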
This calculation also shows that the hazards are highly erratic, jumping from 25 percent to 50 percent to 0 percent in the last three time periods. Typically, hazards do not vary so much. This erratic behavior arises only because there are so few customers in this simple example. Similarly, lining up customers in a table is useful for didactic purposes, to demonstrate the calculation on a manageable set of data. In the real world, such a presentation is not feasible, since there are likely to be thousands or millions of customers going down and hundreds or thousands of days going across.
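With millions of customers, the customer-by-time grid is never built. One way to get the same numbers, sketched below under the assumption that tenures are non-negative integers in a single discrete unit, is to count how many customers end at each tenure and how many of those actually stopped, then walk through the tenures once. The function name hazards_from_tenures is hypothetical.

```python
def hazards_from_tenures(tenures, censored):
    """Hazard probability for each tenure 0..max(tenures).

    tenures[i] is customer i's tenure; censored[i] is True if still active.
    The population at time t is every customer whose tenure is at least t
    (ACTIVE or STOPPED at t); the stops at t are uncensored customers
    whose tenure is exactly t.
    """
    max_time = max(tenures)
    ends = [0] * (max_time + 1)   # customers whose last observed tenure is t
    stops = [0] * (max_time + 1)  # of those, the ones who actually stopped
    for tenure, is_censored in zip(tenures, censored):
        ends[tenure] += 1
        if not is_censored:
            stops[tenure] += 1

    hazards = []
    at_risk = len(tenures)  # everyone is in the population at t = 0
    for t in range(max_time + 1):
        hazards.append(stops[t] / at_risk if at_risk else 0.0)
        at_risk -= ends[t]  # customers ending at t are censored from t + 1 on
    return hazards


# The seven customers of Table 12.2.
h = hazards_from_tenures([5, 4, 3, 3, 2, 1, 1],
                         [True, False, False, True, False, True, False])
print([f"{x:.0%}" for x in h])
```

This makes one pass over the customers and one pass over the tenures, so the work grows with the number of customers plus the number of distinct tenures rather than with their product.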
It is also worth mentioning that this treatment introduces hazards as conditional probabilities, which vary between 0 and 1. This is possible because the hazards use time measured in discrete units, such as days or weeks, a description of time suited to customer-related analyses. However, statisticians often work with hazard rates rather than probabilities. The ideas are closely related, but the mathematics of rates involves daunting integrals, complicated exponential functions, and difficult-to-explain adjustments to this or that factor. For our purposes, the simpler hazard probabilities are not only easier to explain, but they also solve the problems that arise when working with customer data.
Other Types of Censoring
The previous section introduced censoring in two cases: hazards for customers after they have stopped and hazards for customers who are still active. There