Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

A Real-World Example

Figure 12.6 shows a real-world example of a hazard function, for a company that sells a subscription-based service (the exact service is unimportant). This hazard function measures the probability that a customer stops a given number of weeks after signing on.

There are several interesting characteristics about the curve. First, it starts high. These are customers who sign on, but are not able to be started for some technical reason such as their credit card not being approved. In some cases, customers did not realize that they had signed on—a problem that the authors encounter most often with outbound telemarketing campaigns.

Next, there is an M-shaped feature, with peaks at about 9 and 11 weeks. The first of these peaks, at about 2 months, occurs because of nonpayment. Customers who never pay a bill, or who cancel their credit card charges, are stopped for nonpayment after about 2 months. Since a significant number of customers leave at this time, the hazard probability spikes up.

[Chart: Weekly Hazard (0% to 7%) plotted against Tenure (Weeks after Start), 0 to 76 weeks]

Figure 12.6 A subscription business has customer hazard probabilities that look like this.

470643 c12.qxd 3/8/04 11:17 AM Page 399

Hazard Functions and Survival Analysis in Marketing 399

The second peak in the “M” is coincident with the end of the initial promotion that offers introductory pricing. This promo typically lasts for about 3 months, and then customers have to start paying full price. Many decide that they no longer really want the service. It is quite possible that many of these customers reappear to take advantage of other promotions, an interesting fact not germane to this discussion on hazards but relevant to the business.

After the first 3 months, the hazard function has no more really high peaks. There is a small cycle of peaks, about every 4 or 5 weeks. This corresponds to the monthly billing cycle. Customers are more likely to stop just after they receive a bill.

The chart also shows that there is a gentle decline in the hazard rate. This decline is a good thing, since it means that the longer a customer stays around, the less likely the customer is to leave. Another way of saying this is that customers are becoming more loyal the longer they stay with the company.

Censoring

So far, this introduction to hazards has glossed over one of the most important concepts in survival analysis: censoring. Remember the definition of a hazard probability: the number of stops at a given time t divided by the population at that time. Clearly, if a customer has stopped before time t, then that customer is not included in the population count. This is the most basic example of censoring. Customers who have stopped are not included in calculations after they stop.
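Written as a formula, the definition recalled above looks roughly as follows. The symbol h(t) is notation chosen here for illustration and is not taken from the text.

```latex
h(t) = \frac{\text{number of customers who stop at tenure } t}
            {\text{number of customers in the population at tenure } t}
```

As a purely hypothetical illustration, if 1,000 customers are in the population at some tenure and 20 of them stop at that tenure, the hazard probability is 20/1,000, or 2 percent. Customers who stopped earlier appear in neither the numerator nor the denominator.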

There is another example of censoring, although it is a bit subtler. Consider customers whose tenure is t but who are currently active. These customers are not included in the population for the hazard for tenure t, because the customers might still stop before t+1—here today, gone tomorrow. These customers have been dropped out of the calculation for that particular hazard, although they are included in calculations of hazards for smaller values of t. Censoring—dropping some customers from some of the hazard calculations—proves to be a very powerful technique, important to much of survival analysis.

Let’s look at this with a picture. Figure 12.7 shows a set of customers and what happens at the beginning and end of their relationship. In particular, the end is shown with a small circle that is either open or closed. When the circle is open, the customer has already left and their exact tenure is known since the stop date is known.

A closed circle means that the customer has survived to the analysis date, so the stop date is not yet known. This customer—or in particular, this customer’s tenure—is censored. The tenure is at least the current tenure, but most likely larger. How much larger is unknown, because that customer’s exact stop date has not yet happened.


400 Chapter 12

[Diagram: horizontal axis is time]

Figure 12.7 In this group of customers who all start at different times, some customers are censored because they are still active.

Let’s walk through the hazard calculation for these customers, paying particular attention to the role of censoring. When looking at customer data for hazard calculations, both the tenure and the censoring flag are needed. For the customers in Figure 12.7, Table 12.2 shows this data.

It is instructive to see what is happening during each time period. At any point in time, a customer might be in one of three states: ACTIVE, meaning that the relationship is still ongoing; STOPPED, meaning that the customer stopped during that time interval; or CENSORED, meaning that the customer is not included in the calculation. Table 12.3 shows what happens to the customers during each time period.

Table 12.2 Tenure Data for Several Customers

CUSTOMER  CENSORED  TENURE
1         Y         5
2         N         4
3         N         3
4         Y         3
5         N         2
6         Y         1
7         N         1


Table 12.3 Tracking Customers over Several Time Periods

CUSTOMER  CENSORED  LIFETIME  TIME 0  TIME 1   TIME 2    TIME 3    TIME 4    TIME 5
1         Y         5         ACTIVE  ACTIVE   ACTIVE    ACTIVE    ACTIVE    ACTIVE
2         N         4         ACTIVE  ACTIVE   ACTIVE    ACTIVE    STOPPED   CENSORED
3         N         3         ACTIVE  ACTIVE   ACTIVE    STOPPED   CENSORED  CENSORED
4         Y         3         ACTIVE  ACTIVE   ACTIVE    ACTIVE    CENSORED  CENSORED
5         N         2         ACTIVE  ACTIVE   STOPPED   CENSORED  CENSORED  CENSORED
6         Y         1         ACTIVE  ACTIVE   CENSORED  CENSORED  CENSORED  CENSORED
7         N         1         ACTIVE  STOPPED  CENSORED  CENSORED  CENSORED  CENSORED
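The states in Table 12.3 follow mechanically from each customer's lifetime and censoring flag. The following Python sketch shows one way to derive them. The function names `state` and `state_grid` are hypothetical, and the rule encoded here is inferred from the table itself: a customer is ACTIVE up to the lifetime, STOPPED at the lifetime if not censored, and CENSORED afterward.

```python
# Lifetime and censoring flag for the seven customers of Table 12.2.
# True means the customer is censored (still active at the analysis date).
CUSTOMERS = {
    1: (5, True),
    2: (4, False),
    3: (3, False),
    4: (3, True),
    5: (2, False),
    6: (1, True),
    7: (1, False),
}


def state(lifetime, censored, t):
    """Return the state of one customer at time t (rule inferred from Table 12.3)."""
    if t < lifetime:
        return "ACTIVE"
    if t == lifetime:
        # A censored customer is still active at the last observed time;
        # an uncensored customer stops at that time.
        return "ACTIVE" if censored else "STOPPED"
    # After the lifetime the customer drops out of the calculation.
    return "CENSORED"


def state_grid(customers, max_time):
    """Build one row of states per customer for times 0..max_time."""
    return {
        cid: [state(lifetime, censored, t) for t in range(max_time + 1)]
        for cid, (lifetime, censored) in customers.items()
    }


grid = state_grid(CUSTOMERS, 5)
for cid, states in grid.items():
    print(cid, " ".join(states))
```

Note that under this rule both stopped and still-active customers end up labeled CENSORED one time unit after their lifetime, which matches the two kinds of censoring described earlier.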


Table 12.4 From Times to Hazards

          TIME 0  TIME 1  TIME 2  TIME 3  TIME 4  TIME 5
ACTIVE    7       6       4       3       1       1
STOPPED   0       1       1       1       1       0
CENSORED  0       0       2       3       5       6
HAZARD    0%      14%     20%     25%     50%     0%

Notice in Table 12.4 that the censoring takes place one time unit later than the lifetime. That is, Customer #1 survived to Time 5; what happens after that is unknown. The hazard at a given time is the number of customers who are STOPPED divided by the total of the customers who are either ACTIVE or STOPPED.

The hazard for Time 1 is 14 percent, since one out of seven customers stops at this time. All seven customers survived to Time 1 and all could have stopped. Of these, only one did. At Time 2, there are five customers left: Customer #7 has already stopped, and Customer #6 has been censored. Of these five, one stops, for a hazard of 20 percent. And so on. This example has shown how to calculate hazard functions, taking into account the fact that some (hopefully many) customers have not yet stopped.
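The calculation just described can be sketched in a few lines of Python. This is a minimal illustration rather than a production method: the function name `hazards` is hypothetical, and the data are the tenure and censoring flags of the seven example customers. At each time, the hazard is the number of stops divided by the number of customers who are either active or stopping at that time.

```python
from fractions import Fraction

# (tenure, censored) for Customers #1 through #7; True means still active.
CUSTOMERS = [
    (5, True),
    (4, False),
    (3, False),
    (3, True),
    (2, False),
    (1, True),
    (1, False),
]


def hazards(customers, max_time):
    """Return the hazard probability for each time 0..max_time.

    A customer is counted as STOPPED at time t if the tenure equals t and the
    customer is not censored, and as ACTIVE if the tenure exceeds t or the
    customer is censored with tenure equal to t. Everyone else is left out.
    """
    result = []
    for t in range(max_time + 1):
        stopped = sum(1 for tenure, cens in customers if tenure == t and not cens)
        active = sum(
            1 for tenure, cens in customers if tenure > t or (tenure == t and cens)
        )
        at_risk = active + stopped
        # With nobody at risk the hazard is undefined, so record None.
        result.append(Fraction(stopped, at_risk) if at_risk else None)
    return result


example_hazards = hazards(CUSTOMERS, 5)
for t, h in enumerate(example_hazards):
    label = "undefined" if h is None else f"{float(h):.0%}"
    print(f"TIME {t}: {label}")
```

Exact fractions are used so that the values 1/7, 1/5, 1/4, and 1/2 stay visible before they are rounded to percentages for display.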


This calculation also shows that the hazards are highly erratic—jumping from 25 percent to 50 percent to 0 percent in the last 3 days. Typically, hazards do not vary so much. This erratic behavior arises only because there are so few customers in this simple example. Similarly, lining up customers in a table is useful for didactic purposes to demonstrate the calculation on a manageable set of data. In the real world, such a presentation is not feasible, since there are likely to be thousands or millions of customers going down and hundreds or thousands of days going across.

It is also worth mentioning that this treatment of hazards introduces them as conditional probabilities, which vary between 0 and 1. This is possible because the hazards use time measured in discrete units, such as days or weeks, a description of time well suited to customer-related analyses. However, statisticians often work with hazard rates rather than probabilities. The two ideas are closely related, but the mathematics using rates involves daunting integrals, complicated exponential functions, and difficult-to-explain adjustments to this or that factor. For our purposes, the simpler hazard probabilities are not only easier to explain, but they also solve the problems that arise when working with customer data.
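To give a feel for how the two notions relate, here is a small Python sketch. It rests on an assumption that is not made in the text: the hazard rate is taken to be constant within each time unit, in which case the probability of stopping during one unit is 1 minus the exponential of the negative rate. The function names are hypothetical.

```python
import math


def rate_to_probability(rate, interval=1.0):
    """Probability of stopping within one interval, assuming a constant rate."""
    return 1.0 - math.exp(-rate * interval)


def probability_to_rate(probability, interval=1.0):
    """Constant rate that yields the given stop probability over one interval."""
    return -math.log(1.0 - probability) / interval


# Round trip for a 20 percent hazard probability.
rate = probability_to_rate(0.20)
print(rate, rate_to_probability(rate))
```

Under this assumption a rate can exceed 1, while the corresponding probability always stays between 0 and 1. For small values the two are nearly equal.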

Other Types of Censoring

The previous section introduced censoring in two cases: hazards for customers after they have stopped and hazards for customers who are still active.
