Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management


Hazard Functions and Survival Analysis in Marketing 403

are other useful cases as well. To explain other types of censoring, it is useful to go back to the medical realm.

Imagine that you are a cancer researcher and have found a medicine that cures cancer. You have to run a study to verify that this fabulous new treatment works. Such studies typically follow a group of patients for several years after the treatment, say 5 years. For the purposes of this example, we only want to know if patients die from cancer during the course of the study (medical researchers have other concerns as well, such as the recurrence of the disease, but that does not concern us in this simplified example).

So you identify 100 patients, give them the treatment, and their cancers seem to be cured. You follow them for several years. During this time, seven patients celebrate their newfound health by visiting Iceland. In a horrible tragedy, all seven happen to die in an avalanche caused by a submerged volcano. What is the effectiveness of your treatment on cancer mortality? Just looking at the data, it is tempting to say there is a 7 percent mortality rate.

However, this mortality is clearly not related to the treatment, so the answer does not feel right.

And, in fact, the answer is not right. This is an example of competing risks. A study participant might live, might die of cancer, or might die of a mountain climbing accident on a distant island. Or the patient might move to Tahiti and drop out of the study. As medical researchers say, such a patient has been “lost to follow-up.”

The solution is to censor the patients who exit the study before the event being studied occurs. If patients drop out of the study, then they were healthy to the point in time when they dropped out, and the information acquired during this period can be used to calculate hazards. Afterward there is no way of knowing what happened. They are censored at the point when they exit. If a patient dies of something else, then he or she is censored at the point when death occurs, and the death is not included in the hazard calculation.

TIP The right way to deal with competing risks is to develop a separate set of hazards for each risk, where the other risks are censored.

Competing risks are familiar in the business environment as well. For instance, there are often two types of stops: voluntary stops, when a customer decides to leave, and involuntary stops, when the company decides a customer should leave, often because of unpaid bills. In an analysis of voluntary churn, what happens to customers who are forced to discontinue their relationships due to unpaid bills? If such a customer were forced to stop on day 100, then that customer did not stop voluntarily on days 1–99. This information can be used to generate hazards for voluntary stops. However, starting on day 100, the customer is censored, as shown in Figure 12.8. Censoring customers, even when they have stopped for other reasons, makes it possible to understand different types of stops.


404 Chapter 12

[Figure 12.8 annotations, horizontal axis: time. These two customers were forced to leave, so they are censored at the point of attrition instead of being considered stopped. All the data from before they left is included in the calculation of the hazard functions for voluntary attrition, since they remained customers until then.]

Figure 12.8 Using censoring makes it possible to develop hazard models for voluntary attrition that include customers who were forced to leave.
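
One way to make the censoring rule concrete is the minimal Python sketch below. It assumes each customer is recorded as a hypothetical (day, outcome) pair, where day is the day the customer stopped or was last observed, and outcome is "voluntary", "involuntary", or None for a customer who is still active. The function name, the record layout, the sample data, and the convention that a censored customer counts as at risk only on the days before exit are all assumptions made for illustration, not the authors' implementation.

```python
def hazards_for_risk(customers, risk, horizon):
    """Daily hazards for one stop type, censoring every other kind of exit.

    customers: list of (day, outcome) pairs (hypothetical layout).
    risk: the stop type being studied, e.g. "voluntary".
    horizon: last day for which a hazard is calculated.
    """
    stops = [0] * (horizon + 1)    # stops[d]: stops of the studied type on day d
    at_risk = [0] * (horizon + 1)  # at_risk[d]: customers who could have stopped on day d
    for day, outcome in customers:
        is_event = outcome == risk
        # A customer who stops for the studied reason is at risk through the
        # day of the stop. Anyone else (another stop type, or still active)
        # is censored: at risk only on the days before the exit day.
        last_day_at_risk = day if is_event else day - 1
        for d in range(1, min(last_day_at_risk, horizon) + 1):
            at_risk[d] += 1
        if is_event and day <= horizon:
            stops[day] += 1
    return {d: stops[d] / at_risk[d]
            for d in range(1, horizon + 1) if at_risk[d] > 0}


# Made-up data: two voluntary stops, one forced stop, two active customers.
customers = [(2, "voluntary"), (3, "involuntary"), (3, "voluntary"),
             (4, None), (4, None)]

voluntary = hazards_for_risk(customers, "voluntary", horizon=3)
involuntary = hazards_for_risk(customers, "involuntary", horizon=3)
```

In this sketch the customer forced to leave on day 3 still counts in the at-risk population for voluntary stops on days 1 and 2, but the forced stop itself never counts as a voluntary stop. Calling the function once per stop type gives a separate set of hazards for each risk.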

From Hazards to Survival

This chapter started with a discussion of retention curves. From the hazard functions, it is possible to create a very similar curve, called the survival curve.

The survival curve is more useful and in many senses more accurate.

Retention

A retention curve provides information about how many customers have been retained for a certain amount of time. One common way of creating a retention curve is to do the following:

■■ For customers who started 1 week ago, measure the 1-week retention.

■■ For customers who started 2 weeks ago, measure the 2-week retention.

■■ And so on.
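
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cohort table, its layout (weeks since start mapped to a pair of counts), and the function name are hypothetical, and the numbers are invented.

```python
def retention_curve(cohorts):
    """Point-by-point retention, one cohort per tenure.

    cohorts: dict mapping weeks_ago -> (number_started, number_still_active).
    Returns a dict mapping tenure in weeks -> fraction retained.
    """
    return {weeks: active / started
            for weeks, (started, active) in sorted(cohorts.items())
            if started > 0}


# Invented counts: each tenure is measured on a different group of customers.
cohorts = {
    1: (1000, 950),
    2: (1200, 1080),
    9: (1500, 600),
    10: (1000, 450),
}

curve = retention_curve(cohorts)
for weeks, retained in curve.items():
    print(f"{weeks:>2} weeks: {retained:.0%}")
```

Because every point comes from a different cohort, nothing in the calculation forces the values to decrease as tenure grows. In the invented data above, the 10-week value is higher than the 9-week value.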

Figure 12.9 shows an example of a retention curve based on this approach.

The overall shape of this curve looks appropriate. However, the curve itself is quite jagged. It seems odd, for instance, that 10-week retention would be better than 9-week retention, as suggested by this data.


[Plot: Retention (10% to 100%) versus Tenure (Weeks)]

Figure 12.9 A retention curve might be quite jagged.

Actually, it is more than odd: it violates the very notion of retention. For instance, it opens the possibility that the curve will cross the 50 percent threshold more than once, leading to the odd, and inaccurate, conclusion that there is more than one median lifetime, or that the average retention for customers during the first 10 weeks after they start might be more than the average for the first 9 weeks. What is happening? Are customers being reincarnated?

These problems are an artifact of the way the curve was created. Customers acquired in any given time period may be better or worse than the customers acquired in other time periods. For instance, perhaps 9 weeks ago there was a special pricing offer that brought in bad customers. Customers who started 10 weeks ago were the usual mix of good and bad, but those who started 9 weeks ago were particularly bad. So, there are fewer of the bad customers after 9 weeks than of the better customers after 10 weeks.

The quality of customers might also vary due merely to random variation. After all, in the previous figure, there are over 100 time periods being considered—so, all things being equal, some time periods would be expected to exhibit differences.

A compounding reason is that marketing efforts change over time, attracting different qualities of customers. For instance, customers arriving by different channels often have different retention characteristics, and the mix of customers from different channels is likely to change over time.

Survival

Hazards give the probability that a customer who has survived up to a particular point in time stops at that point. Survival, on the other hand, gives the probability of a customer surviving up to that time. Survival values are calculated directly from the hazards.


At any point in time, the chance that a customer survives to the next unit of time is simply 1 – hazard, which is called conditional survival at time t (it is conditional because it assumes that the customers survived up to time t).

Calculating the full survival at a given time requires accumulating all the conditional survivals up to that point in time by multiplying them together. The survival value starts at 1 (or 100 percent) at time 0, since all customers included in the analysis survive to the beginning of the analysis.

Since the hazard is always between 0 and 1, the conditional survival is also between 0 and 1. Hence, survival itself never increases, because each successive value is obtained by multiplying the previous one by a number no greater than 1. The survival curve starts at 1 and gently goes down, perhaps flattening out at times (wherever the hazard is 0), but never rising.
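
The calculation described above amounts to a running product of the conditional survivals. The following Python sketch shows one way to write it. The function name and the sample hazards are made up for illustration, and the indexing convention (hazards[t] is the hazard at time t, and survival[t] is the probability of surviving to time t) is an assumption.

```python
from itertools import accumulate
from operator import mul


def survival_from_hazards(hazards):
    """Turn a list of hazards into survival values.

    hazards[t] is the hazard at time t (a probability between 0 and 1).
    Returns a list one element longer, where survival[0] is 1.0 and
    survival[t] is the product of (1 - hazard) over all times before t.
    """
    conditional = [1.0 - h for h in hazards]  # conditional survival at each time
    return [1.0] + list(accumulate(conditional, mul))


# Invented hazards for four time units; the third one is zero.
hazards = [0.10, 0.05, 0.0, 0.20]
survival = survival_from_hazards(hazards)

for t, s in enumerate(survival):
    print(f"time {t}: survival {s:.3f}")
```

With these invented hazards the survival values are roughly 1, 0.9, 0.855, 0.855, and 0.684: the curve stays flat where the hazard is 0 and otherwise falls, but it never rises.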

Survival curves make more sense for customer retention purposes than the retention curves described earlier. Figure 12.10 shows a survival curve and its corresponding retention curve. It is clear that the survival curve is smoother, and that it slopes downward at all times. The retention curve bounces all over the place.

The differences between the retention curve and the survival curve may, at first, seem nonintuitive. The retention curve is actually pasting together a whole bunch of different pictures of customers from the past, like a photo collage pieced together from a bunch of different photographs to get a panoramic image. In the collage, the picture in each photo is quite clear. However, the boundaries do not necessarily fit together smoothly. Different pictures in the collage look different, because of differences in lighting or perspective, differences that contribute to the aesthetic of the collage.

[Plot: Retention/Survival (10% to 100%) versus Tenure (Weeks)]

Figure 12.10 A survival curve is smoother than a retention curve.


The same thing is happening with retention curves, where customers who start at different points in time have different perspectives. Any given point on the retention curve is close to the actual retention value; however, taken as a whole, it looks jagged. One way to remove the jaggedness is to focus on customers who start at about the same time, as suggested earlier in this chapter.