This is not a purely theoretical concern. A large bank, for instance, ran a direct mail campaign to encourage customers to open investment accounts.
Its analytic group developed a response model for the mailing. The bank then tested the campaign using three groups:
■■ Control group: A group chosen at random to receive the mailing.
■■ Test group: A group chosen by modeled response scores to receive the mailing.
■■ Holdout group: A group chosen by model scores who did not receive the mailing.
The model did quite well. That is, the customers who had high model scores did indeed respond at a higher rate than the control group and customers with lower scores. However, customers in the holdout group, who received no mailing, responded at the same rate as customers in the test group.
What was happening? The model worked correctly to identify people interested in such accounts. However, every part of the bank was focused on getting customers to open investment accounts—broadcast advertising, posters in branches, messages on the Web, training for customer service staff. The direct mail was drowned in the noise from all the other channels, and turned out to be unnecessary.
TIP To test whether both a model and the campaign it supports are effective, track the relationship of response rate to model score among prospects in a holdout group who are not part of the campaign as well as among prospects who are included in the campaign.
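The comparison described in the tip can be sketched in a few lines of Python. The record layout (score, mailed, responded), the function name, and the two score bands are assumptions for illustration, not details of the bank's analysis, and the sample records are invented only to show the shape of the calculation.

```python
from collections import defaultdict

def response_by_band(records, n_bands=2):
    """Response rate per (group, score band).

    records: iterable of (score, mailed, responded), with score in [0, 1].
    Returns {(group, band): rate}; band 0 holds the lowest scores.
    """
    counts = defaultdict(lambda: [0, 0])  # (group, band) -> [responders, total]
    for score, mailed, responded in records:
        band = min(int(score * n_bands), n_bands - 1)
        group = "mailed" if mailed else "holdout"
        counts[(group, band)][0] += int(responded)
        counts[(group, band)][1] += 1
    return {key: r / n for key, (r, n) in counts.items()}

# Invented toy records, only to show the shape of the calculation.
toy = [
    (0.9, True, True), (0.8, True, False), (0.2, True, False), (0.1, True, False),
    (0.9, False, True), (0.7, False, False), (0.3, False, False), (0.1, False, False),
]
rates = response_by_band(toy)
for band in (1, 0):
    diff = rates[("mailed", band)] - rates[("holdout", band)]
    print(f"band {band}: mailed {rates[('mailed', band)]:.2f}, "
          f"holdout {rates[('holdout', band)]:.2f}, difference {diff:+.2f}")
```

If the response rate rises with score in both groups but the mailed-minus-holdout difference stays near zero, the model is ranking prospects well while the mailing itself adds little, which is the pattern the bank observed.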
The goal of a marketing campaign is to change behavior. In this regard, reaching a prospect who is going to purchase anyway is little more effective than reaching a prospect who will not purchase despite having received the offer. A group identified as likely responders may also be less likely to be influenced by a marketing message. Their membership in the target group means that they are likely to have been exposed to many similar messages in the past from competitors. They are likely to already have the product or a close substitute or to be firmly entrenched in their refusal to purchase it. A marketing message may make more of a difference with people who have not heard it all before. Segments with the highest scores might have responded anyway, even without the marketing investment. This leads to the almost paradoxical conclusion that the segments with the highest scores in a response model may not provide the biggest return on a marketing investment.
470643 c04.qxd 3/8/04 11:10 AM Page 107
Data Mining Applications 107
Differential Response Analysis
The way out of this dilemma is to directly model the actual goal of the campaign, which is not simply reaching prospects who then make purchases. The goal should be reaching prospects who are more likely to make purchases because of having been contacted. This is known as differential response analysis.
Differential response analysis starts with a treated group and a control group. If the treatment has the desired effect, overall response will be higher in the treated group than in the control group. The object of differential response analysis is to find segments where the difference in response between the treated and untreated groups is greatest. Quadstone’s marketing analysis software has a module that performs this differential response analysis (which they call “uplift analysis”) using a slightly modified decision tree as illustrated in Figure 4.5.
The tree in the illustration is based on the response data from a test mailing, shown in Table 4.5. The data tabulates the take-up rate by age and sex for an advertised service for a treated group that received a mailing and a control group that did not.
It doesn’t take much data mining to see that the group with the highest response rate is young men who received the mailing, followed by old men who received the mailing. Does that mean that a campaign for this service should be aimed primarily at men? Not if the goal is to maximize the number of new customers who would not have signed up without prompting. Men included in the campaign do sign up for the service in greater numbers than women, but men are more likely to purchase the service in any case. The differential response tree makes it clear that the group most affected by the campaign is old women. This group is not at all likely (0.4 percent) to purchase the service without prompting, but with prompting they experience a more than tenfold increase in purchasing.
Table 4.5 Response Data from a Test Mailing
         CONTROL GROUP       TREATED (MAILED TO) GROUP
         YOUNG     OLD       YOUNG          OLD
women    0.8%      0.4%      4.1% (↑3.3)    4.6% (↑4.2)
men      2.8%      3.3%      6.2% (↑3.4)    5.2% (↑1.9)
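The arithmetic behind the arrows in Table 4.5 can be checked with a short Python sketch. The rates are taken from the table; the dictionary layout and variable names are assumptions made for illustration.

```python
# Take-up rates in percent, from Table 4.5: (control, treated).
rates = {
    ("women", "young"): (0.8, 4.1),
    ("women", "old"): (0.4, 4.6),
    ("men", "young"): (2.8, 6.2),
    ("men", "old"): (3.3, 5.2),
}

# Uplift is the treated rate minus the control rate, in percentage points.
uplift = {seg: round(treated - control, 1) for seg, (control, treated) in rates.items()}

best_response = max(rates, key=lambda seg: rates[seg][1])
best_uplift = max(uplift, key=uplift.get)

for seg in rates:
    control, treated = rates[seg]
    print(f"{seg[1]} {seg[0]}: uplift {uplift[seg]:+.1f} points, "
          f"treated/control ratio {treated / control:.1f}x")
print("highest treated response:", best_response)
print("highest uplift:", best_uplift)
```

Ranking by treated response puts young men first, while ranking by uplift puts old women first.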
[Figure 4.5: decision tree. Objective: Respond. Each node shows the difference in response between the groups (uplift), followed by the treated group size & the control group size.]
#0 All: Uplift = +3.2% of 49,873 & 50,127
   Sex = Female (#1): +3.8% of 25,100 & 25,215
      Age = Young (#3): +3.3% of 12,353 & 12,379
      Age = Old (#4): +4.2% of 12,747 & 12,836
   Sex = Male (#2): +2.6% of 24,773 & 24,912
      Age = Young (#5): +3.4% of 12,321 & 12,158
      Age = Old (#6): +1.9% of 12,452 & 12,754
Figure 4.5 Quadstone’s differential response tree tries to maximize the difference in response between the treated group and a control group.
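One way to see why the tree splits on sex first is to compare candidate splits by how far apart the uplift of their two branches is. The sketch below uses that rule as a stand-in; it is an assumption for illustration, not Quadstone's actual splitting criterion. Segment sizes are read from Figure 4.5 (taking the first count at each node as the treated group and the second as the control group) and rates from Table 4.5, so the responder counts are estimates derived from rounded percentages.

```python
# (sex, age) -> (treated size, treated rate, control size, control rate)
segments = {
    ("female", "young"): (12353, 0.041, 12379, 0.008),
    ("female", "old"): (12747, 0.046, 12836, 0.004),
    ("male", "young"): (12321, 0.062, 12158, 0.028),
    ("male", "old"): (12452, 0.052, 12754, 0.033),
}

def uplift(keys):
    """Treated response rate minus control response rate over the given segments."""
    t_n = sum(segments[k][0] for k in keys)
    t_resp = sum(segments[k][0] * segments[k][1] for k in keys)
    c_n = sum(segments[k][2] for k in keys)
    c_resp = sum(segments[k][2] * segments[k][3] for k in keys)
    return t_resp / t_n - c_resp / c_n

def split_gap(position):
    """Absolute difference in uplift between the two branches of a split.

    position 0 splits on sex, position 1 splits on age.
    """
    values = sorted({k[position] for k in segments})
    left, right = ([k for k in segments if k[position] == v] for v in values)
    return abs(uplift(left) - uplift(right))

gaps = {"sex": split_gap(0), "age": split_gap(1)}
best = max(gaps, key=gaps.get)
print(f"overall uplift: {uplift(list(segments)):.1%}")
print({name: f"{gap:.2%}" for name, gap in gaps.items()}, "-> split on", best)
```

Under these assumptions the branch uplifts reproduce the +3.8% and +2.6% shown in the figure, and the sex split separates the branches more than the age split does.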
Using Current Customers to Learn About Prospects
A good way to find good prospects is to look in the same places that today’s best customers came from. That means having some way of determining who the best customers are today. It also means keeping a record of how current customers were acquired and what they looked like at the time of acquisition.
Of course, the danger of relying on current customers to learn where to look for prospects is that the current customers reflect past marketing decisions.
Studying current customers will not suggest looking for new prospects anyplace that hasn’t already been tried. Nevertheless, the performance of current customers is a great way to evaluate the existing acquisition channels. For prospecting purposes, it is important to know what current customers looked like back when they were prospects themselves. Ideally you should:
■■ Start tracking customers before they become customers.
■■ Gather information from new customers at the time they are acquired.
■■ Model the relationship between acquisition-time data and future outcomes of interest.
The following sections provide some elaboration.
Start Tracking Customers before They Become Customers
It is a good idea to start recording information about prospects even before they become customers. Web sites can accomplish this by issuing a cookie each time a visitor is seen for the first time and starting an anonymous profile that remembers what the visitor did. When the visitor returns (using the same browser on the same computer), the cookie is recognized and the profile is updated. When the visitor eventually becomes a customer or registered user, the activity that led up to that transition becomes part of the customer record.
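A minimal in-memory sketch of this bookkeeping follows. The class, method, and field names are hypothetical, and a real site would persist the profiles and set the cookie through its web framework rather than hold everything in a dictionary.

```python
import uuid

class VisitorTracker:
    """Keeps anonymous profiles keyed by cookie ID (hypothetical design)."""

    def __init__(self):
        self.profiles = {}   # cookie ID -> list of recorded events
        self.customers = {}  # customer ID -> customer record

    def record_visit(self, cookie_id, event):
        """Log an event; issue a new cookie ID when the visitor is not recognized."""
        if cookie_id is None or cookie_id not in self.profiles:
            cookie_id = uuid.uuid4().hex
            self.profiles[cookie_id] = []
        self.profiles[cookie_id].append(event)
        return cookie_id  # the caller sends this back to the browser

    def register(self, cookie_id, customer_id):
        """Move the anonymous history into the new customer record."""
        history = self.profiles.pop(cookie_id, [])
        self.customers[customer_id] = {"pre_acquisition_events": history}
        return self.customers[customer_id]

tracker = VisitorTracker()
cookie = tracker.record_visit(None, "viewed home page")
tracker.record_visit(cookie, "viewed pricing page")
record = tracker.register(cookie, customer_id="C001")
print(record["pre_acquisition_events"])
```

Keying the profile on the cookie alone is what limits recognition to the same browser on the same computer, as noted above.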
Tracking responses and responders is good practice in the offline world as well. The first critical piece of information to record is the fact that the prospect responded at all. Data describing who responded and who did not is a necessary ingredient of future response models. Whenever possible, the response data should also include the marketing action that stimulated the response, the channel through which the response was captured, and when the response came in.
Determining which of many marketing messages stimulated the response can be tricky. In some cases, it may not even be possible. To make the job easier, response forms and catalogs include identifying codes. Web site visits capture the referring link. Even advertising campaigns can be distinguished by using different telephone numbers, post office boxes, or Web addresses.
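The pieces of response data described above (who responded, the marketing action, the channel, and the time) fit naturally in a small record, with the identifying code serving as the lookup key for the marketing action. The following Python sketch uses hypothetical code values, campaign names, and field names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical identifying codes printed on response forms or embedded in links.
CAMPAIGN_BY_CODE = {
    "CAT-SPRING-A": "spring catalog, cover A",
    "DM-0417": "April direct mail offer",
}

@dataclass
class Response:
    prospect_id: str
    channel: str                 # how the response was captured, e.g. "phone" or "web"
    received_at: datetime        # when the response came in
    source_code: Optional[str]   # identifying code, if one was captured

    @property
    def marketing_action(self) -> Optional[str]:
        """Campaign that stimulated the response, or None when it cannot be determined."""
        return CAMPAIGN_BY_CODE.get(self.source_code)

r1 = Response("P123", "mail", datetime(2004, 4, 20, 9, 30), "DM-0417")
r2 = Response("P456", "phone", datetime(2004, 4, 21, 14, 5), None)
print(r1.marketing_action)
print(r2.marketing_action)
```

Leaving the marketing action empty when no code was captured keeps the record honest about the cases where the stimulus cannot be determined.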
Depending on the nature of the product or service, responders may be required to provide additional information on an application or enrollment form. If the service involves an extension of credit, credit bureau information may be requested. Information collected at the beginning of the customer relationship ranges from nothing at all to the complete medical examination sometimes required for a life insurance policy. Most companies are somewhere in between.
Gather Information from New Customers
When a prospect first becomes a customer, there is a golden opportunity to gather more information. Before the transformation from prospect to customer, any data about prospects tends to be geographic and demographic.
Purchased lists are unlikely to provide anything beyond name, contact information, and list source. When an address is available, it is possible to infer other things about prospects based on characteristics of their neighborhoods.
Name and address together can be used to purchase household-level information about prospects from providers of marketing data. This sort of data is useful for targeting broad, general segments such as “young mothers” or “urban teenagers” but is not detailed enough to form the basis of an individualized customer relationship.