Of course, the profit model is only as good as its inputs. While the fixed and variable costs of the campaign are fairly easy to come by, the predicted value of a responder can be harder to estimate. The process of figuring out what a customer is worth is beyond the scope of this book, but a good estimate helps to measure the true value of a data mining model.
In the end, the measure that counts the most is return on investment. Measuring lift on a test set helps choose the right model. Profitability models based on lift will help decide how to apply the results of the model. But, it is very important to measure these things in the field as well. In a database marketing application, this requires always setting aside control groups and carefully tracking customer response according to various model scores.
Step Eleven: Begin Again
Every data mining project raises more questions than it answers. This is a good thing. It means that new relationships are now visible that were not visible
470643 c03.qxd 3/8/04 11:09 AM Page 86
86
Chapter 3
before. The newly discovered relationships suggest new hypotheses to test and the data mining process begins all over again.
Lessons Learned
Data mining comes in two forms. Directed data mining involves searching through historical records to find patterns that explain a particular outcome.
Directed data mining includes the tasks of classification, estimation, prediction, and profiling. Undirected data mining searches through the same records for interesting patterns. It includes the tasks of clustering, finding association rules, and description.
Data mining brings the business closer to data. As such, hypothesis testing is a very important part of the process. However, the primary lesson of this chapter is that data mining is full of traps for the unwary and following a methodology based on experience can help avoid them.
The first hurdle is translating the business problem into one of the six tasks that can be solved by data mining: classification, estimation, prediction, affinity grouping, clustering, and profiling.
The next challenge is to locate appropriate data that can be transformed into actionable information. Once the data has been located, it should be thoroughly explored. The exploration process is likely to reveal problems with the data. It will also help build up the data miner’s intuitive understanding of the data.
The next step is to create a model set and partition it into training, validation, and test sets.
Data transformations are necessary for two purposes: to fix problems with the data such as missing values and categorical variables that take on too many values, and to bring information to the surface by creating new variables to represent trends and other ratios and combinations.
Once the data has been prepared, building models is a relatively easy process. Each type of model has its own metrics by which it can be assessed, but there are also assessment tools that are independent of the type of model.
Some of the most important of these are the lift chart, which shows how the model has increased the concentration of the desired value of the target variable and the confusion matrix that shows that misclassification error rate for each of the target classes. The next chapter uses examples from real data mining projects to show the methodology in action.
470643 c04.qxd 3/8/04 11:10 AM Page 87
C H A P T E R
4
Data Mining Applications in
Marketing and Customer
Relationship Management
Some people find data mining techniques interesting from a technical perspective. However, for most people, the techniques are interesting as a means to an end. The techniques do not exist in a vacuum; they exist in a business context. This chapter is about the business context.
This chapter is organized around a set of business objectives that can be addressed by data mining. Each of the selected business objectives is linked to specific data mining techniques appropriate for addressing the problem. The business topics addressed in this chapter are presented in roughly ascending order of complexity of the customer relationship. The chapter starts with the problem of communicating with potential customers about whom little is known, and works up to the varied data mining opportunities presented by ongoing customer relationships that may involve multiple products, multiple communications channels, and increasingly individualized interactions.
In the course of discussing the business applications, technical material is introduced as appropriate, but the details of specific data mining techniques are left for later chapters.
Prospecting
Prospecting seems an excellent place to begin a discussion of business applications of data mining. After all, the primary definition of the verb to prospect 87
470643 c04.qxd 3/8/04 11:10 AM Page 88
88
Chapter 4
comes from traditional mining, where it means to explore for mineral deposits or oil. As a noun, a prospect is something with possibilities, evoking images of oil fields to be pumped and mineral deposits to be mined. In marketing, a prospect is someone who might reasonably be expected to become a customer if approached in the right way. Both noun and verb resonate with the idea of using data mining to achieve the business goal of locating people who will be valuable customers in the future.
For most businesses, relatively few of Earth’s more than six billion people are actually prospects. Most can be excluded based on geography, age, ability to pay, and need for the product or service. For example, a bank offering home equity lines of credit would naturally restrict a mailing offering this type of loan to homeowners who reside in jurisdictions where the bank is licensed to operate. A company selling backyard swing sets would like to send its catalog to households with children at addresses that seem likely to have backyards. A magazine wants to target people who read the appropriate language and will be of interest to its advertisers. And so on.
Data mining can play many roles in prospecting. The most important of these are:
■■
Identifying good prospects
■■
Choosing a communication channel for reaching prospects
■■
Picking appropriate messages for different groups of prospects
Although all of these are important, the first—identifying good prospects—
is the most widely implemented.
Identifying Good Prospects
The simplest definition of a good prospect—and the one used by many companies—is simply someone who might at least express interest in becoming a customer. More sophisticated definitions are more choosey. Truly good prospects are not only interested in becoming customers; they can afford to become customers, they will be profitable to have as customers, they are unlikely to defraud the company and likely to pay their bills, and, if treated well, they will be loyal customers and recommend others. No matter how simple or sophisticated the definition of a prospect, the first task is to target them.
Targeting is important whether the message is to be conveyed through advertising or through more direct channels such as mailings, telephone calls, or email. Even messages on billboards are targeted to some degree; billboards for airlines and rental car companies tend to be found next to highways that lead to airports where people who use these services are likely to be among those driving by.
470643 c04.qxd 3/8/04 11:10 AM Page 89
Data Mining Applications
89
Data mining is applied to this problem by first defining what it means to be a good prospect and then finding rules that allow people with those characteristics to be targeted. For many companies, the first step toward using data mining to identify good prospects is building a response model. Later in this chapter is an extended discussion of response models, the various ways they are employed, and what they can and cannot do.
Choosing a Communication Channel
Prospecting requires communication. Broadly speaking, companies intentionally communicate with prospects in several ways. One way is through public relations, which refers to encouraging media to cover stories about the company and spreading positive messages by word of mouth. Although highly effective for some companies (such as Starbucks and Tupperware), public relations are not directed marketing messages.
Of more interest to us are advertising and direct marketing. Advertising can mean anything from matchbook covers to the annoying pop-ups on some commercial Web sites to television spots during major sporting events to product placements in movies. In this context, advertising targets groups of people based on common traits; however, advertising does not make it possible to customize messages to individuals. A later section discusses choosing the right place to advertise, by matching the profile of a geographic area to the profile of prospects.
Direct marketing does allow customization of messages for individuals.
This might mean outbound telephone calls, email, postcards, or glossy color catalogs. Later in the chapter is a section on differential response analysis, which explains how data mining can help determine which channels have been effective for which groups of prospects.
Picking Appropriate Messages
Even when selling the same basic product or service, different messages are appropriate for different people. For example, the same newspaper may appeal to some readers primarily for its sports coverage and to others primarily for its coverage of politics or the arts. When the product itself comes in many variants, or when there are multiple products on offer, picking the right message is even more important.