How does this company—or any similar large corporation—manage its billing process, the bread and butter of its business, responsible for the majority of its revenue? The answer is simple: Very carefully! Companies have developed detailed processes for handling standard operations; they have policies and procedures. These processes are robust. Bills go out to customers, even when the business reorganizes, even when database administrators are on vacation, even when computers are temporarily down, even as laws and regulations change, and switches are upgraded. If an organization can manage a process as complicated as getting accurate bills out every month to millions of residential, business, and government customers, surely incorporating data mining into decision processes should be fairly easy. Is this the case?
Large corporations have decades of experience developing and implementing mission-critical applications for running their business. Data mining is different from the typical operational system (see Table 2.1). The skills needed for running a successful operational system do not necessarily lead to successful data mining efforts.
470643 c02.qxd 3/8/04 11:09 AM Page 33
The Virtuous Cycle of Data Mining
33
Table 2.1 Data Mining Differs from Typical Operational Business Processes

| TYPICAL OPERATIONAL SYSTEM | DATA MINING SYSTEM |
|---|---|
| Operations and reports on historical data | Analysis on historical data often applied to most current data to determine future actions |
| Predictable and periodic flow of work, typically tied to calendar | Unpredictable flow of work depending on business and marketing needs |
| Limited use of enterprise-wide data | The more data, the better the results (generally) |
| Focus on line of business (such as account, region, product code, minutes of use, and so on), not on customer | Focus on actionable entity, such as product, customer, sales region |
| Response times often measured in seconds/milliseconds (for interactive systems) while waiting weeks/months for reports | Iterative processes with response times often measured in minutes or hours |
| System of record for data | Copy of data |
| Descriptive and repetitive | Creative |
First, problems being addressed by data mining differ from operational problems: a data mining system does not seek to replicate previous results exactly. In fact, replication of previous efforts can lead to disastrous results. It may result in marketing campaigns that market to the same people over and over. You do not want to learn from analyzing data that a large cluster of customers fits the profile of the customers contacted in some previous campaign. Data mining processes need to take such issues into account, unlike typical operational systems that want to reproduce the same results over and over, whether completing a telephone call, sending a bill, authorizing a credit purchase, tracking inventory, or performing countless other daily operations.
Data mining is a creative process. Data contains many obvious correlations that are either useless or simply represent current business policies. For example, analysis of data from one large retailer revealed that people who buy maintenance contracts are also very likely to buy large household appliances. Unless the retailer wanted to analyze the effectiveness of sales of maintenance contracts with appliances, such information is worse than useless, because the maintenance contracts in question are only sold with large appliances. Spending millions of dollars on hardware, software, and analysts to find such results is a waste of resources that can better be applied elsewhere in the business. Analysts need to understand what is of value to the business and how to arrange the data to bring out the nuggets.
Chapter 2
Data mining results change over time. Models expire and become less useful as time goes on. One cause is that data ages quickly. Markets and customers change quickly as well.
Data mining provides feedback into other processes that may need to change.
Decisions made in the business world often affect current processes and interactions with customers. Often, looking at data finds imperfections in operational systems, imperfections that should be fixed to enhance future customer understanding.
The rest of this chapter looks at some more examples of the virtuous cycle of data mining in action.
A Wireless Communications Company Makes the Right Connections
The wireless communications industry is fiercely competitive. Wireless phone companies are constantly dreaming up new ways to steal customers from their competitors and to keep their own customers loyal. The basic service offering is a commodity, with thin margins and little basis for product differentiation, so phone companies think of novel ways to attract new customers.
This case study talks about how one mobile phone provider used data mining to improve its ability to recognize customers who would be attracted to a new service offering. (We are indebted to Alan Parker of Apower Solutions for many details in this study.)
The Opportunity
This company wanted to test-market a new product. For technical reasons, their preliminary roll-out tested the product on a few hundred subscribers, a tiny fraction of the customer base in the chosen market.
The initial problem, therefore, was to figure out who was likely to be interested in this new offering. This is a classic application of data mining: finding the most cost-effective way to reach the desired number of responders. Since fixed costs of a direct marketing campaign are constant by definition, and the cost per contact is also fairly constant, the only practical way to reduce the total cost of the campaign is to reduce the number of contacts.
The company needed a certain number of people to sign up in order for the trial to be valid. The company’s past experience with new-product introduction campaigns was that about 2 to 3 percent of existing customers would respond favorably. So, to reach 500 responders, they would expect to contact between about 16,700 prospects (at a 3 percent response rate) and 25,000 prospects (at a 2 percent response rate).
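The arithmetic behind this estimate can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the cost figures (a fixed cost of 10,000 and 1.00 per contact) are invented to show the shape of the calculation, not taken from the case study.

```python
import math


def contacts_needed(target_responders: int, response_rate: float) -> int:
    """Smallest number of contacts whose expected responders reach the target."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(target_responders / response_rate)


def campaign_cost(fixed_cost: float, cost_per_contact: float, contacts: int) -> float:
    """Total cost = fixed cost + (roughly constant) cost per contact * contacts."""
    return fixed_cost + cost_per_contact * contacts


# Response rates of 3 percent and 2 percent, as quoted in the text.
best_case = contacts_needed(500, 0.03)    # 16667 contacts
worst_case = contacts_needed(500, 0.02)   # 25000 contacts

# Hypothetical costs, for illustration only.
cost_best = campaign_cost(10_000, 1.00, best_case)
cost_worst = campaign_cost(10_000, 1.00, worst_case)
```

Because the fixed cost and the per-contact cost do not change, the only lever left in `campaign_cost` is the number of contacts, which is exactly what a good response score reduces.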
How should the targets be selected? It would be handy to give each prospective customer a score from, say, 1 to 100, where 1 means “very likely to purchase the product” and 100 means “very unlikely to purchase the product.”
The prospects could then be sorted according to this score, and marketing could work down this list until reaching the desired number of responders. As the cumulative gains chart in Figure 2.3 illustrates, contacting the people most likely to respond achieves the quota of responders with fewer contacts, and hence at a lower cost.
The next chapter explains cumulative gains charts in more detail. For now, it is enough to know that the curved line is obtained by ordering the scored prospects along the X-axis with those judged most likely to respond on the left and those judged least likely on the right. The diagonal line shows what would happen if prospects were selected at random from all prospects. The chart shows that good response scores lower the cost of a direct marketing campaign by allowing fewer prospects to be contacted.
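The idea of working down a ranked list until a quota is met can be sketched as follows. The data is a made-up toy example of ten prospects, and the function names are hypothetical. As in the scoring scheme described above, a lower score means a prospect is more likely to respond.

```python
from itertools import accumulate


def cumulative_gains(prospects):
    """Return the running count of responders after each contact.

    prospects is a list of (score, responded) pairs, where a lower score
    means "more likely to respond" (1 is best).
    """
    ranked = sorted(prospects, key=lambda p: p[0])
    return list(accumulate(int(responded) for _, responded in ranked))


def contacts_to_reach_quota(prospects, quota):
    """Contacts needed, working down the ranked list, to reach the quota.

    Returns None if the list runs out before the quota is met.
    """
    for contacted, total in enumerate(cumulative_gains(prospects), start=1):
        if total >= quota:
            return contacted
    return None


# Made-up toy data: ten prospects with scores 1..10 and known outcomes.
toy = [(1, True), (2, True), (3, False), (4, True), (5, False),
       (6, False), (7, False), (8, True), (9, False), (10, False)]

quota = 3
gains = cumulative_gains(toy)                    # [1, 2, 2, 3, 3, 3, 3, 4, 4, 4]
targeted = contacts_to_reach_quota(toy, quota)   # 4 contacts

# Random selection finds responders at the overall rate (4 in 10 here),
# so the expected number of contacts for the same quota is quota / rate.
expected_random = quota * len(toy) / gains[-1]   # 7.5 contacts on average
```

In this toy case the ranked list meets the quota after 4 contacts, against an expected 7.5 for random selection. The gap between the two is the saving that the chart illustrates.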
How did the mobile phone company get such scores? By data mining, of course!
How Data Mining Was Applied
Most data mining methods learn by example. The neural network or decision tree generator or what have you is fed thousands and thousands of training examples. Each of the training examples is clearly marked as being either a responder or a nonresponder. After seeing enough of these examples, the tool comes up with a model in the form of a computer program that reads in unclassified records and updates each with a response score or classification.
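To make "learning by example" concrete, here is a deliberately tiny sketch: a one-split decision stump, which is far simpler than the neural networks or decision tree generators mentioned above and is not the method used in the case study. The single feature (monthly minutes of use), the training records, and all names are hypothetical.

```python
def train_stump(examples):
    """Learn a single threshold from labeled examples.

    examples is a list of (feature_value, is_responder) pairs with
    is_responder equal to 0 or 1. Returns (threshold, left_rate, right_rate),
    where each rate is the share of responders on that side of the split.
    """
    values = sorted({x for x, _ in examples})
    # Candidate thresholds lie midway between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        left = [y for x, y in examples if x <= t]
        right = [y for x, y in examples if x > t]
        # Predict the majority class on each side and count the mistakes.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def score(model, feature_value):
    """Response score for an unclassified record: the responder rate on its side."""
    threshold, left_rate, right_rate = model
    return left_rate if feature_value <= threshold else right_rate


# Hypothetical training set: (monthly minutes of use, responded?).
training = [(100, 0), (150, 0), (200, 0), (250, 0),
            (400, 1), (450, 1), (500, 0), (600, 1)]

model = train_stump(training)        # (325.0, 0.0, 0.75)
low_usage_score = score(model, 300)  # 0.0
high_usage_score = score(model, 480) # 0.75
```

The pattern is the same as in the text: marked examples go in, and what comes out is a small program (`score`) that assigns a response score to records it has never seen.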
In this case, the offer in question was a new product introduction, so there was no training set of people who had already responded. One possibility would be to build a model based on people who had ever responded to any offer in the past. Such a model would be good for discriminating between people who refuse all telemarketing calls and throw out all junk mail, and those who occasionally respond to some offers. These types of models are called nonresponse models and can be valuable to mass mailers who really do want their message to reach a large, broad market. The AARP, a non-profit organization that provides services to retired people, saved millions of dollars in mailing costs when it began using a nonresponse model. Instead of mailing to every household with a member over 50 years of age, as it once did, the organization discards the lowest-scoring 10 percent and still gets almost all the responders it would have reached with the full mailing.
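A nonresponse model is used to trim the bottom of the list rather than to pick the top. A minimal sketch of that trimming step follows; the household identifiers and scores are invented, the function name is hypothetical, and here a higher score is assumed to mean "more likely to respond to some offer."

```python
def mailing_list(scored_households, drop_fraction=0.10):
    """Keep every household except the lowest-scoring fraction.

    scored_households is a list of (household_id, score) pairs, where a
    higher score means "more likely to respond" (an assumption of this sketch).
    """
    ranked = sorted(scored_households, key=lambda h: h[1], reverse=True)
    keep = len(ranked) - int(len(ranked) * drop_fraction)
    return [household for household, _ in ranked[:keep]]


# Invented scores for ten households.
households = [("H01", 0.62), ("H02", 0.05), ("H03", 0.48), ("H04", 0.91),
              ("H05", 0.33), ("H06", 0.77), ("H07", 0.54), ("H08", 0.29),
              ("H09", 0.66), ("H10", 0.41)]

to_mail = mailing_list(households)   # nine households; "H02" is dropped
```

Dropping only the bottom of the list suits a mailer that wants broad reach. It is the opposite of what the wireless company needed, which was a short list drawn from the very top.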
However, the wireless company only wanted to reach a few hundred responders, so a model that identified the top 90 percent would not have served the purpose. Instead, they formed a training set of records from a similar new product introduction in another market.
[Chart: axes labeled Contacts (ticks from 2,000 to 20,000) and Responses (ticks from 100 to 400), comparing a Targeted Mailing line with a Randomized Mass Mailing line; the Quota and the resulting Savings are marked.]
Figure 2.3 Ranking prospects, using a response model, makes it possible to save money by targeting fewer customers and getting the same number of responders.