470643 c02.qxd 3/8/04 11:09 AM Page 37
The Virtuous Cycle of Data Mining
Defining the Inputs
The data mining techniques described in this book automate the core of the model-building process. Given a collection of input data fields and a target field (in this case, purchase of the new product), they can find patterns and rules that explain the target in terms of the inputs. For data mining to succeed, there must be some relationship between the input variables and the target.
In practice, this means that it often takes much more time and effort to identify, locate, and prepare input data than it does to create and run the models, especially since data mining tools make it so easy to create models. It is impossible to do a good job of selecting input variables without knowledge of the business problem being addressed. This is true even when using data mining tools that claim the ability to accept all the data and figure out automatically which fields are important. Information that knowledgeable people in the industry expect to be important is often not represented in raw input data in a way data mining tools can recognize.
The wireless phone company understood the importance of selecting the right input data. Experts from several functional areas, including marketing, sales, and customer support, met with outside data mining consultants to brainstorm about the best way to use the available data.
There were three data sources available:
A marketing customer information file
A call detail database
A demographic database
The call detail database was the largest of the three by far. It contained a record for each call made or received by every customer in the target market.
The marketing database contained summarized customer data on usage, tenure, product history, price plans, and payment history. The third database contained purchased demographic and lifestyle data about the customers.
Derived Inputs
As a result of the brainstorming meetings and preliminary analysis, several summary and descriptive fields were added to the customer data to be used as input to the predictive model:
Minutes of use
Number of incoming calls
Frequency of calls
Sphere of influence
Voice mail user flag
Some of these fields require a bit of explanation. Minutes of use (MOU) is a standard measure of how good a customer is. The more minutes of use, the better the customer. Historically, the company had focused on MOU almost to the exclusion of all other variables. But MOU masks many interesting differences: two long calls or 100 short ones? All outgoing calls or half incoming? All calls to the same number or calls to many numbers? The next items in the above list are intended to shed more light on these questions.
Sphere of influence (SOI) is another interesting measure because it was developed as a result of an earlier data mining effort. A customer’s SOI is the number of people with whom she or he had phone conversations during a given time period. It turned out that high-SOI customers, as a group, behaved differently from low-SOI customers in several ways, including frequency of calls to customer service and loyalty.
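To make the derived fields concrete, here is a minimal sketch of how they might be summarized from call detail records. The record layout, the sample values, and the function name `derive_inputs` are assumptions made for illustration and do not come from the case study. The sketch also approximates sphere of influence by counting distinct phone numbers, which is only a proxy for distinct people, and it leaves out the voice mail flag because that would need a field the sample records do not have.

```python
from collections import defaultdict

# Hypothetical call detail records: (customer, other_party, direction, minutes).
# The field layout is an assumption made for illustration only.
calls = [
    ("cust_a", "555-0101", "out", 12.0),
    ("cust_a", "555-0101", "in", 3.5),
    ("cust_a", "555-0102", "out", 1.0),
    ("cust_b", "555-0103", "out", 45.0),
    ("cust_b", "555-0103", "out", 40.0),
]


def derive_inputs(call_records):
    """Summarize call detail records into one row of derived fields per customer."""
    summary = defaultdict(lambda: {
        "minutes_of_use": 0.0,
        "incoming_calls": 0,
        "call_count": 0,
        "parties": set(),
    })
    for customer, other_party, direction, minutes in call_records:
        row = summary[customer]
        row["minutes_of_use"] += minutes
        row["call_count"] += 1
        if direction == "in":
            row["incoming_calls"] += 1
        row["parties"].add(other_party)

    result = {}
    for customer, row in summary.items():
        result[customer] = {
            "minutes_of_use": row["minutes_of_use"],
            "incoming_calls": row["incoming_calls"],
            # Call count over the period stands in for frequency of calls.
            "call_count": row["call_count"],
            # Sphere of influence, approximated by distinct numbers called or calling.
            "sphere_of_influence": len(row["parties"]),
        }
    return result


derived = derive_inputs(calls)
for customer, fields in sorted(derived.items()):
    print(customer, fields)
```

In the sample data, the second customer has far more minutes of use than the first but talks to only one number, which is the kind of difference MOU alone would hide.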
The Actions
Data from all three sources was brought together and used to create a data mining model. The model was used to identify likely candidates for the new product. Two direct mailings were made: one to a list based on the results of the data mining model and one to a control group selected using business-as-usual methods. As shown in Figure 2.4, 15 percent of the people in the target group purchased the new product, compared to only 3 percent in the control group.
[Figure 2.4: bar chart comparing the percent of the target market responding (15%) with the percent of the control group responding (3%).]
Figure 2.4 These results demonstrate a very successful application of data mining.
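The size of the improvement can be checked with a short calculation. The two response rates are the ones reported above; the mailing size of 10,000 pieces is a hypothetical figure, used only to turn the rates into counts. The ratio of the two rates is commonly called lift.

```python
# Response rates reported in the text for the two mailings.
model_response_rate = 0.15    # list chosen by the data mining model
control_response_rate = 0.03  # list chosen by business-as-usual methods

# Ratio of the two rates, commonly called lift.
lift = model_response_rate / control_response_rate

# Hypothetical mailing size (an assumption, not a figure from the case study).
pieces_mailed = 10_000
extra_responders = round(
    pieces_mailed * (model_response_rate - control_response_rate)
)

print(f"lift: {lift:.1f}x")
print(f"extra responders per {pieces_mailed:,} pieces: {extra_responders:,}")
```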
Completing the Cycle
With the help of data mining, the right group of prospects was contacted for the new product offering. That is not the end of the story, though. Once the results of the new campaign were in, data mining techniques could help to get a better picture of the actual responders. Armed with a profile of the buyers in the initial test market, and a usage profile of the first several months of the new service, the company was able to do an even better job of targeting prospects in the next five markets where the product was rolled out.
Neural Networks and Decision Trees Drive SUV Sales
In 1992, before any of the commercial data mining tools available today were on the market, one of the big three U.S. auto makers asked a group of researchers at the Pontikes Center for Management at Southern Illinois University in Carbondale to develop an “expert system” to identify likely buyers of a particular sport-utility vehicle. (We are grateful to Wei-Xiong Ho who worked with Joseph Harder of the College of Business and Administration at Southern Illinois on this project.)
Traditional expert systems consist of a large database of hundreds or thousands of rules collected by observing and interviewing human experts who are skilled at a particular task. Expert systems have enjoyed some success in certain domains such as medical diagnosis and answering tax questions, but the difficulty of collecting the rules has limited their use.
The team at Southern Illinois decided to solve this problem by generating the rules directly from historical data. In other words, they would replace expert interviews with data mining.
The Initial Challenge
The initial challenge that Detroit brought to Carbondale was to improve response to a direct mail campaign for a particular model. The campaign involved sending an invitation to a prospect to come test-drive the new model.
Anyone accepting the invitation would find a free pair of sunglasses waiting at the dealership. The problem was that very few people were returning the response card or calling the toll-free number for more information, and few of those who did ended up buying the vehicle. The company knew it could save itself a lot of money by not sending the offer to people unlikely to respond, but it didn’t know who those people were.
How Data Mining Was Applied
As is often the case when the data to be mined is from several different sources, the first challenge was to integrate data so that it could tell a consistent story.
The Data
The first file, the “mail file,” was a mailing list containing names and addresses of about a million people who had been sent the promotional mailing. This file contained very little information likely to be useful for selection.
The mail file was appended with data based on zip codes from the commercially available PRIZM database. This database contains demographic and “psychographic” characterizations of the neighborhoods associated with the zip codes.
Two additional files contained information on people who had sent back the response card or called the toll-free number for more information. Linking the response cards back to the original mailing file was simple because the mail file contained a nine-character key for each address that was printed on the response cards. Telephone responders presented more of a problem since their reported name and address might not exactly match their address in the database, and there is no guarantee that the call even came from someone on the mailing list since the recipient may have passed the offer on to someone else.
Of 1,000,003 people who were sent the mailing, 32,904 responded by sending back a card and 16,453 responded by calling the toll-free number for a total initial response rate of 5 percent. The auto maker’s primary interest, of course, was in the much smaller number of people who both responded to the mailing and bought the advertised car. These were to be found in a sales file, obtained from the manufacturer, that contained the names, addresses, and model purchased for all car buyers in the 3-month period following the mailing.
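The quoted response rate can be reproduced from the counts above. This sketch assumes that nobody is counted twice, that is, that no one both returned a card and called the toll-free number; the source does not say whether that holds.

```python
# Counts reported in the text.
mailed = 1_000_003
card_responders = 32_904
phone_responders = 16_453

# Assumes card and phone responders do not overlap (not stated in the source).
total_responders = card_responders + phone_responders
response_rate = total_responders / mailed

print(f"total responders: {total_responders:,}")
print(f"response rate: {response_rate:.2%}")
```

The result is just under 5 percent, which matches the rounded figure in the text.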
An automated name-matching program with loosely set matching standards discovered around 22,000 apparent matches between people who bought cars and people who had received the mailing. Hand editing reduced the intersection to 4,764 people who had received the mailing and bought a car. About half of those had purchased the advertised model. See Figure 2.5 for a comparison of all these data sources.
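The source does not say how the name-matching program worked. As a rough illustration of the general idea, loose automated matching followed by manual review, the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in. The sample records, the 0.8 threshold, and the function names are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity score between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(buyers, mailed, threshold=0.8):
    """Return (buyer, recipient, score) triples at or above a loose threshold.

    The output is a list of candidates for manual review, not a final match list.
    """
    candidates = []
    for buyer in buyers:
        for recipient in mailed:
            score = similarity(buyer, recipient)
            if score >= threshold:
                candidates.append((buyer, recipient, round(score, 2)))
    return candidates


# Hypothetical records: name and address concatenated into one string.
mailed = [
    "John Q. Smith, 12 Oak St, Carbondale IL",
    "Maria Lopez, 400 Elm Ave, Detroit MI",
]
buyers = [
    "Jon Smith, 12 Oak Street, Carbondale IL",
    "Peter Chan, 9 Pine Rd, Chicago IL",
]

for match in candidate_matches(buyers, mailed):
    print(match)
```

Comparing every buyer against every recipient is quadratic in the number of records, so a file of a million addresses would need some form of blocking first, for example comparing only records that share a zip code. A loose threshold produces false positives by design, which is why a hand-editing pass like the one described above is needed afterward.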
Down the Mine Shaft
The experimental design called for the population to be divided into exactly two classes: success and failure. This is certainly a questionable design since it obscures interesting differences. Surely, people who come into the dealership to test-drive one model but end up buying another should be in a different class than nonresponders or people who respond but buy nothing. For that matter, people who weren’t considered good enough prospects to be sent a mailing, but who nevertheless bought the car, are an even more interesting group.