Data Type
Categorical variables are especially problematic for data mining techniques that use the numeric values of input variables. Numeric variables of the kind that can be summed and multiplied play to the strengths of data mining techniques, such as regression, K-means clustering, and neural networks, that are
based on arithmetic operations. When data has many categorical variables, then decision trees are quite useful, although association rules and link analysis may be appropriate in some cases.
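The difficulty with arithmetic on categories can be seen in a small sketch. The field values and both integer codings below are invented for illustration; the only point is that a "distance" computed on arbitrary category codes changes when the coding changes, even though the data has not.

```python
# Hypothetical categorical field with two equally arbitrary integer codings.
coding_a = {"red": 0, "green": 1, "blue": 2}
coding_b = {"red": 2, "green": 0, "blue": 1}

def code_distance(x, y, coding):
    """Absolute difference between the integer codes of two categories."""
    return abs(coding[x] - coding[y])

def nearest_to(value, coding):
    """Return the other category whose code is closest to value's code."""
    others = [c for c in coding if c != value]
    return min(others, key=lambda c: code_distance(value, c, coding))

# Which color looks "closest" to red depends only on the coding chosen.
print(nearest_to("red", coding_a))
print(nearest_to("red", coding_b))
```

A technique that sums, multiplies, or measures distance on such codes inherits this arbitrariness, which is one reason decision trees are attractive when many fields are categorical.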
Number of Input Fields
In directed data mining applications, there should be a single target field or dependent variable. The rest of the fields (except for those that are either clearly irrelevant or clearly dependent on the target variable) are treated as potential inputs to the model. Data mining methods vary in their ability to successfully process large numbers of input fields. This can be a factor in deciding on the right technique for a particular application.
In general, techniques that rely on adjusting a vector of weights that has an element for each input field run into trouble when the number of fields grows very large. Neural networks and memory-based reasoning share that trait.
Association rules run into a different problem. The technique looks at all possible combinations of the inputs; as the number of inputs grows, processing the combinations becomes impossible to do in a reasonable amount of time.
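The growth in combinations can be made concrete with a short counting sketch. This is only an illustration of the arithmetic, not an association rule algorithm; the function name and the optional size limit are assumptions introduced here.

```python
from math import comb

def count_combinations(n_items, max_size=None):
    """Count the non-empty combinations of n_items, up to max_size items each."""
    if max_size is None:
        max_size = n_items
    return sum(comb(n_items, k) for k in range(1, max_size + 1))

# The total roughly doubles with every additional input.
for n in (10, 20, 40):
    print(n, count_combinations(n))
```

Limiting the size of the combinations considered, as the hypothetical max_size parameter does, is one way to keep the count manageable.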
Decision-tree methods are much less hindered by large numbers of fields.
As the tree is built, the decision-tree algorithm identifies the single field that contributes the most information at each node and bases the next segment of the rule on that field alone. Dozens or hundreds of other fields can come along for the ride, but won’t be represented in the final rules unless they contribute to the solution.
TIP When faced with a large number of fields for a directed data mining problem, it is a good idea to start by building a decision tree, even if the final model is to be built using a different technique. The decision tree will identify a good subset of the fields to use as input to another technique that might be swamped by the original set of input variables.
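How a tree picks "the single field that contributes the most information" can be sketched as follows. This minimal example assumes entropy-based information gain as the measure; real decision-tree algorithms use a variety of splitting criteria, and the records and field names here are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, field, target):
    """Reduction in target entropy obtained by splitting rows on field."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_field(rows, fields, target):
    """Pick the candidate field with the highest information gain."""
    return max(fields, key=lambda f: information_gain(rows, f, target))

# Hypothetical records: "plan" separates the target perfectly, "region" not at all.
rows = [
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "basic", "region": "west", "churned": "yes"},
    {"plan": "premium", "region": "east", "churned": "no"},
    {"plan": "premium", "region": "west", "churned": "no"},
]
print(best_field(rows, ["plan", "region"], "churned"))
```

Ranking fields this way at each node is what lets uninformative fields come along for the ride without appearing in the final rules.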
Free-Form Text
Most data mining techniques are incapable of directly handling free-form text.
But clearly, text fields often contain extremely valuable information. When analyzing warranty claims submitted to an engine manufacturer by independent dealers, the mechanic’s free-form notes explaining what went wrong and what was done to fix the problem are at least as valuable as the fixed fields that show the part numbers and hours of labor used.
One data mining technique that can deal with free text is memory-based reasoning, one of the nearest neighbor methods discussed in Chapter 8. Recall that memory-based reasoning is based on the ability to measure the distance
from one record to all the other records in a database in order to form a neighborhood of similar records. Often, finding an appropriate distance metric is a stumbling block that makes it hard to apply the technique, but researchers in the field of information retrieval have come up with good measures of the distance between two blocks of text. These measurements are based on the overlap in vocabulary between the documents, especially of uncommon words and proper nouns. The ability of Web search engines to find appropriate articles is one familiar example of text mining.
As described in Chapter 8, memory-based reasoning on free-form text has also been used to classify workers into industries and job categories based on written job descriptions they supplied on the U.S. census long form and to add keywords to news stories.
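A distance based on vocabulary overlap, with uncommon words counting for more, might look like the sketch below. The weighting scheme (a logarithmic inverse document frequency) and the sample repair notes are assumptions chosen for illustration; this is not the measure used in the applications described above or by any particular search engine.

```python
from collections import Counter
from math import log

def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())

def idf_weights(documents):
    """Give each word a weight that grows as the word gets rarer."""
    n = len(documents)
    doc_freq = Counter(word for doc in documents for word in tokenize(doc))
    return {word: log(n / df) + 1.0 for word, df in doc_freq.items()}

def text_distance(a, b, weights):
    """One minus the weighted share of vocabulary the two texts have in common."""
    words_a, words_b = tokenize(a), tokenize(b)
    shared = sum(weights.get(w, 1.0) for w in words_a & words_b)
    total = sum(weights.get(w, 1.0) for w in words_a | words_b)
    return 1.0 - shared / total if total else 0.0

# Hypothetical mechanic's notes from warranty claims.
notes = [
    "replaced cracked piston ring",
    "replaced cracked piston and gasket",
    "replaced fuel filter",
]
weights = idf_weights(notes)
print(text_distance(notes[0], notes[1], weights))
print(text_distance(notes[0], notes[2], weights))
```

With a distance like this in hand, memory-based reasoning can form a neighborhood of similar text records in the same way it does for records with numeric fields.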
Consider Hybrid Approaches
Sometimes, a combination of techniques works better than any single approach.
This may require breaking down a single data mining task into two or more sub-tasks. The automotive marketing example from Chapter 2 illustrates this.
Researchers found that the best way of selecting prospects for a particular car model was to first use a neural network to identify people likely to buy a car, then use a decision tree to predict the particular model each car buyer would select.
Another example is a bank that uses three variables as input to a credit solicitation decision. The three inputs are estimates for:
■ The likelihood of a response
■ The projected first-year revenue from this customer
■ The risk of the new customer defaulting
These tasks vary considerably in the amount of relevant training data likely to be available, the input fields likely to be important, and the length of time required to verify the accuracy of a prediction. Soon after a mailing, the bank knows exactly who responded because the solicitation contains a deadline after which responses are considered invalid. A whole year must pass before the estimated first-year revenue can be checked against the actual amount, and it may take even longer for a customer to “go bad.” Given all these differences, it is not surprising that a different data mining technique may turn out to be best for each task.
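One way three separately built estimates could feed a single solicitation decision is sketched below. The combination rule, the loss and cost figures, and all names are hypothetical; the text does not say how the bank actually combined its inputs.

```python
from dataclasses import dataclass

@dataclass
class Estimates:
    response_probability: float  # output of a hypothetical response model
    first_year_revenue: float    # output of a hypothetical revenue model
    default_probability: float   # output of a hypothetical risk model

def expected_value(e, loss_if_default, cost_per_solicitation):
    """Assumed rule: expected first-year value of mailing one prospect."""
    value_if_responds = (
        (1.0 - e.default_probability) * e.first_year_revenue
        - e.default_probability * loss_if_default
    )
    return e.response_probability * value_if_responds - cost_per_solicitation

def should_solicit(e, loss_if_default=1000.0, cost_per_solicitation=1.0):
    """Solicit only when the assumed expected value is positive."""
    return expected_value(e, loss_if_default, cost_per_solicitation) > 0.0

print(should_solicit(Estimates(0.05, 200.0, 0.02)))
print(should_solicit(Estimates(0.05, 200.0, 0.30)))
```

Because each estimate arrives as a plain number, each of the three models can be built with whichever technique suits its own task and retrained on its own schedule.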
How One Company Began Data Mining
Over the years, the authors have watched many companies make their first forays into data mining. Although each company’s situation is unique, some
common themes emerge. At each company there was someone responsible for the data mining project who truly believed in the power and potential of analytic customer relationship management, often because he or she had seen it in action in other companies. This leader was not usually a technical expert, and frequently did not do any of the actual technical work. He or she functioned as an evangelist to build the data mining team and secure sponsorship for a data mining pilot.
The successful efforts crossed corporate boundaries to involve people from both marketing and information technology. The teams were usually quite small—often consisting of only 4 or 5 people—yet included people who understood the data, people who understood the data mining techniques, people who understood the business problem to be addressed, and at least one person with experience applying data mining to business problems. Sometimes several of these roles were combined in one person.
In all cases, the initial data mining pilot project addressed a problem of real importance to the organization—one where the value of success would be recognized. Some of the best pilot projects were designed to measure the usefulness of data mining by looking at the results of the actions suggested by the data mining effort.
One of the companies, a wireless service provider, agreed to let us describe its data mining pilot project.
A Controlled Experiment in Retention
In 1996, Comcast Cellular was a wireless phone service provider in a market of 7.5 million people in a three-state area centered around Philadelphia. In 1999, Comcast Cellular was absorbed by SBC and is now part of Cingular, but at the time this pilot study took place, it was a regional service provider facing tough competition from fast-growing national networks. Increasing competition meant that subscribers were faced with many competing offers, and each month a significant proportion of the customer base switched to a competing service. This churn, as it is called in the industry, was very disturbing because even though new subscribers easily outnumbered the defectors, the acquisition cost for a new customer was often in the $500 to $600 range. There is a detailed discussion of churn in Chapter 4.
With even more competitors poised to enter its market, Comcast Cellular wanted to reach out to existing subscribers with a proactive effort to ensure their continued happiness. The difficulty was knowing which customers were at risk and for what reasons. For any retention campaign, it is important to understand which customers are at risk because a retention offer costs the company money. It doesn’t make sense to offer an inducement to customers who are likely to remain anyway. It is equally important to understand what motivates different customer segments to leave, since different retention offers
are appropriate for different segments. An offer of free night and weekend minutes may be very attractive to customers who use their phones primarily to keep in touch with friends, but of little interest to business users.
The pilot project was a three-way partnership between Comcast, a group of data mining consultants (including the authors), and a telemarketing service bureau.
■ Comcast supplied data and expertise on its own business practices and procedures.
■ The data mining consultants developed profiles of likely defectors based on usage patterns in call detail data.
■ The telemarketing service bureau worked with Comcast to use the profiles to develop retention offers for an outbound telemarketing campaign.
This description focuses on the data mining aspect of the combined effort.