470643 c02.qxd 3/8/04 11:09 AM Page 41
The Virtuous Cycle of Data Mining
41
[Figure: overlapping groups labeled Mass Mailing (1,000,003), Sales (270,172), Resp Cards (32,904), and Resp Calls (16,453)]
Figure 2.5 Prospects in the training set have overlapping relationships.
Be that as it may, success was defined as “received a mailing and bought the car” and failure was defined as “received the mailing, but did not buy the car.”
A series of trials was run using decision trees and neural networks. The tools were tested on various kinds of training sets. Some of the training sets reflected the true proportion of successes in the database, while others were enriched so that successes made up as much as 10 percent of the records; higher concentrations might have produced better results.
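One common way to build an enriched training set is to keep every success and randomly discard failures until successes reach the desired share. The following is a minimal Python sketch under that assumption; the source does not say exactly how the researchers enriched their sets. The record layout, the function name `enrich`, and the toy population are hypothetical, and the 10 percent target simply matches the highest concentration mentioned above.

```python
import random


def enrich(records, target_rate, seed=0):
    """Keep every success and down-sample failures to approach target_rate.

    records is a list of (features, label) pairs, where label 1 marks a
    success. Hypothetical helper for illustration only.
    """
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be between 0 and 1")
    successes = [r for r in records if r[1] == 1]
    failures = [r for r in records if r[1] == 0]
    # Failures needed so that successes / (successes + failures) == target_rate.
    n_fail = round(len(successes) * (1 - target_rate) / target_rate)
    n_fail = min(n_fail, len(failures))  # cannot sample more failures than exist
    rng = random.Random(seed)
    sample = successes + rng.sample(failures, n_fail)
    rng.shuffle(sample)
    return sample


# Toy population: 1,000 records, of which 1 percent are successes.
population = [({"id": i}, 1 if i % 100 == 0 else 0) for i in range(1000)]
train = enrich(population, target_rate=0.10)
rate = sum(label for _, label in train) / len(train)
print(len(train), rate)  # 100 0.1
```

Because only failures are discarded, every scarce success is still available to the model; the trade-off is that scores from a model trained this way no longer reflect the true response rate without adjustment.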
The neural network did better on the sparse training sets, while the decision tree tool appeared to do better on the enriched sets. The researchers decided on a two-stage process. First, a neural network determined who was likely to buy a car, any car, from the company. Then, the decision tree was used to predict which of the likely car buyers would choose the advertised model. This two-step process proved quite successful. The hybrid data mining model combining decision trees and neural networks missed very few buyers of the targeted model while at the same time screening out many more nonbuyers than either the neural net or the decision tree was able to do.
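The two-stage process amounts to a cascade: a first model scores every prospect, and only those above a cutoff are passed to a second model. This sketch treats both models as plain scoring functions, so any classifier could be plugged in. The field names, the 0.5 cutoffs, the helper names `two_stage_select` and `evaluate`, and the toy scores (standing in for real neural network and decision tree output) are all hypothetical. `evaluate` reports the two quantities used above to judge the hybrid: the share of targeted buyers still selected and the share of nonbuyers screened out.

```python
from typing import Callable, Dict, List, Tuple

Prospect = Dict[str, float]
Scorer = Callable[[Prospect], float]


def two_stage_select(prospects: List[Prospect], buys_any_car: Scorer,
                     picks_model: Scorer, t1: float = 0.5,
                     t2: float = 0.5) -> List[Prospect]:
    """Stage 1 keeps likely car buyers; stage 2 keeps those likely to pick the model."""
    likely_buyers = [p for p in prospects if buys_any_car(p) >= t1]
    return [p for p in likely_buyers if picks_model(p) >= t2]


def evaluate(selected: List[Prospect], prospects: List[Prospect],
             is_buyer: Callable[[Prospect], bool]) -> Tuple[float, float]:
    """Return (share of buyers selected, share of nonbuyers screened out).

    Assumes the population contains at least one buyer and one nonbuyer.
    """
    n_buyers = sum(1 for p in prospects if is_buyer(p))
    n_nonbuyers = len(prospects) - n_buyers
    buyers_kept = sum(1 for p in selected if is_buyer(p))
    nonbuyers_kept = len(selected) - buyers_kept
    return buyers_kept / n_buyers, 1 - nonbuyers_kept / n_nonbuyers


# Toy prospects: precomputed scores stand in for the two trained models.
prospects = [
    {"car_score": 0.9, "model_score": 0.8, "bought": 1},
    {"car_score": 0.7, "model_score": 0.6, "bought": 1},
    {"car_score": 0.8, "model_score": 0.2, "bought": 0},
    {"car_score": 0.2, "model_score": 0.9, "bought": 0},
    {"car_score": 0.1, "model_score": 0.1, "bought": 0},
]
selected = two_stage_select(prospects,
                            buys_any_car=lambda p: p["car_score"],
                            picks_model=lambda p: p["model_score"])
print(len(selected), evaluate(selected, prospects, lambda p: p["bought"] == 1))
# 2 (1.0, 1.0)
```

Note that the fourth prospect scores high on the second model but never reaches it, because the first stage screens it out as unlikely to buy any car at all.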
The Resulting Actions
Armed with a model that could effectively reach responders, the company decided to take the money saved by mailing fewer pieces and put it into improving the lure offered to get likely buyers into the showroom. Instead of sunglasses for the masses, the company offered a nice pair of leather boots to the far smaller group of likely buyers. The new approach proved much more effective than the first.
Completing the Cycle
The university-based data mining project showed that even with only a limited number of broad-brush variables to work with and fairly primitive data mining tools, data mining could improve the effectiveness of a direct marketing campaign for a big-ticket item like an automobile. The next step is to gather more data, build better models, and try again!
Lessons Learned
This chapter started by recalling the drivers of the industrial revolution and the creation of large mills in England and New England. These mills are now abandoned, torn down, or converted to other uses. Water is no longer the driving force of business. It has been replaced by data.
The virtuous cycle of data mining is about harnessing the power of data and transforming it into actionable business results. Just as water once turned the wheels that drove machines throughout a mill, data needs to be gathered and disseminated throughout an organization to provide value. If data is water in this analogy, then data mining is the wheel, and the virtuous cycle spreads the power of the data to all the business processes.
The virtuous cycle of data mining is a learning process based on customer data. It starts by identifying the right business opportunities for data mining.
The best business opportunities are those that will be acted upon. Without action, there is little or no value to be gained from learning about customers.
Also very important is measuring the results of the action. This completes the loop of the virtuous cycle, and often suggests further data mining opportunities.
Chapter 3
Data Mining Methodology and Best Practices
The preceding chapter introduced the virtuous cycle of data mining as a business process. That discussion divided the data mining process into four stages:
1. Identifying the problem
2. Transforming data into information
3. Taking action
4. Measuring the outcome
Now it is time to start looking at data mining as a technical process. The high-level outline remains the same, but the emphasis shifts. Instead of identifying a business problem, we now turn our attention to translating business problems into data mining problems. Transforming data into information expands into several topics, including hypothesis testing, profiling, and predictive modeling. In this chapter, taking action refers to technical actions such as model deployment and scoring. Measurement refers to the testing that must be done to assess a model’s stability and effectiveness before it is used to guide marketing actions.
Because the entire book is based on this methodology, the best practices introduced here are elaborated upon elsewhere. The purpose of this chapter is to bring them together in one place and to organize them into a methodology.
The best way to avoid breaking the virtuous cycle of data mining is to understand the ways it is likely to fail and take preventative steps. Over the years, the authors have encountered many ways for data mining projects to go wrong. In response, we have developed a useful collection of habits: things we do to smooth the path from the initial statement of a business problem to a stable model that produces actionable and measurable results. This chapter presents this collection of best practices as the orderly steps of a data mining methodology. Don’t be fooled, though: data mining is a naturally iterative process. Some steps need to be repeated several times, but none should be skipped entirely.
The need for rigor increases with the complexity of the data mining approach. After establishing the need for a methodology by describing various ways that data mining efforts can fail in the absence of one, the chapter starts with the simplest approach to data mining, using ad hoc queries to test hypotheses, and works up to more sophisticated activities such as building formal profiles that can be used as scoring models and building true predictive models. Finally, the four steps of the virtuous cycle are translated into an 11-step data mining methodology.
Why Have a Methodology?
Data mining is a way of learning from the past so as to make better decisions in the future. The best practices described in this chapter are designed to avoid two undesirable outcomes of the learning process:
- Learning things that aren’t true
- Learning things that are true, but not useful
These pitfalls are like the rocks of Scylla and the whirlpool of Charybdis that protect the narrow straits between Sicily and the Italian mainland. Like the ancient sailors who learned to avoid these threats, data miners need to know how to avoid common dangers.
Learning Things That Aren’t True
Learning things that aren’t true is more dangerous than learning things that are useless because important business decisions may be made based on incorrect information. Data mining results often seem reliable because they are based on actual data in a seemingly scientific manner. This appearance of reliability can be deceiving. The data itself may be incorrect or not relevant to the question at hand. The patterns discovered may reflect past business decisions or nothing at all. Data transformations such as summarization may have destroyed or hidden important information. The following sections discuss some of the more common problems that can lead to false conclusions.
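To make the point about summarization concrete, here is a minimal sketch with two hypothetical customers whose six-month spending totals are identical. A summary that keeps only the total treats them as the same, while the month-by-month series shows one customer’s spending falling steadily. The data and the helper names `total` and `trend` are invented for illustration; the slope is an ordinary least-squares fit of spend against month number, which is just one simple way to expose the hidden pattern.

```python
# Hypothetical monthly spend (in dollars) for two customers over six months.
steady = [50, 50, 50, 50, 50, 50]
fading = [90, 80, 60, 40, 20, 10]


def total(spend):
    """The summary a report might keep: total spend for the period."""
    return sum(spend)


def trend(spend):
    """Least-squares slope of spend against month index (0, 1, 2, ...)."""
    n = len(spend)
    mean_x = (n - 1) / 2
    mean_y = sum(spend) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(spend))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


print(total(steady) == total(fading))          # True: the totals match
print(trend(steady), round(trend(fading), 2))  # 0.0 -17.14
```

Once the data has been reduced to totals, no amount of clever mining can recover the declining trend that distinguishes the second customer.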
Patterns May Not Represent Any Underlying Rule
It is often said that figures don’t lie, but liars can figure. When it comes to finding patterns in data, figures don’t have to actually lie in order to suggest things that aren’t true. There are so many ways to construct patterns that any random set of data points will reveal one if examined long enough. We humans depend so heavily on patterns in our lives that we tend to see them even when they are not there. We look up at the nighttime sky and see not a random arrangement of stars, but the Big Dipper, the Southern Cross, or Orion’s Belt. Some even see astrological patterns and portents that can be used to predict the future. The widespread acceptance of outlandish conspiracy theories is further evidence of the human need to find patterns.
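The claim that random data will reveal a pattern if examined long enough is easy to check by simulation. This minimal sketch, using only Python’s standard library, generates a target and many candidate “predictors” that are pure noise, then reports the strongest correlation found among them. The 20 rows, the 200 candidates, the seed, and the function names are arbitrary choices for illustration. With many candidates and few rows, the best of them typically shows a sizable correlation with the target even though, by construction, none has any relationship to it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def best_spurious_correlation(n_rows=20, n_candidates=200, seed=1):
    """Largest |correlation| between a random target and random noise columns."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        noise = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(noise, target)))
    return best


print(round(best_spurious_correlation(), 2))
```

Testing enough candidate explanations against the same small sample all but guarantees that one of them will look convincing.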
Presumably, the reason that humans have evolved such an affinity for patterns is that patterns often do reflect some underlying truth about the way the world works. The phases of the moon, the progression of the seasons, the constant alternation of night and day, even the regular appearance of a favorite TV show at the same time on the same day of the week are useful because they are stable and therefore predictive. We can use these patterns to decide when it is safe to plant tomatoes and how to program the VCR. Other patterns clearly do not have any predictive power. If a fair coin comes up heads five times in a row, there is still a 50-50 chance that it will come up tails on the sixth toss.
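A quick simulation confirms the coin-toss example: among sequences whose first five tosses are all heads, the sixth toss comes up tails about half the time, because each toss of a fair coin is independent of the ones before it. The trial count, the seed, and the function name are arbitrary choices for this sketch; with a different seed the estimate varies slightly around 0.5.

```python
import random


def tails_after_five_heads(trials: int = 200_000, seed: int = 42):
    """Estimate P(tails on toss 6 | tosses 1-5 were all heads) for a fair coin."""
    rng = random.Random(seed)
    streaks = tails = 0
    for _ in range(trials):
        heads = [rng.random() < 0.5 for _ in range(6)]  # True means heads
        if all(heads[:5]):      # keep only sequences starting with five heads
            streaks += 1
            if not heads[5]:    # sixth toss came up tails
                tails += 1
    return streaks, tails / streaks


streaks, rate = tails_after_five_heads()
print(f"{streaks} streaks of five heads; tails on the sixth toss: {rate:.3f}")
```

About one sequence in 32 starts with five heads, so roughly 6,000 of the 200,000 trials qualify, which is enough to pin the estimate close to one half.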
The challenge for data miners is to figure out which patterns are predictive and which are not. Consider the following patterns, all of which have been cited in articles in the popular press as if they had predictive value: