Like the first edition, this book is aimed at current and future data mining practitioners. It is not meant for software developers looking for detailed instructions on how to implement the various data mining algorithms nor for researchers trying to improve upon those algorithms. Ideas are presented in nontechnical language with minimal use of mathematical formulas and arcane jargon. Each data mining technique is shown in a real business context with examples of its use taken from real data mining engagements. In short, we have tried to write the book that we would have liked to read when we began our own data mining careers.
— Michael J. A. Berry, October, 2003
Contents
Acknowledgments xix
About the Authors xxi
Introduction xxiii
Chapter 1 Why and What Is Data Mining? 1
Analytic Customer Relationship Management 2
The Role of Transaction Processing Systems 3
The Role of Data Warehousing 4
The Role of Data Mining 5
The Role of the Customer Relationship Management Strategy 6
What Is Data Mining? 7
What Tasks Can Be Performed with Data Mining? 8
Classification 8
Estimation 9
Prediction 10
Affinity Grouping or Association Rules 11
Clustering 11
Profiling 12
Why Now? 12
Data Is Being Produced 12
Data Is Being Warehoused 13
Computing Power Is Affordable 13
Interest in Customer Relationship Management Is Strong 13
Every Business Is a Service Business 14
Information Is a Product 14
Commercial Data Mining Software Products Have Become Available 15
How Data Mining Is Being Used Today 15
A Supermarket Becomes an Information Broker 15
A Recommendation-Based Business 16
Cross-Selling 17
Holding on to Good Customers 17
Weeding out Bad Customers 18
Revolutionizing an Industry 18
And Just about Anything Else 19
Lessons Learned 19
Chapter 2 The Virtuous Cycle of Data Mining 21
A Case Study in Business Data Mining 22
Identifying the Business Challenge 23
Applying Data Mining 24
Acting on the Results 25
Measuring the Effects 25
What Is the Virtuous Cycle? 26
Identify the Business Opportunity 27
Mining Data 28
Take Action 30
Measuring Results 30
Data Mining in the Context of the Virtuous Cycle 32
A Wireless Communications Company Makes the Right Connections 34
The Opportunity 34
How Data Mining Was Applied 35
Defining the Inputs 37
Derived Inputs 37
The Actions 38
Completing the Cycle 39
Neural Networks and Decision Trees Drive SUV Sales 39
The Initial Challenge 39
How Data Mining Was Applied 40
The Data 40
Down the Mine Shaft 40
The Resulting Actions 41
Completing the Cycle 42
Lessons Learned 42
Chapter 3 Data Mining Methodology and Best Practices 43
Why Have a Methodology? 44
Learning Things That Aren’t True 44
Patterns May Not Represent Any Underlying Rule 45
The Model Set May Not Reflect the Relevant Population 46
Data May Be at the Wrong Level of Detail 47
Learning Things That Are True, but Not Useful 48
Learning Things That Are Already Known 49
Learning Things That Can’t Be Used 49
Hypothesis Testing 50
Generating Hypotheses 51
Testing Hypotheses 51
Models, Profiling, and Prediction 51
Profiling 53
Prediction 54
The Methodology 54
Step One: Translate the Business Problem into a Data Mining Problem 56
What Does a Data Mining Problem Look Like? 56
How Will the Results Be Used? 57
How Will the Results Be Delivered? 58
The Role of Business Users and Information Technology 58
Step Two: Select Appropriate Data 60
What Is Available? 61
How Much Data Is Enough? 62
How Much History Is Required? 63
How Many Variables? 63
What Must the Data Contain? 64
Step Three: Get to Know the Data 64
Examine Distributions 65
Compare Values with Descriptions 66
Validate Assumptions 67
Ask Lots of Questions 67
Step Four: Create a Model Set 68
Assembling Customer Signatures 68
Creating a Balanced Sample 68
Including Multiple Timeframes 70
Creating a Model Set for Prediction 70
Partitioning the Model Set 71
Step Five: Fix Problems with the Data 72
Categorical Variables with Too Many Values 73
Numeric Variables with Skewed Distributions and Outliers 73
Missing Values 73
Values with Meanings That Change over Time 74
Inconsistent Data Encoding 74
Step Six: Transform Data to Bring Information to the Surface 74
Capture Trends 75
Create Ratios and Other Combinations of Variables 75
Convert Counts to Proportions 75
Step Seven: Build Models 77
Step Eight: Assess Models 78
Assessing Descriptive Models 78
Assessing Directed Models 78
Assessing Classifiers and Predictors 79
Assessing Estimators 79
Comparing Models Using Lift 81
Problems with Lift 83
Step Nine: Deploy Models 84
Step Ten: Assess Results 85
Step Eleven: Begin Again 85
Lessons Learned 86
Chapter 4 Data Mining Applications in Marketing and Customer Relationship Management 87
Prospecting 87
Identifying Good Prospects 88
Choosing a Communication Channel 89
Picking Appropriate Messages 89
Data Mining to Choose the Right Place to Advertise 90
Who Fits the Profile? 90
Measuring Fitness for Groups of Readers 93
Data Mining to Improve Direct Marketing Campaigns 95
Response Modeling 96
Optimizing Response for a Fixed Budget 97
Optimizing Campaign Profitability 100
How the Model Affects Profitability 103
Reaching the People Most Influenced by the Message 106
Differential Response Analysis 107
Using Current Customers to Learn About Prospects 108
Start Tracking Customers before They Become Customers 109
Gather Information from New Customers 109
Acquisition-Time Variables Can Predict Future Outcomes 110
Data Mining for Customer Relationship Management 110
Matching Campaigns to Customers 110
Segmenting the Customer Base 111
Finding Behavioral Segments 111
Tying Market Research Segments to Behavioral Data 113
Reducing Exposure to Credit Risk 113
Predicting Who Will Default 113
Improving Collections 114
Determining Customer Value 114
Cross-selling, Up-selling, and Making Recommendations 115
Finding the Right Time for an Offer 115
Making Recommendations 116
Retention and Churn 116
Recognizing Churn 116
Why Churn Matters 117
Different Kinds of Churn 118
Different Kinds of Churn Model 119
Predicting Who Will Leave 119
Predicting How Long Customers Will Stay 119
Lessons Learned 120
Chapter 5 The Lure of Statistics: Data Mining Using Familiar Tools 123
Occam’s Razor 124
The Null Hypothesis 125
P-Values 126
A Look at Data 126
Looking at Discrete Values 127
Histograms 127
Time Series 128
Standardized Values 129
From Standardized Values to Probabilities 133
Cross-Tabulations 136
Looking at Continuous Variables 136
Statistical Measures for Continuous Variables 137
Variance and Standard Deviation 138
A Couple More Statistical Ideas 139
Measuring Response 139
Standard Error of a Proportion 139
Comparing Results Using Confidence Bounds 141
Comparing Results Using Difference of Proportions 143
Size of Sample 145
What the Confidence Interval Really Means 146
Size of Test and Control for an Experiment 147
Multiple Comparisons 148
The Confidence Level with Multiple Comparisons 148
Bonferroni’s Correction 149
Chi-Square Test 149
Expected Values 150
Chi-Square Value 151
Comparison of Chi-Square to Difference of Proportions 153
An Example: Chi-Square for Regions and Starts 155
Data Mining and Statistics 158
No Measurement Error in Basic Data 159
There Is a Lot of Data 160
Time Dependency Pops Up Everywhere 160
Experimentation Is Hard 160
Data Is Censored and Truncated 161
Lessons Learned 162
Chapter 6 Decision Trees 165
What Is a Decision Tree? 166
Classification 166
Scoring 169
Estimation 170
Trees Grow in Many Forms 170
How a Decision Tree Is Grown 171
Finding the Splits 172
Splitting on a Numeric Input Variable 173
Splitting on a Categorical Input Variable 174
Splitting in the Presence of Missing Values 174
Growing the Full Tree 175
Measuring the Effectiveness of a Decision Tree 176
Tests for Choosing the Best Split 176
Purity and Diversity 177
Gini or Population Diversity 178
Entropy Reduction or Information Gain 179
Information Gain Ratio 180
Chi-Square Test 180
Reduction in Variance 183
F Test 183
Pruning 184
The CART Pruning Algorithm 185
Creating the Candidate Subtrees 185
Picking the Best Subtree 189
Using the Test Set to Evaluate the Final Tree 189
The C5 Pruning Algorithm 190
Pessimistic Pruning 191
Stability-Based Pruning 191
Extracting Rules from Trees 193
Taking Cost into Account 195
Further Refinements to the Decision Tree Method 195
Using More Than One Field at a Time 195
Tilting the Hyperplane 197
Neural Trees 199
Piecewise Regression Using Trees 199
Alternate Representations for Decision Trees 199
Box Diagrams 199
Tree Ring Diagrams 201
Decision Trees in Practice 203
Decision Trees as a Data Exploration Tool 203
Applying Decision-Tree Methods to Sequential Events 205
Simulating the Future 206
Case Study: Process Control in a Coffee-Roasting Plant 206
Lessons Learned 209
Chapter 7 Artificial Neural Networks 211
A Bit of History 212
Real Estate Appraisal 213
Neural Networks for Directed Data Mining 219
What Is a Neural Net? 220
What Is the Unit of a Neural Network? 222
Feed-Forward Neural Networks 226