Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

Like the first edition, this book is aimed at current and future data mining practitioners. It is not meant for software developers looking for detailed instructions on how to implement the various data mining algorithms nor for researchers trying to improve upon those algorithms. Ideas are presented in nontechnical language with minimal use of mathematical formulas and arcane jargon. Each data mining technique is shown in a real business context with examples of its use taken from real data mining engagements. In short, we have tried to write the book that we would have liked to read when we began our own data mining careers.

— Michael J. A. Berry, October, 2003

Contents

Acknowledgments xix

About the Authors xxi

Introduction xxiii

Chapter 1

Why and What Is Data Mining? 1

Analytic Customer Relationship Management 2

The Role of Transaction Processing Systems 3

The Role of Data Warehousing 4

The Role of Data Mining 5

The Role of the Customer Relationship Management Strategy 6

What Is Data Mining? 7

What Tasks Can Be Performed with Data Mining? 8

Classification 8

Estimation 9

Prediction 10

Affinity Grouping or Association Rules 11

Clustering 11

Profiling 12

Why Now? 12

Data Is Being Produced 12

Data Is Being Warehoused 13

Computing Power Is Affordable 13

Interest in Customer Relationship Management Is Strong 13

Every Business Is a Service Business 14

Information Is a Product 14

Commercial Data Mining Software Products Have Become Available 15

How Data Mining Is Being Used Today 15

A Supermarket Becomes an Information Broker 15

A Recommendation-Based Business 16

Cross-Selling 17

Holding on to Good Customers 17

Weeding out Bad Customers 18

Revolutionizing an Industry 18

And Just about Anything Else 19

Lessons Learned 19

Chapter 2

The Virtuous Cycle of Data Mining 21

A Case Study in Business Data Mining 22

Identifying the Business Challenge 23

Applying Data Mining 24

Acting on the Results 25

Measuring the Effects 25

What Is the Virtuous Cycle? 26

Identify the Business Opportunity 27

Mining Data 28

Take Action 30

Measuring Results 30

Data Mining in the Context of the Virtuous Cycle 32

A Wireless Communications Company Makes the Right Connections 34

The Opportunity 34

How Data Mining Was Applied 35

Defining the Inputs 37

Derived Inputs 37

The Actions 38

Completing the Cycle 39

Neural Networks and Decision Trees Drive SUV Sales 39

The Initial Challenge 39

How Data Mining Was Applied 40

The Data 40

Down the Mine Shaft 40

The Resulting Actions 41

Completing the Cycle 42

Lessons Learned 42

Chapter 3

Data Mining Methodology and Best Practices 43

Why Have a Methodology? 44

Learning Things That Aren’t True 44

Patterns May Not Represent Any Underlying Rule 45

The Model Set May Not Reflect the Relevant Population 46

Data May Be at the Wrong Level of Detail 47

Learning Things That Are True, but Not Useful 48

Learning Things That Are Already Known 49

Learning Things That Can’t Be Used 49

Hypothesis Testing 50

Generating Hypotheses 51

Testing Hypotheses 51

Models, Profiling, and Prediction 51

Profiling 53

Prediction 54

The Methodology 54

Step One: Translate the Business Problem into a Data Mining Problem 56

What Does a Data Mining Problem Look Like? 56

How Will the Results Be Used? 57

How Will the Results Be Delivered? 58

The Role of Business Users and Information Technology 58

Step Two: Select Appropriate Data 60

What Is Available? 61

How Much Data Is Enough? 62

How Much History Is Required? 63

How Many Variables? 63

What Must the Data Contain? 64

Step Three: Get to Know the Data 64

Examine Distributions 65

Compare Values with Descriptions 66

Validate Assumptions 67

Ask Lots of Questions 67

Step Four: Create a Model Set 68

Assembling Customer Signatures 68

Creating a Balanced Sample 68

Including Multiple Timeframes 70

Creating a Model Set for Prediction 70

Partitioning the Model Set 71

Step Five: Fix Problems with the Data 72

Categorical Variables with Too Many Values 73

Numeric Variables with Skewed Distributions and Outliers 73

Missing Values 73

Values with Meanings That Change over Time 74

Inconsistent Data Encoding 74

Step Six: Transform Data to Bring Information to the Surface 74

Capture Trends 75

Create Ratios and Other Combinations of Variables 75

Convert Counts to Proportions 75

Step Seven: Build Models 77

Step Eight: Assess Models 78

Assessing Descriptive Models 78

Assessing Directed Models 78

Assessing Classifiers and Predictors 79

Assessing Estimators 79

Comparing Models Using Lift 81

Problems with Lift 83

Step Nine: Deploy Models 84

Step Ten: Assess Results 85

Step Eleven: Begin Again 85

Lessons Learned 86

Chapter 4

Data Mining Applications in Marketing and Customer Relationship Management 87

Prospecting 87

Identifying Good Prospects 88

Choosing a Communication Channel 89

Picking Appropriate Messages 89

Data Mining to Choose the Right Place to Advertise 90

Who Fits the Profile? 90

Measuring Fitness for Groups of Readers 93

Data Mining to Improve Direct Marketing Campaigns 95

Response Modeling 96

Optimizing Response for a Fixed Budget 97

Optimizing Campaign Profitability 100

How the Model Affects Profitability 103

Reaching the People Most Influenced by the Message 106

Differential Response Analysis 107

Using Current Customers to Learn About Prospects 108

Start Tracking Customers before They Become Customers 109

Gather Information from New Customers 109

Acquisition-Time Variables Can Predict Future Outcomes 110

Data Mining for Customer Relationship Management 110

Matching Campaigns to Customers 110

Segmenting the Customer Base 111

Finding Behavioral Segments 111

Tying Market Research Segments to Behavioral Data 113

Reducing Exposure to Credit Risk 113

Predicting Who Will Default 113

Improving Collections 114

Determining Customer Value 114

Cross-selling, Up-selling, and Making Recommendations 115

Finding the Right Time for an Offer 115

Making Recommendations 116

Retention and Churn 116

Recognizing Churn 116

Why Churn Matters 117

Different Kinds of Churn 118

Different Kinds of Churn Model 119

Predicting Who Will Leave 119

Predicting How Long Customers Will Stay 119

Lessons Learned 120

Chapter 5

The Lure of Statistics: Data Mining Using Familiar Tools 123

Occam’s Razor 124

The Null Hypothesis 125

P-Values 126

A Look at Data 126

Looking at Discrete Values 127

Histograms 127

Time Series 128

Standardized Values 129

From Standardized Values to Probabilities 133

Cross-Tabulations 136

Looking at Continuous Variables 136

Statistical Measures for Continuous Variables 137

Variance and Standard Deviation 138

A Couple More Statistical Ideas 139

Measuring Response 139

Standard Error of a Proportion 139

Comparing Results Using Confidence Bounds 141

Comparing Results Using Difference of Proportions 143

Size of Sample 145

What the Confidence Interval Really Means 146

Size of Test and Control for an Experiment 147

Multiple Comparisons 148

The Confidence Level with Multiple Comparisons 148

Bonferroni’s Correction 149

Chi-Square Test 149

Expected Values 150

Chi-Square Value 151

Comparison of Chi-Square to Difference of Proportions 153

An Example: Chi-Square for Regions and Starts 155

Data Mining and Statistics 158

No Measurement Error in Basic Data 159

There Is a Lot of Data 160

Time Dependency Pops Up Everywhere 160

Experimentation Is Hard 160

Data Is Censored and Truncated 161

Lessons Learned 162

Chapter 6

Decision Trees 165

What Is a Decision Tree? 166

Classification 166

Scoring 169

Estimation 170

Trees Grow in Many Forms 170

How a Decision Tree Is Grown 171

Finding the Splits 172

Splitting on a Numeric Input Variable 173

Splitting on a Categorical Input Variable 174

Splitting in the Presence of Missing Values 174

Growing the Full Tree 175

Measuring the Effectiveness of a Decision Tree 176

Tests for Choosing the Best Split 176

Purity and Diversity 177

Gini or Population Diversity 178

Entropy Reduction or Information Gain 179

Information Gain Ratio 180

Chi-Square Test 180

Reduction in Variance 183

F Test 183

Pruning 184

The CART Pruning Algorithm 185

Creating the Candidate Subtrees 185

Picking the Best Subtree 189

Using the Test Set to Evaluate the Final Tree 189

The C5 Pruning Algorithm 190

Pessimistic Pruning 191

Stability-Based Pruning 191

Extracting Rules from Trees 193

Taking Cost into Account 195

Further Refinements to the Decision Tree Method 195

Using More Than One Field at a Time 195

Tilting the Hyperplane 197

Neural Trees 199

Piecewise Regression Using Trees 199

Alternate Representations for Decision Trees 199

Box Diagrams 199

Tree Ring Diagrams 201

Decision Trees in Practice 203

Decision Trees as a Data Exploration Tool 203

Applying Decision-Tree Methods to Sequential Events 205

Simulating the Future 206

Case Study: Process Control in a Coffee-Roasting Plant 206

Lessons Learned 209

Chapter 7

Artificial Neural Networks 211

A Bit of History 212

Real Estate Appraisal 213

Neural Networks for Directed Data Mining 219

What Is a Neural Net? 220

What Is the Unit of a Neural Network? 222

Feed-Forward Neural Networks 226
