Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

614 Chapter 18

plan allows. Since the extra minutes are charged at a high rate, these customers end up paying higher bills than they would on a more expensive rate plan with more included minutes. Moving these customers to a higher-rate plan would save them some money, while also increasing the amount of revenue from the fixed portion of their monthly bill.
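The arithmetic behind this argument can be sketched in a few lines. The plan names, fees, included minutes, and overage rate below are hypothetical values chosen only for illustration; they do not come from the case study. Amounts are kept in whole cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePlan:
    """A hypothetical rate plan; all figures are illustrative assumptions."""
    name: str
    monthly_fee_cents: int        # fixed portion of the monthly bill
    included_minutes: int         # minutes covered by the fixed fee
    overage_cents_per_minute: int  # charge for each minute beyond the allowance


def monthly_bill(plan: RatePlan, minutes_used: int) -> int:
    """Return the total bill in cents: fixed fee plus any overage charges."""
    extra_minutes = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee_cents + extra_minutes * plan.overage_cents_per_minute


# Assumed plans: a cheaper plan with few minutes and a pricier one with more.
basic = RatePlan("basic", 3000, 200, 40)
premium = RatePlan("premium", 5000, 500, 40)

usage = 300  # a customer who regularly exceeds the basic allowance

basic_bill = monthly_bill(basic, usage)      # 3000 + 100 * 40
premium_bill = monthly_bill(premium, usage)  # 5000, no overage

customer_savings = basic_bill - premium_bill
fixed_revenue_gain = premium.monthly_fee_cents - basic.monthly_fee_cents

print(f"basic: {basic_bill} cents, premium: {premium_bill} cents")
print(f"customer saves {customer_savings} cents; "
      f"fixed revenue rises by {fixed_revenue_gain} cents")
```

Under these assumed numbers the customer's total bill falls while the fixed portion of the bill rises, which is the trade-off the paragraph describes. With different fees or usage levels the comparison could go the other way, so the calculation has to be run per customer.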

The Proof of the Pudding

Comcast was able to make a direct cost/benefit analysis of the combined data mining and telemarketing action plan. Armed with this data, Comcast was able to make an informed decision to invest in future data mining efforts. Of course, the story does not really end there; it never does.

The company was faced with a whole new set of questions based on the data that came back from the initial study. New hypotheses were formed and tested. The response data from the telemarketing effort became fodder for a new round of knowledge discovery. New product ideas and service plans were tried out. Each round of data mining started from a higher base because the company knew its customers better. That is the virtuous cycle of data mining.

Lessons Learned

In a business context, the successful introduction of data mining requires using data mining techniques to address a real business challenge. For companies that are just getting started with analytical customer relationship management, integrating data mining can be a daunting task. A proof-of-concept project is a good way to get started. The proof of concept should create a solid business case for further integration of data mining into the company’s marketing, sales, and customer-support operations. This means that the project should be in an area where it is easy to link improved understanding gained through data mining with improved profitability.

The most successful proof-of-concept projects start with a well-defined business problem, and use data related to that problem to create a plan of action. The action is then carried out in a controlled manner and the results carefully analyzed to evaluate the effectiveness of the action taken. In other words, the proof of concept should involve one full trip around the virtuous cycle of data mining. If this initial project is successful, it will be the first of many. The primary lesson from this chapter is also an important lesson of the book as a whole: data mining techniques only become useful when applied to meaningful problems. Data mining is a technical activity that requires technical expertise, but its success is measured by its effect on the business.

Index

A

absolute values, distance function, 275
accuracy
  classification and prediction, 79
  estimation, 79–81
acquisition
  acquisition-time data, 108–110
  customer relationships, 461–464
actions
  actionable data, 516
  actionable results, 22
  actionable rules, association rules, 296
  control group response versus market research response, 38
  taking control of, 30
activation function, neural networks, 222
acuity of testing, statistical analysis, 147–148
ad hoc questions
  behavior-based variables, 585
  business opportunities, identifying, 27
  hypothesis testing, 50–51
additive facts, OLAP, 501
addresses, geographical resources, 555–556
adjusted error rates, CART algorithm, 185
advertising. See also marketing campaigns
  communication channels, prospecting, 89
  prospects, 90–94
  word-of-mouth, 283
affinity grouping
  association rules, 11
  business goals, formulating, 605
  cross-selling opportunities, 11
  data transformation, 57
  undirected data mining, 57
affordability, server platforms, 13
agglomerative clustering, automatic cluster detection, 368–370
aggregation, confusion and, 48
aggression, behavior-based variables, 18
AI (artificial intelligence), 15
algorithms, recursive, 173
alphas, decision trees, 188
American Express
  as information broker, 16
  orders, market basket analysis, 292

analysis
  differential response, 107–108
  link analysis
    acyclic graphs, 331
    authorities, 333–334
    candidates, 333
    case study, 343–346
    classification, 9
    communities of interest, graphs, 346
    cyclic graphs, 330–331
    data, as graphs, 340
    directed graphs, 330
    discussed, 321
    edges, graphs, 322
    fax machines, 337–341
    graph-coloring algorithm, 340–341
    Hamiltonian path, graphs, 328
    hubs, 332–334
    Kleinberg algorithm, 332–333
    nodes, graphs, 322
    planar graphs, 323
    root sets, 333
    search programs, 331
    stemming, 333
    traveling salesman problem, graphs, 327–329
    vertices, graphs, 322
    weighted graphs, 322, 324
  market basket
    differentiation, 289
    discussed, 287
    geographic attributes, 293
    item popularity, 293
    item sets, 289
    market basket data, 51, 289–291
    marketing interventions, tracking, 293–294
    order characteristics, 292
    products, clustering by usage, 294–295
    purchases, 289
    support, 301
    telecommunications customers, 288
    time attributes, 293
  sensitivity, 247–248
  sequential, 318–319
  statistical
    acuity of testing, marketing campaign approaches, 147–148
    business data versus scientific data, 159
    censored data, 161
    Central Limit Theorem, 129–130
    chi-square tests, 149–153
    confidence intervals, marketing campaign approaches, 146
    continuous variables, 137–138
    correlation ranges, 139
    cross-tabulations, 136
    density function, 133
    as disciplinary technique, 123
    discrete values, 127–131
    experimentation, 160–161
    field values, 128
    histograms and, 127
    mean values, 137
    median values, 137
    mode values, 137
    multiple comparisons, 148–149
    normal distribution, 130–132
    null hypothesis and, 125–126
    probabilities, 133–135
    proportion, standard error of, marketing campaign approaches, 139–141
    p-values, 126
    q-values, 126
    range values, 137
    regression ranges, 139
    sample sizes, marketing campaign approaches, 145
    sample variation, 129
    standard deviation, 132, 138
    standardized values, 129–133
    sum of values, 137–138
    time series analysis, 128–129
    truncated data, 162

    variance, 138
    z-values, 131, 138
  survival
    attrition, handling different types of, 412–413
    customer relationships, 413–415
    estimation tasks, 10
    forecasting, 415–416
  time series
    neural networks, 244–247
    non-time series data, 246
    SQL data, 572–573
    statistics, 128–129
  of variance, 124
analysts, responsibilities of, 492–493
analytic efforts, wasted time, 27
AND value, neural networks, 222
angles, between vectors, 361–362
anonymous versus identified transactions, association rules, 308
application programming interface (API), 535
architecture, data mining, 528–532
artificial intelligence (AI), 15
assessing models
  classifiers and predictors, 79
  descriptive models, 78
  directed models, 78–79
  estimators, 79–81
association rules
  actionable rules, 296
  affinity grouping, 11
  anonymous versus identified transactions, 308
  data quality, 308
  dissociation rules, 317
  effectiveness of, 299–301
  inexplicable rules, 297–298
  point-of-sale data, 288
  practical limits, overcoming, 311–313
  prediction, 70
  probabilities, calculating, 309
  products, hierarchical categories, 305
  sequential analysis, 318–319
  for store comparisons, 315–316
  trivial rules, 297
  virtual items, 307
assumptions, validation, 67
attrition
  discussed, 17
  forced, 118
  future, 49
  proof-of-concept projects, 599
  survival analysis, 412–413
audio, binary data, 557
authorities, link analysis, 333–334
automated systems
  neural networks, 213
  transaction processing systems, 3–4
automatic cluster detection
  agglomerative clustering, 368–370
  case study, 374–378
  categorical variables, 359
  centroid distance, 369
  complete linkage, 369
  data preparation, 363–365
  dimension, 352
  directed clustering, 372
  discussed, 12, 91, 351
  distance and similarity, 359–363
  divisive clustering, 371–372
  evaluation, 372–373
  Gaussian mixture model, 366–367
  geometric distance, 360–361
  hard clustering, 367
  Hertzsprung-Russell diagram, 352–354
  K-means algorithm, 354–358
  luminosity, 351
  natural association, 358
  scaling, 363–364
  single linkage, 369
  soft clustering, 367
  SOM (self-organizing map), 372
  vectors, angles between, 361–362
  weighting, 363–365
  zone boundaries, adjusting, 380

auxiliary information, 569–571
availability of data, determining, 515–516
average member technique, neural networks, 252
averages, estimation, 81

B

back propagation, feed-forward neural networks, 228–232
backfitting, defined, 170
bad customers, customer relationship management, 18
bad data formats, data transformation, 28
balance transfer programs, industry revolution, 18
balanced datasets, model sets, 68
balanced sampling, 68
bathtub hazards, 397–398
behaviors
  behavioral segments, marketing campaigns, 111–113
  behavior-based variables
    ad hoc questions, 585
    aggression, 18
    convenience users, 580, 587–589
    declining usage, 577–579
    estimated revenue, segmenting, 581–583
    ideals, comparisons to, 585–587
    potential revenue, 583–585
    purchasing frequency, 575–576
    revolvers, 580
    transactions, 580
  future customer behaviors, predicting, 10
bell-shaped distribution, 132
benefit, point of maximum, 101
Bernoulli, Jacques (binomial formula), 191
biased sampling
  confidence intervals, statistical analysis, 146
  neural networks, 227
  response, methods of, 146
  untruthful learning sources, 46–47
BILL_MASTER file, customer signatures, 559
binary churn models, 119
binary classification
  decision trees, 168
  misclassification rates, 98
binary data, 557
binning, 237, 551
binomial formula (Jacques Bernoulli), 191
biological neural networks, 211
births, house-hold level data, 96
bizocity scores, 112–113
Bonferroni, Carlo (Bonferroni’s correction), 149
box diagrams, as alternative to decision trees, 199–201
brainstorming meetings, 37
branching nodes, decision trees, 176
budgets, fixed, marketing campaigns, 97–100
building models, data mining, 8, 77
Building the Data Warehouse (Bill Inmon), 474
Business Modeling and Data Mining (Dorian Pyle), 60
businesses
  challenges of, identifying, 23–24
  customer relationship management, 2–6
  customer-centric, 514–515
  forward-looking, 2
  home-based, 56
  large-business relationships, 3–4
  opportunities, identifying
    virtuous cycle, 27–28
    wireless communication industries, 34–35
  product-focused, 2
  recommendation-based, 16–17
  small-business relationships, 2

C

calculations, probabilities, 133–135
call detail databases, 37
call-center records, useful data sources, 60
campaigns, marketing. See also advertising
  acquisition-time data, 108–110
  canonical measurements, 31
  champion-challenger approach, 139
  credit risks, reducing exposure to, 113–114
  cross-selling, 115–116
  customer response, tracking, 109
  customer segmentation, 111–113
  differential response analysis, 107–108
  discussed, 95
  fixed budgets, 97–100
  loyalty programs, 111
  new customer information,
car ownership, house-hold level data, 96
CART (Classification and Regression Trees) algorithm, decision trees, 185, 188–189
case studies
  automatic cluster detection, 374–378
  chi-square tests, 155–158
  decision trees, 206, 208
  genetic algorithms, 440–443
  link analysis, 343–346
  MBR (memory-based reasoning), 259–262
  neural networks, 252–254
catalogs
  response models, decision trees for, 175
  retailers, historical customer behavior data, 5
categorical variables
