Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

  gathering, 109–110
  people most influenced by, 106–107
  planning, 27
  profitability, 100–104
  proof-of-concept projects, 600
  response modeling, 96–97
  as statistical analysis
    acuity of testing, 147–148
    confidence intervals, 146
    proportion, standard error of, 139–141
    results, comparing, using confidence bounds, 141–143
    sample sizes, 145
  targeted acquisition campaigns, 31
  types of, 111
  up-selling, 115–116
  usage stimulation, 111
candidates, link analysis, 333
canonical measurements, marketing campaigns, 31
capture trends, data transformation, 75
  automatic cluster detection, 359
  data correction, 73
  marriages, 239–240
  measures of, 549
  neural networks, 239–240
  propensity, 242
  splits, decision trees, 174
censored data
  hazards, 399–403
  statistics, 161
census data
  proportional scoring, 94–95
  useful data sources, 61
Central Limit Theorem, statistics, 129–130
central repository, 484, 488, 490
centroid distance, automatic cluster detection, 369
C5 pruning algorithm, decision trees, 190–191
CHAID (Chi-square Automatic Interaction Detector), 182–183
challenges, business challenges, identifying, 23–24

470643 bindex.qxd 3/8/04 11:08 AM Page 620

620 Index

champion-challenger approach, marketing campaigns, 139
change processes, feedback, 34
charts
  concentration, 101
  cumulative gains, 101
  lift charts, 82, 84
  time series, 128–129
CHIDIST function, 152
child nodes, classification, 167
children, number of, household level data, 96
chi-square tests
  case study, 155–158
  CHAID (Chi-square Automatic Interaction Detector), 182–183
  CHIDIST function, 152
  degrees of freedom values, 152–153
  difference of proportions versus, 153–154
  discussed, 149
  expected values, calculating, 150–151
  splits, decision trees, 180–183
churn
  as binary outcome, 119
  customer longevity, predicting, 119–120
  EBCF (existing base churn forecast), 469
  expected, 118
  forced attrition, 118
  importance of, 117–118
  involuntary, 118–119, 521
  recognizing, 116–117
  retention and, 116–120
  voluntary, 118–119, 521
class labels, probability, 85
classification
  accuracy, 79
  binary
    decision trees, 168
    misclassification rates, 98
  business goals, formulating, 605
  child nodes, 167
  correct classification matrix, 79
  data transformation, 57
  decision trees, 166–168
  directed data mining, 57
  discrete outcomes, 9
  estimation, 9
  leaf nodes, 167
  memory-based reasoning, 90–91
  overview, 8–9
  performance, 12
Classification and Regression Trees (CART) algorithm, decision trees, 185, 188–189
classification codes
  discussed, 266
  precision measurements, 273–274
  recall measurements, 273–274
clustering
  automatic cluster detection
    agglomerative clustering, 368–370
    case study, 374–378
    categorical variables, 359
    centroid distance, 369
    complete linkage, 369
    data preparation, 363–365
    dimension, 352
    directed clustering, 372
    discussed, 12, 91, 351
    distance and similarity, 359–363
    divisive clustering, 371–372
    evaluation, 372–373
    Gaussian mixture model, 366–367
    geometric distance, 360–361
    hard clustering, 367
    Hertzsprung-Russell diagram, 352–354
    luminosity, 351
    scaling, 363–364
    single linkage, 369
    soft clustering, 367
    SOM (self-organizing map), 372
    vectors, angles between, 361–362
    weighting, 363–365
    zone boundaries, adjusting, 380


  business goals, formulating, 605
  customer attributes, 11
  data transformation, 57
  overview, 11
  profiling tasks, 12
  undirected data mining, 57
coding, special-purpose code, 595
collaborative filtering
  estimated ratings, 284–285
  grouping customers, 90
  predictions, 284–285
  profiles, building and comparing, 283–284
  social information filtering, 282
  word-of-mouth advertising, 283
collections, credit risks, 114
columns, data
  cost, 548
  derived variables, 542
  discussed, 542
  identification, 548
  ignored, 547
  input, 547
  with one value, 544–546
  target, 547
  with unique values, 546–547
  weight, 548
combination function
  attrition history, 280
  MBR (memory-based reasoning), 258, 265
  neural networks, 222
  weighted voting, 281–282
commercial software products, 15
communication channels, prospecting, 89
companies. See businesses
comparisons
  comparing models, using lift ratio, 81–82
  data, 83
  statistical analysis, 148–149
competing risks, hazards, 403
competitive advantage, information as, 14
complete linkage, automatic cluster detection, 369
computational issues, customer signatures, 594–596
concentration
  concentration charts, 101
  cumulative response, 82–83
confidence intervals
  hypothesis testing, 148
  statistical analysis, 146, 148–149
confusion
  aggregation and, 48
  confusion matrix, 79
  data transformation, 28
conjugate gradient, 230
constant hazards
  changing over time hazards versus, 416–417
  discussed, 397
continuous variables
  data preparation, 235–237
  neural networks, 235–237
  statistics, 137–138
control group response
  marketing campaigns, 106
  target market response versus, 38
controlled experiments, hypothesis testing, 51
convenience users, behavior-based variables, 580, 587–589
cookies, Web servers, 109
correct classification matrix, 79
correlation ranges, statistics, 139
costs
  cost columns, 548
  decision tree considerations, 195
countervailing errors, 81
counts, converting to proportions, 75–76
coverage of values, neural networks, 232–233
Cox proportional hazards, 410–411


creative process, data mining as, 33
credit
credit applications
  classification tasks, 9
  prediction tasks, 10
  useful data sources, 60
credit risks, reducing exposure to, 113–114
crossover, genetic algorithms, 430
cross-selling opportunities
  affinity grouping, 11
  customer relationships, 467
  marketing campaigns, 111, 115–116
  reasons for, 17
cross-tabulations, 136, 567–568
cumulative gains, 36, 101
cumulative response
  concentration, 82–83
  results, assessing, 85
customers
  attributes, clustering, 11
  behaviors of, gaining insight, 56
  customer relationships
    bad customers, weeding out, 18
    building businesses around, 2
    customer acquisition, 461–464
    customer activation, 464–466
    customer-centric enterprises, 3
    data mining role in, 5–6
    data warehousing, 4–5
    deep intimacy, 449, 451
    event-based relationships, 458–459
    good customers, holding on to, 17–18
    in-between relationships, 453
    indirect relationships, 453–454
    interests in, 13–14
    large-business relationships, 3–4
    levels of, 448
    life stages, 455–456
    lifetime customer value, 32
    mass intimacy, 451–453
    retention, 467–469
    service business sectors, 13–14
    single views, 517–518
    small-business relationships, 2
    stages, 457
    strategies for, 6
    stratification, 469
    subscription-based relationships, 459–460
    survival analysis, 413–415
    transaction processing systems, 3–4
    up-selling, 467
    winback approach, 470
  customer-centric businesses, 514–515, 516–521
  demographic profiles, 31
  grouping, collaborative filtering and, 90
  interactions, learning opportunities, 520–521
  loyalty, 520
  marginal, 553
  new customer information
    gathering, 109–110
    memory-based reasoning, 277
    profiles, building, 283
  prospective customer value, 115
  responses
    to marketing campaigns, 109
    prediction, MBR, 258
  retrospective customer value, 115
  segmentation, marketing campaigns, 111–113
  sequential patterns, identifying, 24
  signatures
    assembling, 68
    business versus residential customers, 561
    columns, pivoting, 563
    computational issues, 594–596
    considerations, 564
    customer identification, 560–562
    data for, cataloging, 559–560
    discussed, 540–541
    model set creation, 68
    snapshots, 562
    time frames, identifying, 562


  sorting, by scores, 8
  telecommunications, market basket analysis, 288
cutoff scores, 98
cyclic graphs, 330–331
data marts, 485, 491–492
data
  acquisition-time, 108–110
  as actionable information, 516
  availability, determining, 515–516
  binary, 557
  business versus scientific, statistical analysis, 159
  censored, 161
  by census tract, 94
  central repository, 484, 488, 490
  columns
    cost, 548
    derived variables, 542
    discussed, 542
    identification, 548
    ignored, 547
    input, 547
    with one value, 544–546
    target, 547
    with unique values, 546–547
    weight, 548
  comparisons, 83
  for customer signatures, cataloging, 559–560
  data correction
    categorical variables, 73
    encoding, inconsistent, 74
    missing values, 73–74
    numeric variables, 73
    outliers, 73
    overview, 72
    skewed distributions, 73
    values with meaning, 74
  data exploration
    assumptions, validating, 67
    descriptions, comparing values with, 65
    discussed, 64
    distributions, examining, 65
    histograms, 565–566
    intuition, 65
    question asking, 67–68
  data selection
    contents of, outcomes of interest, 64
    data locations, 61–62
    density, 62–63
    history of, determining, 63
    scarce data, 61–62
    variable combinations, 63–64
  data transformation
    capture trends, 75
    counts, converting to proportions, 75–76
    discussed, 74
    information technology and user roles, 58–60
    problems, identifying, 56–57
    ratios, 75
    results, deliverables, 58
    results, how to use, 57–58
    summarization, 44
    virtuous cycle, 28–30
  dirty, 592–593
  dumping, flat files, 594
  enterprise-wide, 33
  ETL (extraction, transformation, and load) tools, 487
  gigabytes, 5
  as graphs, 337
  historical
    customer behaviors, 5
    MBR (memory-based reasoning), 262–263
    neural networks, 219
    prediction tasks, 10
  household level, 96
  imperfections in, 34
  inconsistent, 593–594
  as information, 22
  metadata repository, 484, 491


data (continued)
  missing data
    data correction, 73–74
    NULL values, 590
    splits, decision trees, 174–175
  operational feedback, 485, 492
  patterns
    meaningful discoveries, 56
    prediction, 45
    untruthful learning sources, 45–46
  point-of-sale
    association rules, 288
    scanners, 3
    as useful data source, 60
  preparation
    automatic cluster detection, 363–365
    categorical values, neural networks, 239–240
    continuous values, neural networks, 235–237
  quality, association rules, 308
  representation, genetic algorithms, 432–433
  scarce, 62
  source systems, 484, 486–487
  SQL, time series analysis, 572–573
  terabytes, 5
  truncated, 162
  useful data sources, 60–61
  visualization tools, 65
  wrong level of detail, untruthful learning sources, 47
data mining
  architecture, 528–532
  as creative process, 33
  directed
    classification, 57
    discussed, 7
    estimation, 57
    prediction, 57
  documentation, 536–537
  goals of, 7
  insourcing, 524–525
  outsourcing, 522–524
  platforms, 527
  scalability, 533–534
  scoring platforms, 527–528
  staffing, 525–526
  typical operational systems versus, 33
  undirected
    affinity grouping, 57
    clustering, 57
    discussed, 7
Data Preparation for Data Mining (Dorian Pyle), 75
The Data Warehouse Toolkit (Ralph Kimball), 474
data warehousing
  customer patterns, 5
  for decision support, 13
  discussed, 4
database administrators (DBAs), 488
databases
  call detail, 37
  demographic, 37
  KDD (knowledge discovery in databases), 8
  server platforms, affordability, 13
datasets, balanced, model sets, 68
dates and times, interval variables, 551
DBAs (database administrators), 488
deaths, household level data, 96
debt, nonrepayment of, credit risks, 114
decision support
  data warehousing for, 13
  hypothesis testing, 50–51
  summary data, OLAP, 477–478
decision trees
  alphas, 188
  alternate representations for, 199–202
  applying to sequential events, 205
  branching nodes, 176
  building models, 8
  case study, 206, 208


  for catalog response models, 175
  classification, 9, 166–168
  cost considerations, 195
  effectiveness of, measuring, 176
  estimation, 170
  as exploration tool, 203–204
  fields, multiple, 195–197
  neural networks, 199
  profiling tasks, 12
  projective visualization, 207–208
  pruning
    C5 algorithm, 190–191
deep intimacy, customer relationships, 449, 451
default classes, records, 194
default risks, proof-of-concept projects, 599
degrees of freedom values, chi-square tests, 152–153
democracy approach, memory-based reasoning, 279–281
demographic databases, 37
demographic profiles, customers, 31