Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

market basket analysis

acuity of testing, 147–148

differentiation, 289

confidence intervals, 146

discussed, 287

proportion, standard error of, 139–141

geographic attributes, 293

item popularity, 293

results, comparing, using confidence bounds, 141–143

item sets, 289

market basket data, 51, 289–291

sample sizes, 145

marketing interventions, tracking, 293–294

targeted acquisition campaigns, 31

types of, 111

order characteristics, 292

up-selling, 115–116

products, clustering by usage, 294–295

usage stimulation, 111

marriages

purchases, 289

categorical values, 239–240

support, 301

household-level data, 96

telecommunications customers, 288

mass intimacy, customer relationships, 451–453

time attributes, 293

market research

massively parallel processor (MPP), 485

control group response versus, 38

literature, 22

maximum values, of simple functions, genetic algorithms, 424

shortcomings, 25

survey-based, 113

MBR. See memory-based reasoning

marketing campaigns. See also advertising

MDL (minimum description length), 78

acquisition-time data, 108–110

mean time between failure (MTBF), 384

canonical measurements, 31

champion-challenger approach, 139

mean time to failure (MTTF), 384

credit risks, reducing exposure to, 113–114

mean values, statistics, 137

measurement errors, 159

cross-selling, 115–116

median customer lifetime value, retention, 387

customer response, tracking, 109

customer segmentation, 111–113

median values, statistics, 137

differential response analysis, 107–108

medical insurance claims, useful data sources, 60

discussed, 95

medical treatment applications, MBR, 258

fixed budgets, 97–100

loyalty programs, 111

meetings, brainstorming, 37

new customer information, gathering, 109–110

memory-based reasoning (MBR)

case study, 259–262

people most influenced by, 106–107

challenges of, 262–265

planning, 27

classification codes, 266, 273–274

profitability, 100–104

combination function, 258, 265

proof-of-concept projects, 600

customer classification, 90–91

response modeling, 96–97

customer response prediction, 258

470643 bindex.qxd 3/8/04 11:08 AM Page 632

632 Index

memory-based reasoning (MBR) (continued)

missing data

data correction, 73–74

democracy approach, 279–281

NULL values, 590

distance function, 258, 265, 271–272

splits, decision trees, 174–175

fraud detection, 258

mission-critical applications, 32

free text response, 258

mode values, statistics, 137

historical records, selecting, 262–263

models

medical treatment applications, 258

assessing

new customers, 277

classifiers and predictors, 79

relevance feedback, 267–268

descriptive models, 78

similarity measurements, 271–272

directed models, 78–79

training data, 263–264

estimators, 79–81

weighted voting, 281–282

building, 8, 77

men, differential response analysis and, 107

comparing, using lift ratio, 81–82

deploying, 84–85

messages, prospecting, 89–90

model sets

metadata repository, 484, 491

balanced datasets, 68

methodologies

components of, 52

data correction, 72–74

customer signatures, assembling, 68

data exploration, 64–68

partitioning, 71–72

data mining process, 54–55

predictive models, 70–71

data selection, 60–64

timelines, multiple, 70


data transformation, 74–76

non-response, mass mailings, 35

data translation, 56–60

score sets, 52

learning sources

motor vehicle registration records, useful data sources, 61

truthful, 48–50

untruthful, 44–48

MOU (minutes of use), wireless communications industries, 38

model assessment, 78–82

model building, 77

MPP (massively parallel processor), 485

model deployment, 84–85

MSA (metropolitan statistical area), 94

model sets, creating, 68–72

MTBF (mean time between failure), 384

reasons for, 44

MTTF (mean time to failure), 384

results, assessing, 85

multiway splits, decision trees, 171

metropolitan statistical area (MSA), 94

mutation, genetic algorithms, 431–432

minimum description length (MDL), 78

N

minimum support pruning, decision trees, 312

N variables, dimension, 352

National Consumer Assets Group (NCAG), 23

minutes of use (MOU), wireless communications industries, 38

natural association, automatic cluster detection, 358

misclassification rates, binary classification, 98


nearest neighbor techniques

classification, 9

classification, 9

combination function, 222

collaborative filtering

components of, 220–221

estimated ratings, 284–285

continuous values, features with, 235–237

grouping customers, 90

predictions, 284–285

coverage of values, 232–233

profiles, building and comparing, 283–284

data preparation

categorical values, 239–240

social information filtering, 282

continuous values, 235–237

word-of-mouth advertising, 283

decision trees, 199

memory-based reasoning (MBR)

discussed, 211

case study, 259–262

estimation tasks, 10, 215

challenges of, 262–265

feed-forward

classification codes, 266, 273–274

back propagation, 228–232

combination function, 258, 265

hidden layer, 227

customer classification, 90–91

input layer, 226

customer response prediction, 258

output layer, 227

democracy approach, 279–281

genetic algorithms and, 439–440

distance function

hidden layers, 221, 227

fraud detection, 258

historical data, 219

free text responses, 258

history of, 212–213

historical records, selecting, 262–263

implementation, 212

inputs/outputs, 215

medical treatment applications, 258

neighborliness parameters, 250

new customers, 277

nonlinear behaviors, 222

relevance feedback, 267–268

OR value, 222

similarity measurements, 271–272

overfitting, 234

training data, 263–264

parallel coordinates, 253

weighted voting, 281–282

prediction, 215

negative correlation, 139

real estate appraisal example, 213–217

neighborliness parameters, neural networks, 250

results, interpreting, 241–243

neural networks

sensitivity analysis, 247–248

activation function, 222

sigmoid activation functions, 225

AND value, 222

SOM (self-organizing map), 249–251

automation, 213

time series analysis, 244–247

average member technique, 252

training sets, selection consideration, 232–234

bias sampling, 227

biological, 211

transfer function, 223

building models, 8

validation sets, 218

case study, 252–254

variable selection problem, 233

categorical variables, 239–240

variance, 199


new customer information

Open Database Connectivity (ODBC), 496

gathering, 109–110

memory-based reasoning, 277

operational errors, 159

profiles, building, 283

operational feedback, 485, 492

new start forecast (NSF), 469

operational summary data, OLAP, 477

nodes, graphs, 322

opportunistic sample, defined, 25

nonlinear behaviors, neural networks, 222

opportunities, good response scores, 34

non-response models, mass mailings, 35

optimization

genetic algorithms, 422

normal distribution, statistics, 130–132

resources, genetic algorithms, 433–435

normalization, numeric variables, 550

normalized absolute value, distance function, 275

training as, 230

OR value, neural networks, 222

NORMDIST function, 134

Oracle, relational database management software, 13

NORMSINV function, 147

NSF (new start forecast), 469

order characteristics, market basket analysis, 292

null hypothesis, statistics and, 125–126

NULL values, missing data, 590

ordered lists, 239

numeric variables

ordered variables, measure of, 549

data correction, 73

organizations. See businesses

distance function, 275

out of time tests, 72

measure of, 550–551

outliers

splits, decision trees, 173

data correction, 73

data transformation, 74

O

output layer, feed-forward neural networks, 227

Occam’s Razor, 124–125

ODBC (Open Database Connectivity), 496

outputs, neural networks, 215

outsourcing data mining, 522–524

one-tailed distribution, 134

overfitting, neural networks, 234

Online Analytic Processing (OLAP)

additive facts, 501

P

data mining and, 507–508

parallel coordinates, neural networks, 253

decision-support summary data, 477–478

parsing variables, 569

dimension tables, 502–503

patterns

discussed, 31

meaningful discoveries, 56

levels of, 475

prediction, 45

logical schema, 478

untruthful learning sources, 45–46

metadata, 483–484, 491

peg values, 236

operational summary data, 477

penetration, proportion, 203

physical schema, 478

percent variations, 105

reporting requirements, 495–496

perceptrons, defined, 212

transaction data, 476–477


performance, classification, 12

distribution and, 135

physical schema, OLAP, 478

hazards, 394–396

pilot projects, 598

statistics, 133–135

planar graphs, 323

probation periods, 518

planned processes, proof-of-concept projects, 599

problem management

data transformation, 56–57

platforms, data mining, 527

identification, 43

point of maximum benefit, 101

lift ratio, 83

point-of-sale data

profiling as, 53–54

association rules, 288

rule-oriented problems, 176

scanners, 3

variable selection problems, neural networks, 233

as useful data source, 60

population diversity, 178

products

positive ratings, voting, 284

clustering by usage, market basket analysis, 294–295

postcards, as communication channel, 89

co-occurrence of, 299

potential revenue, behavior-based variables, 583–585

hierarchical categories, 305

information as, 14

precision measurements, classification codes, 273–274

introduction, planning for, 27

product codes, as categorical value, 239

preclassified tests, 79

predictions

product-focused businesses, 2

accuracy, 79

taxonomy, 305

association rules, 70

profiling

business goals, formulating, 605

business goals, formulating, 605

collaborative filtering, 284–285

collaborative filtering, 283–284

credit risks, 113–114

data transformation, 57

customer longevity, 119–120

decision trees, 12

data transformation, 57

demographic profiles, 31

defined, 52

descriptive, 52

directed data mining, 57

directed, 52

errors, 191

examples of, 54

future behaviors, 10

gender example, 12

historical data, 10

new customer information, 283

model sets for, 70–71

overview, 12

neural networks, 215

prediction versus, 52–53

patterns, 45

as problem management, 53–54

prediction task examples, 10

survey response, 53

profiling versus, 52–53

profitability

response, MBR, 258

marketing campaigns, 100–104

uses for, 54

proof-of-concept projects, 599

probabilities

results, assessing, 85

calculating, 309

projective visualization (Marc Goodman), 206–208

class labels, 85


proof-of-concept projects

planning, 27

expectations, 599

profitability, 100–104

identifying, 599–601

response modeling, 96–97

implementation, 601–605

types of, 111

propensity

up-selling, 115–116

categorical variables, 242

messages, selecting appropriate, 89–90

propensity-to-respond score, 97

proportion

ranking, 88–89

converting counts to, 75–76

roles in, 88

difference of proportion

targeting, 88

chi-square tests versus, 153–154

time dependency and, 160

statistical analysis, 143–144

prospective customer value, 115

penetration, 203

prototypes, proof-of-concept projects, 599

standard error of, statistical analysis, 139–141

pruning, decision trees

proportional hazards

C5 algorithm, 190–191

Cox, 410–411

CART algorithm, 185, 188–189

discussed, 408

discussed, 184

examples of, 409

minimum support pruning, 312

limitations of, 411–412

stability-based, 191–192

proportional scoring, census data, 94–95

public records, household-level data, 96

prospecting

publications

advertising techniques, 90–94

Building the Data Warehouse (Bill Inmon), 474

communication channels, 89

customer relationships, 457

Business Modeling and Data Mining (Dorian Pyle), 60

efforts, 90

good prospects, identifying, 88–89

Data Preparation for Data Mining (Dorian Pyle), 75

index-based scores, 92–95

marketing campaigns

The Data Warehouse Toolkit (Ralph Kimball), 474

acquisition-time variables, 110

credit risks, reducing exposure to, 113–114

Genetic Algorithms in Search, Optimization, and Machine
