Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

cross-selling, 115–116

Learning (Goldberg), 445

customer response, tracking, 109

purchases, market basket analysis, 289

customer segmentation, 111–113

purchasing frequencies, behavior-based variables, 575–576

differential response analysis, 107–108

purity measures, splitting criteria,

discussed, 95

decision trees, 177–178

fixed budgets, 97–100

p-values, statistics, 126

new customer information,

Pyle, Dorian

gathering, 109–110

Business Modeling and Data Mining, 60

people most influenced by, 106–107

Data Preparation for Data Mining, 75

470643 bindex.qxd 3/8/04 11:08 AM Page 637

Index 637

Q

relational database management

quadratic discriminants, box

system (RDBMS)

diagrams, 200

discussed, 474

quality of data, association rules, 308

source systems, 594–595

question asking, data exploration,

star schema, 505

67–68

suppliers, 13

Quinlan, J. Ross (Iterative

support, 511

Dichotomiser 3), 190

relevance feedback, MBR, 267–268

q-values, statistics, 126

replicating results, 33

reporting requirements, OLAP,

R

495–496

range values, statistics, 137

resources

rate plans, wireless telephone

geographical, 555–556

services, 7

optimization, genetic algorithms,

ratios

433–435

data transformation, 75

response

lift ratio, 81–84

biased sampling, 146

RDBMS. See relational database

communication channels, 89

management system

control groups

real estate appraisals, neural network

market research versus, 38

example, 213–217

marketing campaigns, 106

recall measurements, classification

cumulative response

codes, 273–274

concentration, 82–83

recency, frequency, and monetary

results, assessing, 85

(RFM) value, 575

customer relationships, 457

recommendation-based businesses,

differential response analysis,

16–17

marketing campaigns, 107–108

records

erroneous conclusions, 74

combining values within, 569

free text, 285

default classes, 194

good response scores, 34

transactional, 574

marketing campaigns, 96–97

rectangular regions, decision trees, 197

prediction, MBR, 258

recursive algorithms, 173

proof-of-concept projects, 599

reduction in variance, splits, decision

response models

trees, 183

genetic algorithms, 440–443

regression

prospects, ranking, 36

building models, 8

response times, interactive

estimation tasks, 10

systems, 33

linear, 139

sample sizes, 145

regression trees, 170

single response rates, 141

statistics, 139

survey response

techniques, genetic algorithms, 423

customer classification, 91

inconclusive, 46

response, survey response (continued)

data quality, 308

profiling, 53

dissociation rules, 317

survey-based market research, 113

effectiveness of, 299–301

useful data sources, 61

inexplicable rules, 297–298

results

point-of-sale data, 288

actionable, 22

practical limits, overcoming,

assessing, 85

311–313

comparing expectations to, 31

prediction, 70

deliverables, data transformation,

probabilities, calculating, 309

57–58

products, hierarchical categories, 305

measuring, virtuous cycle, 30–32

sequential analysis, 318–319

neural networks, 241–243

for store comparisons, 315–316

replicating, 33

trivial rules, 297

statistical analysis, 141–143

virtual items, 307

tainted, 72

decision trees, 193–194

retention

generalized delta, 229

calculating, 385–386

rule-oriented problems, 176

churn and, 116–120

customer relationships, 467–469

S

exponential decay, 389–390, 393

SAC (Simplifying Assumptions

hazards, 404–405

Corporation), 97, 100

median customer lifetime value, 387

sample sizes, statistical analysis, 145

retention curves, 386–389

sample variation, statistics, 129

truncated mean lifetime value, 389

SAS Enterprise Miner Tree Viewer

retrospective customer value, 115

tool, 167–168

revenue, behavior-based variables,

scalability, data mining, 533–534

581–585

scaling, automatic cluster detection,

revolvers, behavior-based

363–364

variables, 580

scanners, point-of-sale, 3

RFM (recency, frequency, and

scarce data, 62

monetary) value, 575

SCF (sectional center facility), 553

ring diagrams, as alternative to

schemata, genetic algorithms, 434,

decision trees, 199–201

436–438

risks

scores

hazards, 403

bizocity, 112–113

proof-of-concept projects, 599

cutoff, 98

ROC curves, 98–99, 101

decision trees, 169–170

root sets, link analysis, 333

good response, 34

RuleQuest Web site, 190

index-based, 92–95

rules

model deployment, 84–85

association rules

propensity-to-respond, 97

actionable rules, 296

proportional, census data, 94–95

affinity grouping, 11

score sets, 52

anonymous versus identified

scoring platforms, data mining,

transactions, 308

527–528

sorting customers by, 8

simulated annealing, 230

z-scores, 551

single linkage, automatic cluster

search programs, link analysis, 331

detection, 369

searchable criteria, relevance

single response rates, 141

feedback, 268

single views, customers, 517–518

sectional center facility (SCF), 553

sites. See Web sites

selection step, genetic algorithms, 429

skewed distributions, data

self-organizing map (SOM), 249–251,

correction, 73

372

SKUs (stock-keeping units), 305

sensitivity analysis, neural networks,

small-business relationships, customer

247–248

relationship management, 2

sequential analysis, association rules,

SMP (symmetric multiprocessor), 485

318–319

snapshots, customer signatures, 562

sequential events, applying decision

social information filtering, 282

trees to, 205

soft clustering, automatic cluster

sequential patterns, identifying, 24

detection, 367

server platforms, affordability, 13

SOI (sphere of influence), 38

service business sectors, customer

sole proprietors, 3

relationships, 13–14

solicitation, marketing campaigns, 96

shared labels, fax machines, 341

SOM (self-organizing map),

short form, census data, 94

249–251, 372

short-term trends, 75

source systems, 484, 486–487, 594

sigmoid activation functions, neural

special-purpose code, 595

networks, 225

sphere of influence (SOI), 38

signatures, customers

spiders, web crawlers, 331

assembling, 68

splits, decision trees

business versus residential

on categorical input variables, 174

customers, 561

chi-square testing, 180–183

columns, pivoting, 563

discussed, 170

computational issues, 594–596

diversity measures, 177–178

considerations, 564

entropy, 179

customer identification, 560–562

finding, 172

data for, cataloging, 559–560

Gini splitting criterion, 178

discussed, 540–541

information gain ratio, 178, 180

model set creation, 68

intrinsic information of, 180

snapshots, 562

missing values, 174–175

time frames, identifying, 562

multiway, 171

similarity and distance, automatic

on numeric input variables, 173

cluster detection, 359–363

population diversity, 178

similarity matrix, 368

purity measures, 177–178

similarity measurements, MBR,

reduction in variance, 183

271–272

surrogate, 175

Simplifying Assumptions Corporation

spreadsheets, results, assessing, 85

(SAC), 97, 100

SQL data, time series analysis,

mean values, 137

572–573

median values, 137

stability-based pruning, decision trees,

mode values, 137

191–192

multiple comparisons, 148–149

staffing, data mining, 525–526

normal distribution, 130–132

standard deviation

null hypothesis and, 125–126

estimation, 81

probabilities, 133–135

statistics, 132, 138

p-values, 126

variance and, 138

q-values, 126

standard error of proportion,

range values, 137

statistical analysis, 139–141

regression ranges, 139

standardization, numeric values, 551

sample variation, 129

standardized values, statistics,

standard deviation, 132, 138

129–133

standardized values, 129–133

star schema structure, relational

sum of values, 137–138

databases, 505

time series analysis, 128–129

statistical analysis

truncated data, 162

business data versus scientific

variance, 138

data, 159

z-values, 131, 138

censored data, 161

statistical regression techniques,

Central Limit Theorem, 129–130

genetic algorithms, 423

chi-square tests

status codes, as categorical value, 239

case study, 155–158

stemming, link analysis, 333

degrees of freedom values,

stock-keeping units (SKUs), 305

chi-square tests, 152–153

store comparisons, association rules

difference of proportions versus,

for, 315–316

153–154

stratification

discussed, 149

customer relationships and, 469

expected values, calculating,

hazards, 410

150–151

strings, fixed-length characters,

continuous variables, 137–138

552–554

correlation ranges, 139

subgroups

cross-tabulations, 136

automatic cluster detection

density function, 133

agglomerative clustering, 368–370

as disciplinary technique, 123

case study, 374–378

discrete values, 127–131

categorical variables, 359

experimentation, 160–161

centroid distance, 369

field values, 128

complete linkage, 369

histograms and, 127

data preparation, 363–365

marketing campaign approaches

dimension, 352

acuity of testing, 147–148

directed clustering, 372

confidence intervals, 146

discussed, 12, 91, 351

proportion, standard error of,

distance and similarity, 359–363

139–141

divisive clustering, 371–372

sample sizes, 145

evaluation, 372–373

Gaussian mixture model, 366–367

T

geometric distance, 360–361

tables, lookup, auxiliary information,

hard clustering, 367

570–571

Hertzsprung-Russell diagram,

tainted results, 72

352–354

tangent function, 223

luminosity, 351

target columns, 547

scaling, 363–364

target fields, input variables, 37

single linkage, 369

target market versus control group

soft clustering, 367

response, 38

SOM (self-organizing map), 372

targeted acquisition campaigns, 31

vectors, angles between, 361–362

targeting

weighting, 363–365

good prospects, identifying, 88–89

zone boundaries, adjusting, 380

prospecting, 88

business goals, formulating, 605

taxonomy, products, 305

customer attributes, 11

telecommunications customers,

data transformation, 57

market basket analysis, 288

overview, 11

telephone switches, transaction

profiling tasks, 12

processing systems, 3

undirected data mining, 57

terabytes, 5

subscription-based relationships, customer relationships, 459–460

Teradata, relational database

management software, 13

subtrees, decision trees, 189

termination of services, 114

sum of values, statistics, 137–138

testing

summarization, data transformation, 44

acuity of, statistical analysis, 147–148

summation function, 272

chi-square tests

supermarket chains, as information

case study, 155–158

brokers, 15–16

CHIDIST function, 152

supervised learning, 57

degrees of freedom values, 152–153

support, market basket analysis, 301

difference of proportions versus,

surrogate splits, decision trees, 175

153–154

survey responses

discussed, 149

customer classification, 91

expected values, calculating,

inconclusive, 46

150–151

profiling, 53

splits, decision trees, 180–183

survey-based market research, 113

F tests, 183–184

useful data sources, 61

hypothesis testing

survival analysis

confidence levels, 148

attrition, handling different types of,

considerations, 51

412–413

decision-making process, 50–51

customer relationships, 413–415

generating, 51

estimation tasks, 10

market basket analysis, 51

forecasting, 415–416

null hypothesis, statistics and,

symmetric multiprocessor (SMP),

125–126

489–490

testing (continued)

truncated mean lifetime value,

KS (Kolmogorov-Smirnov) tests, 101

retention, 389

preclassified tests, 79

truthful learning sources, 48–50

test groups, marketing

two-tailed distribution, 134

campaigns, 106

test sets

U

out of time tests, 72

undirected data mining

uses for, 52

affinity grouping, 57

time

clustering, 57

attributes, market basket

discussed, 7

analysis, 293

uniform distribution, statistics, 132

and dates, interval variables, 551

universal product code (UPC), 555

dependency, prospecting and, 160

UNIT_MASTER file, customer

frames, customer signatures, 562

signatures, 559

series analysis

unordered lists, 239

neural networks, 244–247
