Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

    CART algorithm, 185, 188–189
    discussed, 184
    minimum support pruning, 312
    stability-based, 191–192
  rectangular regions, 197
  regression trees, 170
  rules, extracting, 193–194
  SAS Enterprise Miner Tree Viewer tool, 167–168
  scoring, 169–170
  splits
    on categorical input variables, 174
    chi-square testing, 180–183
    discussed, 170
    diversity measures, 177–178
    entropy, 179
    finding, 172
    Gini splitting criterion, 178
    information gain ratio, 178, 180
    intrinsic information of, 180
    missing values, 174–175
    multiway, 171
    on numeric input variables, 173
    population diversity, 178
    purity measures, 177–178
    reduction in variance, 183
    surrogate, 175
  subtrees, selecting, 189
  uses for, 166
declining usage, behavior-based variables, 577–579
density
  data selection, 62–63
density function, statistics, 133
deploying models, 84–85
derived variables, column data, 542
descriptions
  comparing values with, 65
  data transformation, 57
descriptive models, assessing, 78
descriptive profiling, 52
deviation. See standard deviation
difference of proportion
  chi-square tests versus, 153–154
  statistical analysis, 143–144
differential response analysis, marketing campaigns, 107–108
differentiation, market basket analysis, 289
dimension
  automatic cluster detection, 352
dimension tables, OLAP, 502–503
directed clustering, automatic cluster detection, 372
directed data mining
  classification, 57
  discussed, 7
  estimation, 57
  prediction, 57
directed graphs, 330
directed models, assessing, 78–79
directed profiling, 52
dirty data, 592–593

470643 bindex.qxd 3/8/04 11:08 AM Page 626

626 Index

discrete outcomes, classification, 9
discrete values, statistics, 127–131
discrimination measures, ROC curves, 99
dissociation rules, 317
distance and similarity, automatic cluster detection, 359–363
distance function
  defined, 271–272
  discussed, 258, 265
  hidden distance fields, 278
  identity distance, 271
  numeric fields, 275
  triangle inequality, 272
  zip codes, 276–277
distribution
  data exploration, 65
  one-tailed, 134
  probability and, 135
  statistics, 130–132
  two-tailed, 134
diverse data types, 536
diversity measures, splitting criteria, decision trees, 177–178
divisive clustering, automatic cluster detection, 371–372
documentation
  data mining, 536–537
  historical data as, 61
dumping data, flat files, 594
EBCF (existing base churn forecast), 469
economic data, useful data sources, 61
edges, graphs, 322
education level, household-level data, 96
e-mail
  as communication channel, 89
  free text resources, 556–557
encoding, inconsistent, data correction, 74
enterprise-wide data, 33
entropy, information gain, 178–180
equal-height binning, 551
equal-width binning, 551
erroneous conclusions, 74
errors
  countervailing, 81–82
  error rates
    adjusted, 185
    establishing, 79
  measurement, 159
  operational, 159
  predicting, 191
  standard error of proportion, statistical analysis, 139–141
established customers, customer relationships, 457
estimation
  accuracy, 79–81
  averages, 81
  business goals, formulating, 605
  classification tasks, 9
  collaborative filtering, 284–285
  data transformation, 57
  decision trees, 170
  directed data mining, 57
  estimation task examples, 10
  examples of, 10
  neural networks, 10, 215
  regression models, 10
  revenue, behavior-based variables, 581–583
  standard deviation, 81
  valued outcomes, 9
ETL (extraction, transformation, and load) tools, 487, 595
evaluation, automatic cluster detection, 372–373
event-based relationships, customer relationships, 458–459
existing base churn forecast (EBCF), 469
expectations
  comparing to results, 31
  expected values, chi-square tests, 150–151
  proof-of-concept projects, 599


Index 627

expected churn, 118
experimentation
  hypothesis testing, 51
  statistics, 160–161
exploration tools, decision trees as, 203–204
exponential decay, retention, 389–390, 393
expressive power, descriptive models, 78
extraction, transformation, and load (ETL) tools, 487, 595
F tests (Ronald A. Fisher), 183–184
fax machines, link analysis, 337–341
Federal Express, transaction processing systems, 3–4
feedback
  change processes, 34
  operational, 485, 492
  relevance feedback, MBR, 267–268
feed-forward neural networks
  back propagation, 228–232
  hidden layer, 227
  input layer, 226
  output layer, 227
field values, statistics, 128
Fisher, Ronald A. (F tests), 183–184
fixed budgets, marketing campaigns, 97–100
fixed positions, genetic algorithms, 435
fixed-length character strings, 552–554
flat files, dumping data, 594
forced attrition, 118
forecasting
  EBCF (existing base churn forecast), 469
  NSF (new start forecast), 469
  survival analysis, 415–416
former customers, customer relationships, 457
forward-looking businesses, 2
fraud detection, MBR, 258
fraudulent insurance claims, classification, 9
free text response, memory-based reasoning, 285
functionality, lack of, data transformation, 28
functions
  activation, 222
  CHIDIST, 152
  combination
    attrition history, 280
    MBR (memory-based reasoning), 258, 265
    neural networks, 272
    weighted voting, 281–282
  density, 133
  distance
    defined, 271–272
    discussed, 258, 265
    hidden distance fields, 278
    identity distance, 271
    numeric fields, 275
    triangle inequality, 272
    zip codes, 276–277
  hyperbolic tangent, 223
  NORMDIST, 134
  NORMSINV, 147
  sigmoid, 225
  summation, 272
  tangent, 223
  transfer, 223
future attrition, 49
future customer behaviors, predicting, 10
gains, cumulative, 36, 101
Gaussian mixture model, automatic cluster detection, 366–367
gender
  as categorical value, 239
  profiling example, 12
generalized delta rules, 229


628 Index

genetic algorithms
  case study, 440–443
  crossover, 430
  data representation, 432–433
  genome, 424
  implicit parallelism, 438
  maximum values, of simple functions, 424
  mutation, 431–432
  neural networks and, 439–440
  optimization, 422
  overview, 421–422
  resource optimization, 433–435
  response modeling, 440–443
  schemata, 434, 436–438
  selection step, 429
  statistical regression techniques, 423
Genetic Algorithms in Search, Optimization, and Machine Learning (Goldberg), 445
geographic attributes, market basket analysis, 293
geographic information system (GIS), 536
geographical resources, 555–556
geometric distance, automatic cluster detection, 360–361
gigabytes, 5
Gini, Corrado (Gini splitting criterion, decision trees), 178
GIS (geographic information system), 536
goals, formulating, 605–606
Goldberg (Genetic Algorithms in Search, Optimization, and Machine Learning), 445
good customers, holding on to, 17–18
good prospects, identifying, 88–89
Goodman, Marc (projective visualization), 206–208
graphical user interface (GUI), 535
graphs
  acyclic, 331
  cyclic, 330–331
  data as, 337
  directed, 330
  edges, 322
  graph-coloring algorithm, 340–341
  Hamiltonian path, 328
  linkage, 77
  nodes, 322
  planar, 323
  traveling salesman problem, 327–329
  vertices, 322
grouping. See clustering
GUI (graphical user interface), 535
Hamiltonian path, graph theory, 328
hard clustering, automatic cluster detection, 367
hazards
  bathtub, 397–398
  censoring, 399–403
  constant, 397, 416–417
  probabilities, 394–396
  proportional
    Cox, 410–411
    discussed, 408
    examples of, 409
    limitations of, 411–412
  real-world example, 398–399
  retention, 404–405
  stratification, 410
Hertzsprung-Russell diagram, automatic cluster detection, 352–354
hidden distance fields, distance function, 278
hidden layer, feed-forward neural networks, 221, 227
hierarchical categories, products, 305
histograms
  data exploration, 565–566
  discussed, 543
  statistics and, 127
historical data
  customer behaviors, 5
  documentation as, 61


Index 629

  MBR (memory-based reasoning), 262–263
  neural networks, 219
  prediction tasks, 10
hobbies, household-level data, 96
holdout groups, marketing campaigns, 106
home-based businesses, 56
household-level data, 96
hubs, link analysis, 332–334
hyperbolic tangent function, 223
hypothesis testing
  confidence levels, 148
  considerations, 51
  decision-making process, 50–51
  generating, 51
  market basket analysis, 51
  null hypothesis, statistics and, 125–126
IBM, relational database management software, 13
ID and key variables, 554
ID3 (Iterative Dichotomiser 3), 190
identification
  columns, 548
  customer signatures, 560–562
  good prospects, 88–89
  problem management, 43
  proof-of-concept projects, 599–601
identified versus anonymous transactions, association rules, 308
identity distance, distance function, 271
ignored columns, 547
images, binary data, 557
imperfections, in data, 34
implementation
  neural networks, 212
  proof-of-concept projects, 601–605
implicit parallelism, 438
in-between relationships, customer relationships, 453
income, household-level data, 96
inconclusive survey responses, 46
inconsistent data, 593–594
index-based scores, 92–95
indicator variables, 554
indirect relationships, customer relationships, 453–454
industry revolution, 18
inexplicable rules, association rules, 297–298
information
  competitive advantages, 14
  data as, 22
  infomediaries, 14
  information brokers, supermarket chains as, 15–16
  information gain, entropy, 178–180
  information technology, data transformation, 58–60
  as products, 14
  recommendation-based businesses, 16–17
Inmon, Bill (Building the Data Warehouse), 474
input columns, 547
input layer, feed-forward neural networks, 226
input variables, target fields, 37
inputs/outputs, neural networks, 215
insourcing data mining, 524–525
insurance claims, classification, 9
interactive systems, response times, 33
Internet resources
  customer response to marketing campaigns, tracking, 109
  RuleQuest, 190
  U.S. Census Bureau, 94
interval variables, 549, 552
interviews
  business opportunities, identifying, 27
  proof-of-concept projects, 600
intrinsic information, splits, decision trees, 180
introduction, of products, 27


630 Index

intuition, data exploration, 65
involuntary churn, 118–119, 521
item popularity, market basket analysis, 293
item sets, market basket analysis, 289
Iterative Dichotomiser 3 (ID3), 190
key and ID variables, 554
KDD (knowledge discovery in databases), 8
Kimball, Ralph (The Data Warehouse Toolkit), 474
Kleinberg algorithm, link analysis, 332–333
K-means clustering, 354–358
knowledge discovery in databases (KDD), 8
Kolmogorov-Smirnov (KS) tests, 101
large-business relationships, customer relationship management, 3–4
leaf nodes, classification, 167
learning
  opportunities, customer interactions, 520–521
  supervised, 57
  training techniques as, 231
  truthful sources, 48–50
  unsupervised, 57
  untruthful sources, 44–48
life stages, customer relationships, 455–456
lifetime customer value, customer relationships, 32
lift ratio
  comparing models using, 81–82
  lift charts, 82, 84
  problems with, 83
linear processes, 55
linear regression, 139
link analysis
  authorities, 333–334
  candidates, 333
  case study, 343–346
  classification, 9
  discussed, 321
  fax machines, 337–341
  graphs
    acyclic graphs, 331
    communities of interest, 346
    cyclic, 330–331
    data as, 340
    directed graphs, 330
    edges, 322
    graph-coloring algorithm, 340–341
    Hamiltonian path, 328
    nodes, 322
    planar graphs, 323
    traveling salesman problem, 327–329
    vertices, 322
  hubs, 332–334
  Kleinberg algorithm, 332–333
  root sets, 333
  search programs, 331
  stemming, 333
  weighted graphs, 322, 324
linkage graphs, 77
lists, ordered and unordered, 239
literature, market research, 22
logarithms, data transformation, 74
logical schema, OLAP, 478
logistic methods, box diagrams, 200
long form, census data, 94
long-term trends, 75
lookup tables, auxiliary information, 570–571
loyalty
  customers, 520
loyalty programs
  marketing campaigns, 111
  welcome periods, 518
luminosity, 351
mailings
  marketing campaigns, 97
  non-response models, 35


Index 631

marginal customers, 553

as statistical analysis