Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

470643 ftoc.qxd 3/8/04 11:33 AM Page xi

Contents

How Does a Neural Network Learn Using Back Propagation? 228
Heuristics for Using Feed-Forward, Back Propagation Networks 231
Choosing the Training Set 232
Coverage of Values for All Features 232
Number of Features 233
Size of Training Set 234
Number of Outputs 234
Preparing the Data 235
Features with Continuous Values 235
Features with Ordered, Discrete (Integer) Values 238
Features with Categorical Values 239
Other Types of Features 241
Interpreting the Results 241
Neural Networks for Time Series 244
How to Know What Is Going on Inside a Neural Network 247
Self-Organizing Maps 249
What Is a Self-Organizing Map? 249
Example: Finding Clusters 252
Lessons Learned 254

Chapter 8 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 257
Memory-Based Reasoning 258
Example: Using MBR to Estimate Rents in Tuxedo, New York 259
Challenges of MBR 262
Choosing a Balanced Set of Historical Records 262
Representing the Training Data 263
Determining the Distance Function, Combination Function, and Number of Neighbors 265
Case Study: Classifying News Stories 265
What Are the Codes? 266
Applying MBR 267
Choosing the Training Set 267
Choosing the Distance Function 267
Choosing the Combination Function 267
Choosing the Number of Neighbors 270
The Results 270
Measuring Distance 271
What Is a Distance Function? 271
Building a Distance Function One Field at a Time 274
Distance Functions for Other Data Types 277
When a Distance Metric Already Exists 278
The Combination Function: Asking the Neighbors for the Answer 279
The Basic Approach: Democracy 279
Weighted Voting 281
Collaborative Filtering: A Nearest Neighbor Approach to Making Recommendations 282
Building Profiles 283
Comparing Profiles 284
Making Predictions 284
Lessons Learned 285

Chapter 9 Market Basket Analysis and Association Rules 287
Defining Market Basket Analysis 289
Three Levels of Market Basket Data 289
Order Characteristics 292
Item Popularity 293
Tracking Marketing Interventions 293
Clustering Products by Usage 294
Association Rules 296
Actionable Rules 296
Trivial Rules 297
Inexplicable Rules 297
How Good Is an Association Rule? 299
Building Association Rules 302
Choosing the Right Set of Items 303
Product Hierarchies Help to Generalize Items 305
Virtual Items Go beyond the Product Hierarchy 307
Data Quality 308
Anonymous versus Identified 308
Generating Rules from All This Data 308
Calculating Confidence 309
Calculating Lift 310
The Negative Rule 311
Overcoming Practical Limits 311
The Problem of Big Data 313
Extending the Ideas 315
Using Association Rules to Compare Stores 315
Dissociation Rules 317
Sequential Analysis Using Association Rules 318
Lessons Learned 319

Chapter 10 Link Analysis 321
Basic Graph Theory 322
Seven Bridges of Königsberg 325
Traveling Salesman Problem 327
Directed Graphs 330
Detecting Cycles in a Graph 330
A Familiar Application of Link Analysis 331
The Kleinberg Algorithm 332
The Details: Finding Hubs and Authorities 333
Creating the Root Set 333
Identifying the Candidates 334
Ranking Hubs and Authorities 334
Hubs and Authorities in Practice 336
Case Study: Who Is Using Fax Machines from Home? 336
Why Finding Fax Machines Is Useful 336
The Data as a Graph 337
The Approach 338
Some Results 340
Case Study: Segmenting Cellular Telephone Customers 343
The Data 343
Analyses without Graph Theory 343
A Comparison of Two Customers 344
The Power of Link Analysis 345
Lessons Learned 346

Chapter 11 Automatic Cluster Detection 349
Searching for Islands of Simplicity 350
Star Light, Star Bright 351
Fitting the Troops 352
K-Means Clustering 354
Three Steps of the K-Means Algorithm 354
What K Means 356
Similarity and Distance 358
Similarity Measures and Variable Type 359
Formal Measures of Similarity 360
Geometric Distance between Two Points 360
Angle between Two Vectors 361
Manhattan Distance 363
Number of Features in Common 363
Data Preparation for Clustering 363
Scaling for Consistency 363
Use Weights to Encode Outside Information 365
Other Approaches to Cluster Detection 365
Gaussian Mixture Models 365
Agglomerative Clustering 368
An Agglomerative Clustering Algorithm 368
Distance between Clusters 368
Clusters and Trees 370
Clustering People by Age: An Example of Agglomerative Clustering 370
Divisive Clustering 371
Self-Organizing Maps 372
Evaluating Clusters 372
Inside the Cluster 373
Outside the Cluster 373
Case Study: Clustering Towns 374
Creating Town Signatures 374
The Data 375
Creating Clusters 377
Determining the Right Number of Clusters 377
Using Thematic Clusters to Adjust Zone Boundaries 380
Lessons Learned 381

Chapter 12 Knowing When to Worry: Hazard Functions and Survival Analysis in Marketing 383
Customer Retention 385
Calculating Retention 385
What a Retention Curve Reveals 386
Finding the Average Tenure from a Retention Curve 387
Looking at Retention as Decay 389
Hazards 394
The Basic Idea 394
Examples of Hazard Functions 397
Constant Hazard 397
Bathtub Hazard 397
A Real-World Example 398
Censoring 399
Other Types of Censoring 402
From Hazards to Survival 404
Retention 404
Survival 405
Proportional Hazards 408
Examples of Proportional Hazards 409
Stratification: Measuring Initial Effects on Survival 410
Cox Proportional Hazards 410
Limitations of Proportional Hazards 411
Survival Analysis in Practice 412
Handling Different Types of Attrition 412
When Will a Customer Come Back? 413
Forecasting 415
Hazards Changing over Time 416
Lessons Learned 418

Chapter 13 Genetic Algorithms 421
How They Work 423
Genetics on Computers 424
Selection 429
Crossover 430
Mutation 431
Representing Data 432
Case Study: Using Genetic Algorithms for Resource Optimization 433
Schemata: Why Genetic Algorithms Work 435
More Applications of Genetic Algorithms 438
Application to Neural Networks 439
Case Study: Evolving a Solution for Response Modeling 440
Business Context 440
Data 441
The Data Mining Task: Evolving a Solution 442
Beyond the Simple Algorithm 444
Lessons Learned 446

Chapter 14 Data Mining throughout the Customer Life Cycle 447
Levels of the Customer Relationship 448
Deep Intimacy 449
Mass Intimacy 451
In-between Relationships 453
Indirect Relationships 453
Customer Life Cycle 454
The Customer’s Life Cycle: Life Stages 455
Customer Life Cycle 456
Subscription Relationships versus Event-Based Relationships 458
Event-Based Relationships 458
Subscription-Based Relationships 459
Business Processes Are Organized around the Customer Life Cycle 461
Customer Acquisition 461
Who Are the Prospects? 462
When Is a Customer Acquired? 462
What Is the Role of Data Mining? 464
Customer Activation 464
Relationship Management 466
Retention 467
Winback 470
Lessons Learned 470

Chapter 15 Data Warehousing, OLAP, and Data Mining 473
The Architecture of Data 475
Transaction Data, the Base Level 476
Operational Summary Data 477
Decision-Support Summary Data 477
Database Schema 478
Metadata 483
Business Rules 484
A General Architecture for Data Warehousing 484
Source Systems 486
Extraction, Transformation, and Load 487
Central Repository 488
Metadata Repository 491
Data Marts 491
Operational Feedback 492
End Users and Desktop Tools 492
Analysts 492
Application Developers 493
Business Users 494
Where Does OLAP Fit In? 494
What’s in a Cube? 497
Three Varieties of Cubes 498
Facts 501
Dimensions and Their Hierarchies 502
Conformed Dimensions 504
Star Schema 505
OLAP and Data Mining 507
Where Data Mining Fits in with Data Warehousing 508
Lots of Data 509
Consistent, Clean Data 510
Hypothesis Testing and Measurement 510
Scalable Hardware and RDBMS Support 511
Lessons Learned 511

Chapter 16 Building the Data Mining Environment 513
A Customer-Centric Organization 514
An Ideal Data Mining Environment 515
The Power to Determine What Data Is Available 515
The Skills to Turn Data into Actionable Information 516
All the Necessary Tools 516
Back to Reality 516
Building a Customer-Centric Organization 516
Creating a Single Customer View 517
Defining Customer-Centric Metrics 519
Collecting the Right Data 520
From Customer Interactions to Learning Opportunities 520
Mining Customer Data 521
The Data Mining Group 521
Outsourcing Data Mining 522
Outsourcing Occasional Modeling 522
Outsourcing Ongoing Data Mining 523
Insourcing Data Mining 524
Building an Interdisciplinary Data Mining Group 524
Building a Data Mining Group in IT 524
Building a Data Mining Group in the Business Units 525
What to Look for in Data Mining Staff 525
Data Mining Infrastructure 526
The Mining Platform 527
The Scoring Platform 527
One Example of a Production Data Mining Architecture 528
Architectural Overview 528
Customer Interaction Module 529
Analysis Module 530
Data Mining Software 532
Range of Techniques 532
Scalability 533
Support for Scoring 534
Multiple Levels of User Interfaces 535
Comprehensible Output 536
Ability to Handle Diverse Data Types 536
Documentation and Ease of Use 536
Availability of Training for Both Novice and Advanced Users, Consulting, and Support 537
Vendor Credibility 537
Lessons Learned 537

Chapter 17 Preparing Data for Mining 539
What Data Should Look Like 540
The Customer Signature 540
The Columns 542
Columns with One Value 544
Columns with Almost Only One Value 544
Columns with Unique Values 546
Columns Correlated with Target 547
Model Roles in Modeling 547
Variable Measures 549
Numbers 550
Dates and Times 552
Fixed-Length Character Strings 552
IDs and Keys 554