Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

Data Mining Techniques for Marketing, Sales, and Customer Relationship Management

Michael J.A. Berry

Gordon S. Linoff

Acknowledgments

We are fortunate to be surrounded by some of the most talented data miners anywhere, so our first thanks go to our colleagues at Data Miners, Inc., from whom we have learned so much: Will Potts, Dorian Pyle, and Brij Masand.

There are also clients with whom we work so closely that we consider them our colleagues as well: Harrison Sohmer and Stuart E. Ward, III are in that category. Our Editor, Bob Elliott, Editorial Assistant, Erica Weinstein, and Development Editor, Emilie Herman, kept us (more or less) on schedule and helped us maintain a consistent style. Lauren McCann, a graduate student at M.I.T. and intern at Data Miners, prepared the census data used in some examples and created some of the illustrations.

We would also like to acknowledge all of the people we have worked with in scores of data mining engagements over the years. We have learned something from every one of them. The many whose data mining projects have influenced the second edition of this book include:

Al Fan

Herb Edelstein

Nick Gagliardo

Alan Parker

Jill Holtz

Nick Radcliffe

Anne Milley

Joan Forrester

Patrick Surry

Brian Guscott

John Wallace

Ronny Kohavi

Bruce Rylander

Josh Goff

Sheridan Young

Corina Cortes

Karen Kennedy

Susan Hunt Stevens

Daryl Berry

Kurt Thearling

Ted Browne

Daryl Pregibon

Lynne Brennen

Terri Kowalchuk

Doug Newell

Mark Smith

Victor Lo

Ed Freeman

Mateus Kehder

Yasmin Namini

Erin McCarthy

Michael Patrick

Zai Ying Huang

And, of course, all the people we thanked in the first edition are still deserving of acknowledgement:

Bob Flynn

Jim Flynn

Paul Berry

Bryan McNeely

Kamran Parsaye

Rakesh Agrawal

Claire Budden

Karen Stewart

Ric Amari

David Isaac

Larry Bookman

Rich Cohen

David Waltz

Larry Scroggins

Robert Groth

Dena d’Ebin

Lars Rohrberg

Robert Utzschnieder

Diana Lin

Lounette Dyer

Roland Pesch

Don Peppers

Marc Goodman

Stephen Smith

Ed Horton

Marc Reifeis

Sue Osterfelt

Edward Ewen

Marge Sherold

Susan Buchanan

Fred Chapman

Mario Bourgoin

Syamala Srinivasan

Gary Drescher

Prof. Michael Jordan

Wei-Xing Ho

Gregory Lampshire

Patsy Campbell

William Petefish

Janet Smith

Paul Becker

Yvonne McCollin

Jerry Modes

About the Authors

Michael J. A. Berry and Gordon S. Linoff are well known in the data mining field. They have jointly authored three influential and widely read books on data mining that have been translated into many languages. They each have close to two decades of experience applying data mining techniques to business problems in marketing and customer relationship management.

Michael and Gordon first worked together during the 1980s at Thinking Machines Corporation, which was a pioneer in mining large databases. In 1996, they collaborated on a data mining seminar, which soon evolved into the first edition of this book. The success of that collaboration gave them the courage to start Data Miners, Inc., a respected data mining consultancy, in 1998. As data mining consultants, they have worked with a wide variety of major companies in North America, Europe, and Asia, turning customer databases, call detail records, Web log entries, point-of-sale records, and billing files into useful information that can be used to improve the customer experience. The authors’ years of hands-on data mining experience are reflected in every chapter of this extensively updated and revised edition of their first book, Data Mining Techniques.

When not mining data at some distant client site, Michael lives in Cambridge, Massachusetts, and Gordon lives in New York City.

Introduction

The first edition of Data Mining Techniques for Marketing, Sales, and Customer Support appeared on book shelves in 1997. The book actually got its start in 1996 as Gordon and I were developing a 1-day data mining seminar for NationsBank (now Bank of America). Sue Osterfelt, a vice president at NationsBank and the author of a book on database applications with Bill Inmon, convinced us that our seminar material ought to be developed into a book. She introduced us to Bob Elliott, her editor at John Wiley & Sons, and before we had time to think better of it, we signed a contract.

Neither of us had written a book before, and drafts of early chapters clearly showed this. Thanks to Bob’s help, though, we made a lot of progress, and the final product was a book we are still proud of. It is no exaggeration to say that the experience changed our lives — first by taking over every waking hour and some when we should have been sleeping; then, more positively, by providing the basis for the consulting company we founded, Data Miners, Inc.

The first book, which has become a standard text in data mining, was followed by others, Mastering Data Mining and Mining the Web.

So, why a revised edition? The world of data mining has changed a lot since we started writing in 1996. For instance, back then, Amazon.com was still new; U.S. mobile phone calls cost on average 56 cents per minute, and fewer than 25 percent of Americans even owned a mobile phone; and the KDD data mining conference was in its second year. Our understanding has changed even more. For the most part, the underlying algorithms remain the same, although the software in which the algorithms are embedded, the data to which they are applied, and the business problems they are used to solve have all grown and evolved.

Even if the technological and business worlds had stood still, we would have wanted to update Data Mining Techniques because we have learned so much in the intervening years. One of the joys of consulting is the constant exposure to new ideas, new problems, and new solutions. We may not be any smarter than when we wrote the first edition, but we do have more experience and that added experience has changed the way we approach the material. A glance at the Table of Contents may suggest that we have reduced the amount of business-related material and increased the amount of technical material. Instead, we have folded some of the business material into the technical chapters so that the data mining techniques are introduced in their business context. We hope this makes it easier for readers to see how to apply the techniques to their own business problems.

It has also come to our attention that a number of business school courses have used this book as a text. Although we did not write the book as a text, in the second edition we have tried to facilitate its use as one by using more examples based on publicly available data, such as the U.S. census, and by making some recommended reading and suggested exercises available at the companion Web site, www.data-miners.com/companion.

The book is still divided into three parts. The first part talks about the business context of data mining, starting with a chapter that introduces data mining and explains what it is used for and why. The second chapter introduces the virtuous cycle of data mining — the ongoing process by which data mining is used to turn data into information that leads to actions, which in turn create more data and more opportunities for learning. Chapter 3 is a much-expanded discussion of data mining methodology and best practices. This chapter benefits more than any other from our experience since writing the first book. The methodology introduced here is designed to build on the successful engagements we have been involved in. Chapter 4, which has no counterpart in the first edition, is about applications of data mining in marketing and customer relationship management, the fields where most of our own work has been done.

The second part consists of the technical chapters about the data mining techniques themselves. All of the techniques described in the first edition are still here although they are presented in a different order. The descriptions have been rewritten to make them clearer and more accurate while still retaining nontechnical language wherever possible.

In addition to the seven techniques covered in the first edition (decision trees, neural networks, memory-based reasoning, association rules, cluster detection, link analysis, and genetic algorithms), there is now a chapter on data mining using basic statistical techniques and another new chapter on survival analysis. Survival analysis is a technique that has been adapted from the small samples and continuous time measurements of the medical world to the large samples and discrete time measurements found in marketing data. The chapter on memory-based reasoning now also includes a discussion of collaborative filtering, another technique based on nearest neighbors that has become popular with Web retailers as a way of generating recommendations.

The third part of the book talks about applying the techniques in a business context, including a chapter on finding customers in data, one on the relationship of data mining and data warehousing, another on the data mining environment (both corporate and technical), and a final chapter on putting data mining to work in an organization. A new chapter in this part covers preparing data for data mining, an extremely important topic since most data miners report that transforming data takes up the majority of time in a typical data mining project.
