In Mastering Data Mining (Wiley, 1999), we discuss a case study using a suite of tools from Ab Initio, Inc., a company that specializes in parallel data transformation software. This case study illustrates the power of such software when working on very large volumes of data, something to consider in an environment where such software might be available.
Special-Purpose Code
Coding is the tried-and-true way of implementing data transformations. The choice of tool is really based on what the programmer is most familiar with and what tools are available. For the transformations needed for a customer signature, the main statistical tools all have sufficient functionality.
One downside of using special-purpose code is that it adds an extra layer to the data transformation process. Data must still be extracted from source systems (one possible source of error) and then passed through code (another source of error). It is a good idea to write code that is well documented and reusable.
Data Mining Tools
Increasingly, data mining tools have the ability to transform data within the tool. Most tools have the ability to extract features from fields and to combine multiple fields in a row, although the support for non-numeric data types
varies from tool to tool and release to release. Some tools also support summarizations within the customer signature, such as binning variables (where the binning breakpoints are determined first by looking at the entire set of data) and standardization.
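The two summarizations mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular data mining tool: the function names, the equal-height binning rule, and the sample spending values are all assumptions made for the example. The point it shows is that both the breakpoints and the standardization parameters are computed from the entire set of values before any single row is transformed.

```python
from statistics import mean, pstdev

def equal_height_breakpoints(values, n_bins):
    """Return n_bins - 1 breakpoints so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]

def assign_bin(value, breakpoints):
    """Return the index of the bin that value falls into (0 is the lowest bin)."""
    return sum(value >= b for b in breakpoints)

def standardize(values):
    """Convert values to z-scores using the mean and standard deviation of the whole set."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly spend for eight customers (illustrative values only).
spend = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]

# Pass 1: look at the entire set of data to find the breakpoints.
breaks = equal_height_breakpoints(spend, 4)

# Pass 2: transform each value using the breakpoints found in pass 1.
bins = [assign_bin(v, breaks) for v in spend]
z_scores = standardize(spend)
```

With real data, the breakpoints and the mean and standard deviation found on the model set would normally be saved and reused when scoring new customers, so that the same value always lands in the same bin.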
However, data mining tools are generally weak on looking up values and doing aggregations. For this reason, the customer signature is almost always created elsewhere and then loaded into the tool. Tools from leading vendors allow the embedding of programming code inside the tool and access to databases using SQL. Using these features is a good idea because they reduce the number of things to keep track of when transforming data.
Lessons Learned
Data is the gasoline that powers data mining. The goal of data preparation is to provide a clean fuel, so the analytic engines work as efficiently as possible. For most algorithms, the best input takes the form of customer signatures: a single row of data per customer, with fields describing various aspects of that customer. Many of these fields are input fields; a few are targets used for predictive modeling.
Unfortunately, customer signatures are not the way data is found in available systems—and for good reason, since the signatures change over time. In fact, they are constantly being built and rebuilt, with newer data and newer ideas on what constitutes useful information.
Source fields come in several different varieties, such as numbers, strings, and dates. However, the most useful values are usually those that are added in. Creating derived values may be as simple as taking the sum of two fields, or it may require much more sophisticated calculations on very large amounts of data. This is particularly true when trying to capture customer behavior over time, because time series, whether regular or irregular, must be summarized for the signature.
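As a rough illustration of summarizing an irregular time series into a signature, the sketch below rolls a handful of transactions up into one row per customer. The record layout, the field names (`num_transactions`, `tenure_days`, and so on), the cutoff date, and the sample data are all hypothetical, chosen only to show the shape of the calculation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, transaction_date, amount).
transactions = [
    ("C1", date(2004, 1, 5), 20.0),
    ("C1", date(2004, 2, 9), 35.0),
    ("C1", date(2004, 3, 1), 5.0),
    ("C2", date(2004, 2, 20), 100.0),
]

def build_signatures(records, as_of):
    """Summarize each customer's irregular transaction series into one row."""
    by_customer = defaultdict(list)
    for customer_id, when, amount in records:
        if when <= as_of:  # use only data known as of the cutoff date
            by_customer[customer_id].append((when, amount))

    signatures = {}
    for customer_id, rows in by_customer.items():
        dates = [when for when, _ in rows]
        amounts = [amount for _, amount in rows]
        signatures[customer_id] = {
            "num_transactions": len(rows),
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "tenure_days": (as_of - min(dates)).days,
            "days_since_last": (as_of - max(dates)).days,
        }
    return signatures

signatures = build_signatures(transactions, as_of=date(2004, 3, 8))
```

Passing the cutoff date as a parameter means the same code can rebuild the signatures for a later date as newer data arrives. On very large volumes of data, the same grouping and aggregation would more likely be expressed in SQL or in a parallel transformation tool.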
Data also suffers (and causes us to suffer along with it) from problems—
missing values, incorrect values, and values from different sources that disagree. Once such problems are identified, it is possible to work around them.
The biggest problems are the unknown ones—data that looks correct but is wrong for some reason.
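The three kinds of known problems listed above can each be checked mechanically once you know to look for them. The following sketch is only an illustration: the two source systems (`billing` and `marketing`), the field names, the plausible range for a birth year, and the sample records are all assumptions made for the example.

```python
# Hypothetical customer records from two source systems, keyed by customer ID.
billing = {
    "C1": {"birth_year": 1970, "zip": "02114"},
    "C2": {"birth_year": None, "zip": "10001"},
    "C3": {"birth_year": 1875, "zip": "94105"},
}
marketing = {
    "C1": {"zip": "02114"},
    "C2": {"zip": "10011"},
    "C3": {"zip": "94105"},
}

def find_problems(primary, secondary, min_year=1900, max_year=2004):
    """Return (customer_id, problem) pairs for the three kinds of known problems."""
    problems = []
    for customer_id, record in sorted(primary.items()):
        year = record.get("birth_year")
        if year is None:
            # Missing value.
            problems.append((customer_id, "missing birth_year"))
        elif not min_year <= year <= max_year:
            # Incorrect value: outside the range assumed to be plausible.
            problems.append((customer_id, "implausible birth_year"))
        other = secondary.get(customer_id)
        if other is not None and other.get("zip") != record.get("zip"):
            # Values from different sources that disagree.
            problems.append((customer_id, "zip disagrees between sources"))
    return problems

problems = find_problems(billing, marketing)
```

Checks like these only catch problems that have already been anticipated. A value that falls within the plausible range and agrees across sources passes every test here, even if it is wrong.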
Many data mining efforts have to use data that is less than perfect. As with old cars that spew blue smoke but still manage to chug along the street, these efforts produce results that are good enough. Like the vagabonds in Samuel Beckett’s play Waiting for Godot, we can choose to wait until perfection arrives.
That is the path to doing nothing; the better choice is to plow ahead, to learn, and to make incremental progress.
CHAPTER 18
Putting Data Mining to Work
You’ve reached the last chapter of this book, and you are ready to start putting data mining to work for your company. You are convinced that when data mining has been woven into the fabric of your organization, the whole enterprise will benefit from an increased understanding of its customers and market, from better-focused marketing, from more-efficient utilization of sales resources, and from more-responsive customer support. You also know that there is a big difference between understanding something you have read in a book and actually putting it into practice. This chapter is about how to bridge that gap.
At Data Miners, Inc., the consulting company founded by the authors of this book, we have helped many companies through their first data mining projects. Although this chapter focuses on a company’s first foray into data mining, it is really about how to increase the probability of success for any data mining project, whether the first or the fiftieth. It brings together ideas from earlier chapters and applies them to the design of a data mining pilot project.
The chapter begins with general advice about integrating data mining into the enterprise. It then discusses how to select and implement a successful pilot project. The chapter concludes with the story of one company’s initial data mining effort and its success.
Getting Started
The full integration of data mining into a company’s customer relationship management strategy is a large and daunting project. It is best approached incrementally, with achievable goals and measurable results along the way. The final goal is to have data mining so well integrated into the decision-making process that business decisions use accurate and timely customer information as a matter of course. The first step toward achieving this goal is demonstrating the real business value of data mining by producing a measurable return on investment from a manageable pilot or proof-of-concept project. The pilot should be chosen to be valuable in itself and to provide a solid basis for the business case needed to justify further investment in analytical CRM.
In fact, a pilot project is not that different from any other data mining project. All four phases of the virtuous cycle of data mining are represented in a pilot project, albeit with some changes in emphasis. The proof of concept is limited in budget and timeframe. Some problems with data and procedures that would ordinarily need to be fixed may simply be documented in a pilot project.
TIP A pilot project is a good first step in the incremental effort to revolutionize a business using data mining.
Here are the topic sentences for a few of the data mining pilot projects that we have collaborated on with our clients:
■ Find 10,000 high-end mobile telephone customers who are most likely to churn in October, in time for us to start an outbound telemarketing campaign in September.
■ Find differences in the shopping profiles of Hispanic and non-Hispanic shoppers in Texas with respect to ready-to-eat cereals, so we can better direct our Spanish-language advertising campaigns.
■ Guide our expansion plans by discovering what our best customers have in common with one another and locating new markets where similar customers can be found.
■ Build a model to identify market research segments among the customers in our corporate data warehouse, so we can target messages to the right customers.
■ Forecast the expected level of debt collection for the next several months, so we can manage to a plan.
These examples show the diversity of problems that data mining can address. In each case, the data mining challenge is to find and analyze the appropriate data to solve the business problem. However, this process starts by choosing the right demonstration project in the first place.
What to Expect from a Proof-of-Concept Project
When the proof-of-concept project is complete, the following are available:
■ A prototype model development system (which might be outsourced or might be the kernel of the production system)
■ An evaluation of several data mining techniques and tools (unless the choice of tool was foreordained)
■ A plan for modifying business processes and systems to incorporate data mining
■ A description of the production data mining environment
■ A business case for investing in data mining and customer analytics
Even when the decision has already been made to invest in data mining, the proof-of-concept project is an important way to step through the virtuous cycle of data mining for the first time. You should expect challenges and hiccups along the way, because such a project is touching several different parts of the organization, both technical and operational, and needs them to work together in perhaps unfamiliar ways.
Identifying a Proof-of-Concept Project
The purpose of a proof-of-concept project is to validate the utility of data mining while managing risk. The project should be small enough to be practical and important enough to be interesting. A successful data mining proof-of-concept project is one that leads to actions with measurable results. To find candidates for a proof of concept, study the existing business processes to identify areas where data mining could provide tangible benefits with results that can be measured in dollars. That is, the proof of concept should create a solid business case for further integration of data mining into the company’s marketing, sales, and customer-support operations.