Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

■■ The ability to score customers using already created models from the model library on demand.

■■ The ability to manage hundreds of model scores over time.

■■ The ability to manage scores or hundreds of models developed over time.

■■ The ability to reconstruct a customer signature for any point in a customer’s tenure, such as immediately before a purchase or other interesting event.

■■ The ability to track changes in model scores over time.

■■ The ability to publish scores, rules, and other data mining results back to the data warehouse and to other applications that need them.
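The requirement to reconstruct a customer signature for any point in a customer's tenure can be illustrated with a minimal Python sketch. The transaction layout, the field names, and the `signature_as_of` helper are all assumptions made for illustration; they are not taken from any particular product. The key idea is that only history dated before the chosen cutoff may contribute to the signature.

```python
from datetime import date

# Hypothetical transaction history: (customer_id, transaction_date, amount).
TRANSACTIONS = [
    ("c1", date(2003, 1, 5), 20.0),
    ("c1", date(2003, 3, 9), 35.0),
    ("c1", date(2003, 6, 1), 50.0),
    ("c2", date(2003, 2, 14), 10.0),
]


def signature_as_of(transactions, customer_id, cutoff):
    """Build one customer's signature from transactions strictly before cutoff."""
    history = [
        (tx_date, amount)
        for cid, tx_date, amount in transactions
        if cid == customer_id and tx_date < cutoff
    ]
    if not history:
        # The customer had no activity before the cutoff.
        return {
            "customer_id": customer_id,
            "as_of": cutoff,
            "n_purchases": 0,
            "total_spend": 0.0,
            "days_since_last": None,
        }
    last_date = max(tx_date for tx_date, _ in history)
    return {
        "customer_id": customer_id,
        "as_of": cutoff,
        "n_purchases": len(history),
        "total_spend": sum(amount for _, amount in history),
        "days_since_last": (cutoff - last_date).days,
    }


# Signature for customer c1 immediately before the purchase on 1 June 2003.
sig = signature_as_of(TRANSACTIONS, "c1", date(2003, 6, 1))
print(sig)
```

Because the cutoff is an argument rather than "today," the same function can serve both model building (signatures taken just before a past event) and scoring (signatures taken as of the current date).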

The data mining infrastructure is logically (and often physically) split into two pieces supporting two quite different activities: mining and scoring. Each task presents a different set of requirements.

470643 c16.qxd 3/8/04 11:29 AM Page 527

Building the Data Mining Environment 527

The Mining Platform

The mining platform supports software for data manipulation along with data mining software embodying the data mining techniques described in this book, visualization and presentation software, and software to enable models to be published to the scoring environment.

Although we have already touched on a few integration issues, others to consider include:

■■ Where in the client/server hierarchy is the software to be installed?

■■ Will the data mining software require its own hardware platform? If so, will this introduce a new operating system into the mix?

■■ What software will have to be installed on users’ desktops in order to communicate with the package?

■■ What additional networking, SQL gateways, and middleware will be required?

■■ Does the data mining software provide good interfaces to reporting and graphics packages?

The purpose of the mining platform is to support exploration of the data, mining, and modeling. The system should be devised with these activities in mind, including the fact that such work requires much processing and computing power. The data mining software vendor should be able to provide specifications for a data mining platform adequate for the anticipated dataset sizes and expected usage patterns.

The Scoring Platform

The scoring platform is where models developed on the mining platform are applied to customer records to create scores used to determine future treatments. Often, the scoring platform is the customer database itself, which is likely to be a relational database running on a parallel hardware platform.

In order to score a record, the record must contain, or the scoring platform must be able to calculate, the same features that went into the model. These features used by the model are rarely in the raw form in which they occur in the data. Often, new features have been created by combining existing variables in various ways, such as taking the ratio of one to another and performing transformations such as binning, summing, and averaging. Whatever was done to calculate the features used when the model was created must now be done for every record to be scored. Since there may be hundreds of millions of transactional records, it matters how this is done. When the volume of data is large, so is the data processing challenge.
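One way to make sure that scoring repeats exactly what was done at model-building time is to keep the feature derivations in a single function used by both environments. The following Python sketch assumes a hypothetical raw record layout, hypothetical feature names, and a toy linear model with made-up weights; it shows the pattern only and is not a description of any particular scoring platform.

```python
def derive_features(record):
    """Derive model inputs from a raw customer record (a plain dict)."""
    monthly = record["monthly_spend"]                     # spend per month
    total = sum(monthly)                                  # summing
    average = total / len(monthly) if monthly else 0.0    # averaging
    visits = record["visits"]
    spend_per_visit = total / visits if visits else 0.0   # ratio of two variables
    tenure = record["tenure_months"]
    if tenure < 12:                                       # binning
        band = "new"
    elif tenure < 36:
        band = "established"
    else:
        band = "long_term"
    return {
        "avg_monthly_spend": average,
        "spend_per_visit": spend_per_visit,
        "tenure_band": band,
    }


def score(record, weights, bias):
    """Apply a toy linear model to the derived features of one record."""
    features = derive_features(record)
    return (
        bias
        + weights["avg_monthly_spend"] * features["avg_monthly_spend"]
        + weights["spend_per_visit"] * features["spend_per_visit"]
        + weights["tenure_band"][features["tenure_band"]]
    )


# Hypothetical weights, standing in for a model published by the mining platform.
WEIGHTS = {
    "avg_monthly_spend": 0.01,
    "spend_per_visit": 0.02,
    "tenure_band": {"new": 0.0, "established": 0.1, "long_term": 0.2},
}

raw = {"monthly_spend": [30.0, 50.0, 40.0], "visits": 6, "tenure_months": 14}
print(derive_features(raw))
print(score(raw, WEIGHTS, bias=0.1))
```

When the scoring platform is a relational database, the same derivations would more likely be expressed in SQL so that they run where the data lives; the point of the sketch is only that there should be one definition of each feature, not two.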


528 Chapter 16

Scoring is not complete until the scores reside on a customer database somewhere accessible to the software that will be used to select customers for inclusion in marketing campaigns. If Web log, call detail, or point-of-sale scanner data needed as a model input resides in flat files on one system while the customer marketing database resides on another, and the two are accurate as of different dates, this too can be a data processing challenge.

One Example of a Production Data Mining Architecture

Web retailing is an industry that has gone farther than most in routinely incorporating data mining and scoring into the operational environment. Many Web retailers update a customer’s profile with every transaction and use model scores to determine what to display and what to recommend. The architecture described here is from Blue Martini, a company that supplies software for mining-ready retail Web sites. The example it provides of how data mining can be made an integral part of a company’s operations is not restricted to Web retailing. Many companies could benefit from a similar architecture.

Architectural Overview

The Blue Martini architecture is designed to support the differing needs of marketers, merchandisers, and, not least, data miners. As shown in Figure 16.2, it has three modules for three different types of users. For merchandisers, this architecture supports multiple product hierarchies and tools for controlling collections and promotions. For marketers there are tools for making controlled experiments to track the effectiveness of various messages and marketing rules. For data miners, there is integrated modeling software and relief from having to create customer signatures by hand from dozens of different Web server and application logs. The architecture is what Ralph Kimball and Richard Merz would call a data Webhouse, made up of several special-purpose data marts with different schemas, all using common field definitions and shared metadata.

Customers at a Web store interact with pages generated as needed from a database that includes product information and the page templates. The contents of the page are driven by rules. Some of these rules are business rules entered by managers. Others are generated automatically and then edited by professional merchandisers.


Figure 16.2 Blue Martini provides a good example of an IT architecture for data mining–driven Web retailing. [Diagram labels: Business Data Definition Module; Customer Interaction Module; Analysis Module; Web Server with logs; Application Server with logs; OLTP Database for Customer Interaction; OLAP Database for Reporting; Customer Signatures for Mining; Product Hierarchies; Promotions, Collections; Business Rules; Model Scores.]

Generating pages from a database has many advantages. First, it makes it possible to enforce a consistent look and feel across the Web site. Such standard interfaces help customers navigate through the site. Using a database also makes it possible to make global changes quickly, such as updating prices for a sale. Another feature is the ability to store templates in different languages and currencies, so the site can be customized for users in different countries. From the data mining perspective, a major advantage is that all customer interactions are logged in the database.

User interactions are managed through a collection of data marts. Reporting and mining are centered on a customer behavior data mart that includes information derived from the user interaction, product, and business-rule data marts. The complicated extract and transformation logic required to create customer signatures from transaction data is part of the system—a great simplification for anyone who has ever tried massaging Web logs to get information about customers.

Customer Interaction Module

This architecture includes the databases and software needed to support merchandising, customer interaction, reporting, and mining as well as customer-centric marketing in the form of personalization. The Blue Martini system has three major modules, each with its own data mart. These repositories keep track of the following:

■■ Business rules

■■ Customer and visitor transactions

■■ Customer behavior

The customer behavior data mart, shown in Figure 16.2 as part of the analysis module, is fed by data from the customer interaction module, and it, in turn, supplies rules to both the business data definition module and the customer interaction module.

Merchandising information such as product hierarchies, assortments (families of products that are grouped together for merchandising purposes), and price lists are maintained in the business rules data mart, as is content information such as Web page templates, images, sounds, and video clips. Business rules include personalization rules for greeting named customers, promotion rules, cross-sell rules, and so on. Much of the data mining effort for a retail site goes into generating these rules.

The customer interaction module is the part of the system that touches customers directly by processing all the customer transactions. The customer interaction module is responsible for maintaining users’ sessions and context.

This module implements the actual Web store and collects any data that may be wanted for later analysis. The customer transaction data mart logs business events such as the following:

■■ Customer adds an item to the basket.

■■ Customer initiates check-out process.

■■ Customer completes check-out process.

■■ Cross-sell rule is triggered, and recommendation is made.

■■ Recommended link is followed.

The customer interaction module supports marketing experiments by implementing control groups and keeping track of multiple rules. It has detailed knowledge of the content it serves and can track many things that are not tracked in the Web server logs. The customer interaction module collects data that allows both products and customers to be tracked over time.
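The combination of control groups and business-event logging described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the hash-based assignment, the event names, and the in-memory `EVENT_LOG` are hypothetical and do not describe how the Blue Martini software works internally.

```python
import hashlib

# Hypothetical in-memory stand-in for the customer transaction data mart.
EVENT_LOG = []


def assign_group(customer_id, experiment, control_fraction=0.1):
    """Deterministically place a customer in 'control' or 'treatment'."""
    key = f"{experiment}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "control" if bucket < control_fraction else "treatment"


def log_event(customer_id, event_type, rule_id=None, **details):
    """Append one business event to the log."""
    EVENT_LOG.append(
        {
            "customer_id": customer_id,
            "event_type": event_type,
            "rule_id": rule_id,
            **details,
        }
    )


def maybe_recommend(customer_id, rule_id, product, control_fraction=0.1):
    """Fire a cross-sell rule, withholding the offer from the control group."""
    group = assign_group(customer_id, rule_id, control_fraction)
    if group == "treatment":
        log_event(customer_id, "recommendation_made",
                  rule_id=rule_id, product=product, group=group)
        return product
    # Control customers see no recommendation, but the event is still logged
    # so that the two groups can be compared later.
    log_event(customer_id, "recommendation_withheld",
              rule_id=rule_id, product=product, group=group)
    return None


shown = maybe_recommend("c42", "cross_sell_socks", "wool socks")
print(shown, EVENT_LOG[-1])
```

Hashing the customer and experiment identifiers together keeps a customer in the same group on every visit without storing the assignment, while letting different experiments split the customer base differently.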

Analysis Module

The database that supports the customer interaction module, like most online transaction processing systems, is a relational database designed to support quick transaction processing. Data destined for the analytic module must be extracted and transformed to support the structures suitable for mining and reporting. Data mining requires flat signature tables with one row per customer
