Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

MBR is a k-nearest neighbors approach. Determining which neighbors are near requires a distance function. There are many approaches to measuring the distance between two records, and the careful choice of an appropriate distance function is a critical step in using MBR. The chapter introduced an approach to creating an overall distance function by building a distance function for each field and normalizing it. The normalized field distances can then be combined in a Euclidean fashion or summed to produce a Manhattan distance.
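A minimal sketch of the per-field approach, assuming two simple field distance functions: for numeric fields, the absolute difference divided by the field's range (the range is assumed to be known in advance), and for categorical fields, 0 for a match and 1 otherwise. The function names and the sample (age, income, gender) records are hypothetical illustrations, not taken from the chapter.

```python
import math
from typing import Callable, List, Sequence

# A field distance maps two values of one field to a number in [0, 1].
FieldDistance = Callable[[object, object], float]


def numeric_distance(low: float, high: float) -> FieldDistance:
    """Absolute difference divided by the field's range (range assumed known)."""
    span = high - low

    def dist(a: object, b: object) -> float:
        return abs(float(a) - float(b)) / span if span else 0.0

    return dist


def categorical_distance(a: object, b: object) -> float:
    """0 when the two categories match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def field_distances(x: Sequence, y: Sequence, fns: Sequence[FieldDistance]) -> List[float]:
    """Apply each field's distance function to the matching pair of values."""
    return [fn(a, b) for fn, a, b in zip(fns, x, y)]


def euclidean(ds: Sequence[float]) -> float:
    """Square root of the sum of squared field distances."""
    return math.sqrt(sum(d * d for d in ds))


def manhattan(ds: Sequence[float]) -> float:
    """Plain sum of the field distances."""
    return sum(ds)


# Hypothetical customer records: (age, income, gender).
fns = [numeric_distance(18, 88), numeric_distance(0, 200_000), categorical_distance]
ds = field_distances((25, 40_000, "F"), (60, 40_000, "M"), fns)
print(ds)                        # [0.5, 0.0, 1.0]
print(round(euclidean(ds), 3))   # 1.118
print(manhattan(ds))             # 1.5
```

Because every field distance lies in [0, 1], no single field dominates merely because it is measured in larger units (income versus age, for instance).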

When the Euclidean method is used, a large difference in any one field is enough to cause two records to be considered far apart. The Manhattan method is more forgiving: a large difference on one field can more easily be offset by close values on other fields. A validation set can be used to pick the best distance function for a given model set by applying all candidates to see which produces the best results. Sometimes, the right choice of neighbors depends on modifying the distance function to favor some fields over others. This is easily accomplished by incorporating weights into the distance function.
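The contrast between the two combination methods, and the effect of weights, can be checked numerically. The sketch below assumes the field distances have already been normalized to [0, 1] and applies each weight to a field's squared distance (Euclidean) or to its distance (Manhattan); the two candidate records and the weight vector are hypothetical.

```python
import math
from typing import Sequence


def weighted_euclidean(ds: Sequence[float], ws: Sequence[float]) -> float:
    """Each squared field distance is multiplied by its weight before summing."""
    return math.sqrt(sum(w * d * d for w, d in zip(ws, ds)))


def weighted_manhattan(ds: Sequence[float], ws: Sequence[float]) -> float:
    """Each field distance is multiplied by its weight before summing."""
    return sum(w * d for w, d in zip(ws, ds))


# Hypothetical normalized field distances from one query record to two candidates.
one_big_gap = [1.0, 0.0, 0.0, 0.0]      # far apart on a single field
small_gaps = [0.4, 0.4, 0.4, 0.4]       # moderately apart on every field
unit_weights = [1.0, 1.0, 1.0, 1.0]

# Euclidean: 1.0 vs. about 0.8, so the single large gap makes one_big_gap the farther record.
print(weighted_euclidean(one_big_gap, unit_weights) > weighted_euclidean(small_gaps, unit_weights))  # True
# Manhattan: 1.0 vs. 1.6, so the same record now counts as the nearer one.
print(weighted_manhattan(one_big_gap, unit_weights) < weighted_manhattan(small_gaps, unit_weights))  # True

# Down-weighting the first field (an illustrative choice) flips the Euclidean ranking.
favor_others = [0.25, 1.0, 1.0, 1.0]
print(weighted_euclidean(one_big_gap, favor_others) < weighted_euclidean(small_gaps, favor_others))  # True
```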

The next question is how many neighbors to use. Once again, trying different numbers of neighbors on the validation set can help determine the optimal number. There is no universally right number of neighbors; it depends on the distribution of the data and is highly dependent on the problem being solved.
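Searching for the number of neighbors on a validation set can be sketched as a simple loop over candidate values of k. This version assumes unweighted majority voting and plain Euclidean distance (`math.dist`, Python 3.8+), and breaks accuracy ties in favor of the smaller k; the toy training and validation data are hypothetical and include one deliberately mislabeled point.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[Point, str]  # (field values, class label)


def knn_predict(train: Sequence[Record], query: Point, k: int) -> str:
    """Majority vote among the k nearest training records (Euclidean distance)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: Sequence[Record], validation: Sequence[Record], k: int) -> float:
    """Fraction of validation records whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def best_k(train: Sequence[Record], validation: Sequence[Record], candidates: List[int]) -> int:
    """Candidate k with the highest validation accuracy; ties go to the smaller k."""
    return max(candidates, key=lambda k: (accuracy(train, validation, k), -k))


# Hypothetical toy data: two clusters plus one mislabeled point inside cluster "a".
train: List[Record] = [
    ((0.0, 0.0), "a"), ((0.5, 0.0), "a"), ((0.0, 0.5), "a"), ((0.5, 0.5), "a"),
    ((5.0, 5.0), "b"), ((5.5, 5.0), "b"), ((5.0, 5.5), "b"), ((5.5, 5.5), "b"),
    ((0.1, 0.1), "b"),  # noise
]
validation: List[Record] = [((0.0, 0.2), "a"), ((5.0, 5.2), "b")]

print(best_k(train, validation, [1, 3, 5]))  # 3
```

With k = 1 the mislabeled point decides the first validation record, so a slightly larger neighborhood wins here; on other data the answer can differ, which is the point of checking against a validation set.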

The basic combination function, weighted voting with weights inversely proportional to distance, does a good job for categorical data. The analogous operation for estimating numeric values is a weighted average.
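Both combination functions fit in a few lines. The sketch below uses weights of 1/distance as the text describes; the small constant `EPS` added to each distance is an assumption made here so that an exact match (distance 0) does not cause a division by zero. The neighbor lists are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

EPS = 1e-9  # assumption: keeps 1/distance finite when a neighbor matches exactly


def weighted_vote(neighbors: Sequence[Tuple[float, str]]) -> str:
    """neighbors holds (distance, class) pairs; each vote is worth 1/distance."""
    scores: Dict[str, float] = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + EPS)
    return max(scores, key=lambda label: scores[label])


def weighted_average(neighbors: Sequence[Tuple[float, float]]) -> float:
    """neighbors holds (distance, value) pairs; weights are 1/distance."""
    weights = [1.0 / (dist + EPS) for dist, _ in neighbors]
    return sum(w * value for w, (_, value) in zip(weights, neighbors)) / sum(weights)


# One close "yes" (weight 10) outvotes two distant "no" neighbors (weights 2 and about 1.7).
print(weighted_vote([(0.1, "yes"), (0.5, "no"), (0.6, "no")]))    # yes
# The nearer neighbor (distance 1) pulls the estimate toward its value of 100.
print(round(weighted_average([(1.0, 100.0), (3.0, 200.0)]), 1))    # 125.0
```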

One good application for memory-based reasoning is making recommendations. Collaborative filtering is an approach to making recommendations that works by grouping people with similar tastes together, using a distance function that can compare two lists of user-supplied ratings. Recommendations for a new person are calculated using a weighted average of the ratings of his or her nearest neighbors.
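A minimal collaborative filtering sketch follows. The text does not fix the distance or weighting scheme, so two choices here are assumptions: the distance between two people is the root-mean-square difference over the items both have rated, and each neighbor's weight is 1/(1 + distance) so that identical tastes do not divide by zero. The people, items, and ratings are hypothetical.

```python
import math
from typing import Dict, Mapping, Optional

Ratings = Mapping[str, float]  # item -> rating supplied by one person


def rating_distance(a: Ratings, b: Ratings) -> float:
    """Root-mean-square rating difference over items both people rated."""
    common = set(a) & set(b)
    if not common:
        return math.inf  # no overlap: nothing to compare
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in common) / len(common))


def predict_rating(target: Ratings, others: Mapping[str, Ratings], item: str, k: int = 2) -> Optional[float]:
    """Weighted average of the item's ratings from the target's k nearest neighbors."""
    candidates = [
        (rating_distance(target, ratings), ratings[item])
        for ratings in others.values()
        if item in ratings
    ]
    candidates = [c for c in candidates if math.isfinite(c[0])]
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]
    weights = [1.0 / (1.0 + d) for d, _ in nearest]  # assumption: 1/(1+d) avoids division by zero
    return sum(w * r for w, (_, r) in zip(weights, nearest)) / sum(weights)


# Hypothetical ratings on a 1-5 scale.
others: Dict[str, Dict[str, float]] = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 1, "m2": 2, "m3": 5},
    "cat": {"m1": 5, "m2": 5, "m3": 2},
}
newcomer = {"m1": 5, "m2": 4}
print(round(predict_rating(newcomer, others, "m3"), 2))  # 1.37
```

The newcomer agrees closely with ann and cat and strongly disagrees with bob, so the prediction for `m3` lands between ann's 1 and cat's 2 rather than near bob's 5.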

Chapter 9

Market Basket Analysis and Association Rules

To convey the fundamental ideas of market basket analysis, start with the image of the shopping cart in Figure 9.1 filled with various products purchased by someone on a quick trip to the supermarket. This basket contains an assortment of products—orange juice, bananas, soft drink, window cleaner, and detergent. One basket tells us about what one customer purchased at one time. A complete list of purchases made by all customers provides much more information; it describes the most important part of a retailing business—what merchandise customers are buying and when.

Each customer purchases a different set of products, in different quantities, at different times. Market basket analysis uses the information about what customers purchase to provide insight into who they are and why they make certain purchases. Market basket analysis provides insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to promotion. This information is actionable: it can suggest new store layouts; it can determine which products to put on special; it can indicate when to issue coupons, and so on. When this data can be tied to individual customers through a loyalty card or Web site registration, it becomes even more valuable.
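The most basic way to see which products tend to be purchased together is to count, for every pair of products, how many baskets contain both. The sketch below does exactly that; the baskets are hypothetical and simply reuse items from the grocery example.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def pair_counts(baskets: Iterable[List[str]]) -> Counter:
    """Count the baskets that contain each unordered pair of distinct products."""
    counts: Counter = Counter()
    for basket in baskets:
        # set() ignores repeated items; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical baskets.
baskets = [
    ["orange juice", "bananas", "soda"],
    ["bananas", "soda"],
    ["detergent", "window cleaner", "orange juice"],
    ["soda", "bananas", "detergent"],
]
print(pair_counts(baskets).most_common(1))  # [(('bananas', 'soda'), 3)]
```

Raw pair counts favor products that are popular on their own, which is one reason more refined measures are needed before acting on such patterns.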

The data mining technique most closely allied with market basket analysis is the automatic generation of association rules. Association rules represent patterns in the data without a specified target. As such, they are an example of undirected data mining. Whether the patterns make sense is left to human interpretation.


[Figure 9.1: A shopping basket in which the shopper purchased a quart of orange juice, some bananas, dish detergent, some window cleaner, and a six-pack of soda, surrounded by the questions: Is soda typically purchased with bananas? Does the brand of soda make a difference? How do the demographics of the neighborhood affect what customers buy? What should be in the basket but is not? Are window cleaning products purchased when detergent and orange juice are bought together?]

Figure 9.1 Market basket analysis helps you understand customers as well as items that are purchased together.

Association rules were originally derived from point-of-sale data that describes what products are purchased together. Although their roots are in analyzing point-of-sale transactions, association rules can be applied outside the retail industry to find relationships among other types of "baskets." Some examples of potential applications are:

- Items purchased on a credit card, such as rental cars and hotel rooms, provide insight into the next product that customers are likely to purchase.
- Optional services purchased by telecommunications customers (call waiting, call forwarding, DSL, speed call, and so on) help determine how to bundle these services together to maximize revenue.
- Banking services used by retail customers (money market accounts, CDs, investment services, car loans, and so on) identify customers likely to want other services.
- Unusual combinations of insurance claims can be a sign of fraud and can spark further investigation.
- Medical patient histories can give indications of likely complications based on certain combinations of treatments.

Association rules often fail to live up to expectations. In our experience, for instance, they are not a good choice for building cross-selling models in industries such as retail banking, because the rules end up describing previous marketing promotions. Also, in retail banking, customers typically start with a checking account and then a savings account. Differentiation among products does not appear until customers have more products. This chapter covers the pitfalls as well as the uses of association rules.

The chapter starts with an overview of market basket analysis, including more basic analyses of market basket data that do not require association rules. It then dives into association rules, explaining how they are derived, and continues with ways to extend association rules to include other facets of market basket analysis.

Defining Market Basket Analysis

Market basket analysis does not refer to a single technique; it refers to a set of business problems related to understanding point-of-sale transaction data.

The most common technique is association rules, and much of this chapter delves into that subject. Before talking about association rules, this section talks about market basket data.

Three Levels of Market Basket Data

Market basket data is transaction data that describes three fundamentally different entities:

- Customers
- Orders (also called purchases or baskets or, in academic papers, item sets)
- Items

In a relational database, the data structure for market basket data often looks similar to Figure 9.2. This data structure includes four important entities.

LINE ITEM: LINE ITEM ID, ORDER ID, PRODUCT ID, QUANTITY, UNIT PRICE, UNIT COST, GIFT WRAP FLAG, TAXABLE FLAG, etc.

ORDER: ORDER ID, CUSTOMER ID, ORDER DATE, PAYMENT TYPE, TOTAL VALUE, SHIP DATE, SHIPPING COST, etc.

PRODUCT: PRODUCT ID, CATEGORY, SUBCATEGORY, DESCRIPTION, etc.

CUSTOMER: CUSTOMER ID, NAME, ADDRESS, etc.

Figure 9.2 A data model for transaction-level market basket data typically has three tables, one for the customer, one for the order, and one for the order line, along with a product reference table.
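The same data model can be sketched as Python dataclasses, one per entity in Figure 9.2, with the ID fields acting as the links between them. Field types, the omission of the "etc." fields, and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Customer:
    customer_id: int
    name: str
    address: str


@dataclass
class Product:
    product_id: int
    category: str
    subcategory: str
    description: str


@dataclass
class Order:
    order_id: int
    customer_id: Optional[int]  # None when the shopper cannot be identified
    order_date: date
    payment_type: str


@dataclass
class LineItem:
    line_item_id: int
    order_id: int    # links the line to its Order
    product_id: int  # links the line to its Product
    quantity: int
    unit_price: float
    unit_cost: float
    taxable: bool


def items_for_order(order: Order, items: List[LineItem]) -> List[LineItem]:
    """All line items that belong to one order."""
    return [li for li in items if li.order_id == order.order_id]


def order_total(order: Order, items: List[LineItem]) -> float:
    """Sum of quantity * unit price over the order's line items."""
    return sum(li.quantity * li.unit_price for li in items_for_order(order, items))


# Hypothetical data: an anonymous cash purchase with two line items.
order = Order(order_id=1, customer_id=None, order_date=date(2024, 1, 15), payment_type="cash")
items = [
    LineItem(1, 1, 10, 2, 1.50, 0.90, False),
    LineItem(2, 1, 11, 1, 3.25, 2.00, True),
    LineItem(3, 2, 10, 1, 1.50, 0.90, False),  # belongs to a different order
]
print(order_total(order, items))  # 6.25
```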


The order is the fundamental data structure for market basket data. An order represents a single purchase event by a customer. This might correspond to a customer ordering several products on a Web site, purchasing a basket of groceries, or buying several items from a catalog. The order includes the total amount of the purchase, additional shipping charges, payment type, and whatever other data is relevant about the transaction. Sometimes the transaction is given a unique identifier; sometimes the unique identifier needs to be cobbled together from other data.

In one example, we needed to combine four fields to get an identifier for purchases in a store: the timestamp when the customer paid, the chain ID, the store ID, and the lane ID.
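Building such a composite identifier amounts to using the tuple of the four fields as a grouping key. In the sketch below, the `PurchaseKey` type, the dictionary layout of the scanned rows, and the sample values are all hypothetical; only the choice of the four fields comes from the example above.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple


class PurchaseKey(NamedTuple):
    """Composite identifier built from the four fields named in the text."""
    paid_at: datetime
    chain_id: int
    store_id: int
    lane_id: int


def group_into_purchases(rows: List[dict]) -> Dict[PurchaseKey, List[str]]:
    """Group scanned item rows (hypothetical dict layout) into one list per purchase."""
    purchases: Dict[PurchaseKey, List[str]] = defaultdict(list)
    for row in rows:
        key = PurchaseKey(row["paid_at"], row["chain_id"], row["store_id"], row["lane_id"])
        purchases[key].append(row["product"])
    return dict(purchases)


t = datetime(2024, 1, 15, 10, 42, 7)
rows = [
    {"paid_at": t, "chain_id": 1, "store_id": 17, "lane_id": 3, "product": "bananas"},
    {"paid_at": t, "chain_id": 1, "store_id": 17, "lane_id": 3, "product": "soda"},
    # Same second and store, different lane: a different customer's purchase.
    {"paid_at": t, "chain_id": 1, "store_id": 17, "lane_id": 4, "product": "detergent"},
]
print(len(group_into_purchases(rows)))  # 2
```

Dropping any one of the four fields risks merging distinct purchases, as the two lanes paying in the same second show.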

Individual items in the order are represented separately as line items. This data includes the price paid for the item, the number of items, whether tax should be charged, and perhaps the cost (which can be used for calculating margin). The item table also typically has a link to a product reference table, which provides more descriptive information about each product. This descriptive information should include the product hierarchy and other information that might prove valuable for analysis.
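Using the line-item cost to calculate margin, and the product reference table to roll that margin up the product hierarchy, might look like the sketch below. The plain-dictionary product table, the tuple layout of the line items, and all values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical product reference table: product_id -> (category, subcategory)
products: Dict[int, Tuple[str, str]] = {
    101: ("beverages", "juice"),
    102: ("beverages", "soda"),
    201: ("household", "detergent"),
}

# Hypothetical line items: (product_id, quantity, unit_price, unit_cost)
line_items: List[Tuple[int, int, float, float]] = [
    (101, 1, 3.00, 2.00),
    (102, 2, 1.50, 0.75),
    (201, 1, 6.00, 4.50),
]


def margin_by_category(line_items, products) -> Dict[str, float]:
    """Sum (unit price - unit cost) * quantity, rolled up to product category."""
    totals: Dict[str, float] = defaultdict(float)
    for product_id, qty, price, cost in line_items:
        category, _subcategory = products[product_id]  # join to the reference table
        totals[category] += (price - cost) * qty
    return dict(totals)


print(margin_by_category(line_items, products))  # {'beverages': 2.5, 'household': 1.5}
```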

The customer table is an optional table and should be available when a customer can be identified, for example, on a Web site that requires registration or when the customer uses an affinity card during the transaction. Although the customer table may have interesting fields, the most powerful element is the ID itself, because this can tie transactions together over time.

Tracking customers over time makes it possible to determine, for instance, which grocery shoppers “bake from scratch”—something of keen interest to the makers of flour as well as prepackaged cake mixes. Such customers might be identified from the frequency of their purchases of flour, baking powder, and similar ingredients, the proportion of such purchases to the customer’s total spending, and the lack of interest in prepackaged mixes and ready-to-eat desserts. Of course, such ingredients may be purchased at different times and in different quantities, making it necessary to tie together multiple transactions over time.
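A rough sketch of the "bake from scratch" idea: tie rows from many orders together by customer ID, then check how often baking ingredients were bought, what share of total spending they represent, and whether any prepackaged mixes or desserts appear. The category names, the row layout, and both thresholds (`min_share`, `min_purchases`) are illustrative assumptions, not values from the text.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical category labels; a real product hierarchy would supply these.
SCRATCH_INGREDIENTS = {"flour", "baking powder", "yeast"}
PREPARED = {"cake mix", "ready-to-eat dessert"}


def scratch_bakers(
    transactions: Iterable[Tuple[str, str, float]],
    min_share: float = 0.05,
    min_purchases: int = 2,
) -> Set[str]:
    """transactions holds (customer_id, category, amount) rows from many orders over time."""
    spend = defaultdict(float)
    ingredient_spend = defaultdict(float)
    ingredient_count = defaultdict(int)
    bought_prepared = set()
    for customer_id, category, amount in transactions:
        spend[customer_id] += amount
        if category in SCRATCH_INGREDIENTS:
            ingredient_spend[customer_id] += amount
            ingredient_count[customer_id] += 1
        elif category in PREPARED:
            bought_prepared.add(customer_id)
    return {
        c for c in spend
        if spend[c] > 0
        and ingredient_count[c] >= min_purchases          # frequency of ingredient purchases
        and ingredient_spend[c] / spend[c] >= min_share   # share of total spending
        and c not in bought_prepared                      # no interest in prepared mixes
    }


# Hypothetical rows collected from several shopping trips.
transactions = [
    ("c1", "flour", 4.0), ("c1", "produce", 20.0), ("c1", "baking powder", 2.0),
    ("c2", "flour", 4.0), ("c2", "cake mix", 3.0), ("c2", "baking powder", 2.0),
    ("c3", "produce", 30.0), ("c3", "flour", 4.0),
]
print(scratch_bakers(transactions))  # {'c1'}
```

Without the customer ID, the flour and the baking powder bought on different trips could not be attributed to the same shopper.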
