Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

This is a powerful story. Setting aside the analytics, what can a retailer do with this information? There are two competing views. One says to put the beer and diapers close together, so when one is purchased, customers remember to buy the other one. The other says to put them as far apart as possible, so the customer must walk by as many stocked shelves as possible, having the opportunity to buy yet more items. The store could also put higher-margin diapers a bit closer to the beer, although mixing baby products and alcohol would probably be unseemly.

The story is so powerful that the authors noticed at least four companies using the story—IBM, Tandem (now part of HP), Oracle, and NCR Teradata. The actual story was debunked on April 6, 1998 in an article in Forbes magazine called “Beer-Diaper Syndrome.”

The debunked story still has a lesson. Apparently, the sales of beer and diapers were known to be correlated (at least in some stores) based on inventory. While doing a demonstration project, a sales manager suggested that the demo show something interesting, like “beer and diapers” being sold together. With this small hint, analysts were able to find evidence in the data.

Actually, the moral of the story is not about the power of association rules. It is that hypothesis testing can be very persuasive and actionable.

470643 c09.qxd 3/8/04 11:15 AM Page 299

Market Basket Analysis and Association Rules 299

How Good Is an Association Rule?

Association rules start with transactions containing one or more products or service offerings and some rudimentary information about the transaction. For the purpose of analysis, the products and service offerings are called items. Table 9.1 illustrates five transactions in a grocery store that carries five products.

These transactions have been simplified to include only the items purchased. How to use information like the date and time and whether the customer paid with cash or a credit card is discussed later in this chapter.

Each of these transactions gives us information about which products are purchased with which other products. This is shown in a co-occurrence table that tells the number of times that any pair of products was purchased together (see Table 9.2). For instance, the box where the “Soda” row intersects the “OJ” column has a value of “2,” meaning that two transactions contain both soda and orange juice. This is easily verified against the original transaction data, where customers 1 and 4 purchased both these items. The values along the diagonal (for instance, the value in the “OJ” column and the “OJ” row) represent the number of transactions containing that item.

Table 9.1 Grocery Point-of-Sale Transactions

CUSTOMER   ITEMS
1          Orange juice, soda
2          Milk, orange juice, window cleaner
3          Orange juice, detergent
4          Orange juice, detergent, soda
5          Window cleaner, soda

Table 9.2 Co-Occurrence of Products

                 OJ   WINDOW CLEANER   MILK   SODA   DETERGENT
OJ                4         1            1      2        2
Window Cleaner    1         2            1      1        0
Milk              1         1            1      0        0
Soda              2         1            0      3        1
Detergent         2         0            0      1        2
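The counting behind a co-occurrence table can be sketched in a few lines of Python. This is a minimal illustration built on the five transactions of Table 9.1, not the authors' implementation; the function name co_occurrence and the short item labels are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# The five transactions from Table 9.1 (customer -> items purchased).
transactions = {
    1: {"OJ", "Soda"},
    2: {"Milk", "OJ", "Window Cleaner"},
    3: {"OJ", "Detergent"},
    4: {"OJ", "Detergent", "Soda"},
    5: {"Window Cleaner", "Soda"},
}

def co_occurrence(baskets):
    """Count, for each pair of items, the baskets that contain both.

    The entry (a, a) holds the number of baskets containing item a,
    which corresponds to the diagonal of the table.
    """
    counts = Counter()
    for items in baskets:
        for item in items:
            counts[(item, item)] += 1
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # store both orders so the table is symmetric
    return counts

table = co_occurrence(transactions.values())
print(table[("Soda", "OJ")])         # soda and orange juice together
print(table[("OJ", "OJ")])           # transactions containing orange juice
print(table[("Milk", "Detergent")])  # a pair that never occurs
```

Because a Counter returns zero for keys it has never seen, pairs that never occur together need no special handling.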


300 Chapter 9

This simple co-occurrence table already highlights some simple patterns:

■■ Orange juice and soda are purchased together in two of the five transactions, as are orange juice and detergent; no other pair of items occurs together more than once.

■■ Detergent is never purchased with window cleaner or milk.

■■ Milk is never purchased with soda or detergent.

These observations are examples of associations and may suggest a formal rule like: “If a customer purchases soda, then the customer also purchases orange juice.” For now, let’s defer discussion of how to find the rule automatically, and instead ask another question. How good is this rule?

In the data, two of the five transactions include both soda and orange juice. These two transactions support the rule. The support for the rule is two out of five, or 40 percent. Since most of the transactions that contain soda also contain orange juice, there is a fairly high degree of confidence in the rule as well. In fact, two of the three transactions that contain soda also contain orange juice, so the rule “if soda, then orange juice” has a confidence of 67 percent. The inverse rule, “if orange juice, then soda,” has a lower confidence. Of the four transactions with orange juice, only two also have soda. Its confidence, then, is just 50 percent. More formally, confidence is the ratio of the number of the transactions supporting the rule to the number of transactions where the conditional part of the rule holds. Another way of saying this is that confidence is the ratio of the number of transactions with all the items to the number of transactions with just the “if” items.
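The two measures can be checked directly against the five transactions of Table 9.1. The following is a minimal sketch; the helper names support and confidence and the short item labels are assumptions made for the example.

```python
# The five transactions from Table 9.1, one set of items per customer.
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]

def support(baskets, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, condition, result):
    """Transactions containing all items of the rule, divided by the
    transactions containing just the "if" items."""
    condition = set(condition)
    whole_rule = condition | set(result)
    n_condition = sum(condition <= basket for basket in baskets)
    n_rule = sum(whole_rule <= basket for basket in baskets)
    return n_rule / n_condition

print(support(baskets, {"Soda", "OJ"}))       # 2 of 5 transactions
print(confidence(baskets, {"Soda"}, {"OJ"}))  # 2 of the 3 soda transactions
print(confidence(baskets, {"OJ"}, {"Soda"}))  # 2 of the 4 orange juice transactions
```

Support is the same in both directions because it depends only on the full set of items, whereas confidence depends on which side of the rule is the condition.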

Another question is how much better than chance the rule is. One way to answer this is to calculate the lift (also called improvement), which tells us how much better a rule is at predicting the result than just assuming the result in the first place. Lift is the ratio of the density of the target after application of the left-hand side to the density of the target in the population. Another way of saying this is that lift is the ratio of the number of records that support the entire rule to the number that would be expected, assuming that there is no relationship between the products (the exact formula is given later in the chapter). A similar measure, the excess, is the difference between the number of records that support the entire rule and the expected value. Because the excess is measured in the same units as the original sales, it is sometimes easier to work with.
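As a rough illustration, lift and excess can be computed for the rule “if soda, then orange juice” using the transactions of Table 9.1. The sketch assumes the usual definition, in which the expected number of supporting records under independence is the product of the two item counts divided by the number of transactions; the chapter's own formula appears later and may be stated differently. The function names are hypothetical.

```python
# The five transactions from Table 9.1, one set of items per customer.
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]

def rule_counts(baskets, condition, result):
    """Return (records supporting the whole rule, records expected under independence)."""
    condition, result = set(condition), set(result)
    n_total = len(baskets)
    n_condition = sum(condition <= b for b in baskets)
    n_result = sum(result <= b for b in baskets)
    n_rule = sum((condition | result) <= b for b in baskets)
    expected = n_condition * n_result / n_total  # assumes no relationship
    return n_rule, expected

def lift(baskets, condition, result):
    """Observed supporting records divided by the expected number."""
    n_rule, expected = rule_counts(baskets, condition, result)
    return n_rule / expected

def excess(baskets, condition, result):
    """Observed supporting records minus the expected number."""
    n_rule, expected = rule_counts(baskets, condition, result)
    return n_rule - expected

print(round(lift(baskets, {"Soda"}, {"OJ"}), 2))    # about 0.83
print(round(excess(baskets, {"Soda"}, {"OJ"}), 2))  # about -0.4
```

In this tiny data set the lift is below 1: orange juice appears in four of the five transactions overall, so knowing that a basket contains soda does not improve on simply assuming orange juice is present.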

Figure 9.7 provides an example of lift, confidence, and support as provided by Blue Martini, a company that specializes in tools for retailers. Their software system includes a suite of analysis tools that includes association rules.


This particular example shows that a particular jacket is much more likely to be purchased with a gift certificate, information that can be used for improving messaging for selling both gift certificates and jackets.

The ideas behind the co-occurrence table extend to combinations with any number of items, not just pairs of items. For combinations of three items, imagine a cube with each side split into five different parts, as shown in Figure 9.8.

Even with just five items in the data, there are already 125 different subcubes to fill in. By playing with symmetries in the cube, this can be reduced a bit (by a factor of six), but the number of subcubes for groups of three items is proportional to the third power of the number of different items. In general, the number of combinations with n items is proportional to the number of items raised to the nth power, a number that gets very large, very fast. And generating the co-occurrence table requires doing work for each of these combinations.
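The growth in the number of combinations, and the counting for groups of three items, can be sketched as follows. This is an illustration using the transactions of Table 9.1, with short item labels chosen for the example. Counting only unordered groups of three distinct items is one way to exploit the symmetry, since each such group can be ordered in six ways.

```python
from collections import Counter
from itertools import combinations
from math import comb

# The five transactions from Table 9.1, one set of items per customer.
baskets = [
    {"OJ", "Soda"},
    {"Milk", "OJ", "Window Cleaner"},
    {"OJ", "Detergent"},
    {"OJ", "Detergent", "Soda"},
    {"Window Cleaner", "Soda"},
]

items = sorted(set().union(*baskets))
n = len(items)

print(n ** 3)      # cells in the full three-dimensional cube
print(comb(n, 3))  # unordered groups of three distinct items

# Count how many transactions contain each group of three items.
triple_counts = Counter()
for basket in baskets:
    for triple in combinations(sorted(basket), 3):
        triple_counts[triple] += 1

print(triple_counts[("Milk", "OJ", "Window Cleaner")])
```

Iterating over the combinations present in each transaction, rather than over every cell of the cube, means that groups of items that never occur together cost no work at all.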

Figure 9.7 Blue Martini provides an interface that shows the support, confidence, and lift of an association rule.


[Figure 9.8: a cube whose three axes each list OJ, Cleaner, Milk, Soda, and Detergent, with a callout reading “Orange juice, milk, and window cleaner appear together in exactly one transaction.”]

Figure 9.8 A co-occurrence table in three dimensions can be visualized as a cube.

Building Association Rules

This basic process for finding association rules is illustrated in Figure 9.9.

There are three important concerns in creating association rules:

■■ Choosing the right set of items.

■■ Generating rules by deciphering the counts in the co-occurrence matrix.

■■ Overcoming the practical limits imposed by thousands or tens of thousands of items.

The next three sections delve into these concerns in more detail.


1. First determine the right set of items and the right level. For instance, is pizza an item or are the toppings items?

2. Next, calculate the probabilities and joint probabilities of items and combinations of interest (the figure shows a table of toppings and their probabilities), perhaps limiting the search using thresholds on support or value.

3. Finally, analyze the probabilities to determine the right rules, for example, “If mushroom then pepperoni.”

Figure 9.9 Finding association rules has these basic steps.

Choosing the Right Set of Items

The data used for finding association rules is typically the detailed transaction data captured at the point of sale. Gathering and using this data is a critical part of applying market basket analysis, depending crucially on the items chosen for analysis. What constitutes a particular item depends on the business need. Within a grocery store where there are tens of thousands of products on the shelves, a frozen pizza might be considered an item for analysis purposes, regardless of its toppings (extra cheese, pepperoni, or mushrooms), its crust (extra thick, whole wheat, or white), or its size. So, the purchase of a large whole wheat vegetarian pizza contains the same “frozen pizza” item as the purchase of a single-serving pepperoni pizza with extra cheese. A sample of such transactions at this summarized level might look like Table 9.3.
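Rolling detailed products up to a summarized item can be sketched as a simple lookup applied to each transaction before any counting takes place. The product names and the mapping below are hypothetical, chosen only to mirror the pizza example; a real retailer would draw the mapping from its own product hierarchy.

```python
# Hypothetical mapping from detailed products to analysis items.
ITEM_OF = {
    "large whole wheat vegetarian pizza": "frozen pizza",
    "single-serving pepperoni pizza, extra cheese": "frozen pizza",
    "2% milk, 1 gallon": "milk",
}

def summarize(basket, item_of=ITEM_OF):
    """Replace each detailed product with its analysis item.

    Products without an entry in the mapping are kept as they are.
    The result is a set, so two different pizzas in one transaction
    become a single "frozen pizza" item.
    """
    return {item_of.get(product, product) for product in basket}

basket = {
    "large whole wheat vegetarian pizza",
    "single-serving pepperoni pizza, extra cheese",
    "2% milk, 1 gallon",
    "window cleaner",
}
print(sorted(summarize(basket)))
```

Returning a set reflects the choice to record only whether an item is present in a transaction, not how many were bought; an analysis that cares about quantities would need a different structure.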
