Sometimes it is desirable to find larger clusters than those provided by association rules, which include just a handful of items in any rule. Standard clustering techniques, which are described in Chapter 11, can also be used on market basket data. In this case, the data needs to be pivoted, as shown in Figure 9.6, so that each row represents one order or customer, and there is a flag or a counter for each product purchased. Unfortunately, there are often thousands of different products. To reduce the number of columns, such a transformation can take place at the category level, rather than at the individual product level.
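As a rough illustration of the pivot described above, the following Python sketch turns line items into one row per order with a 0/1 flag per product, or per category when a product-to-category mapping is supplied. The sample data, the function name pivot_orders, and the category_of mapping are hypothetical, invented here for illustration only.

```python
from collections import defaultdict

# Hypothetical line items: (order_id, product_id, quantity).
line_items = [
    (1, "A", 2),
    (1, "C", 1),
    (2, "B", 1),
    (2, "C", 3),
    (3, "A", 1),
]


def pivot_orders(items, category_of=None):
    """Return (columns, rows): one row per order, one 0/1 flag per column.

    Columns are products, or categories when category_of (an assumed
    product -> category dict) is given, which keeps the column count small.
    """
    label = (lambda p: category_of[p]) if category_of else (lambda p: p)
    columns = sorted({label(product) for _, product, _ in items})

    # Collect the set of column labels present in each order.
    bought = defaultdict(set)
    for order_id, product, _ in items:
        bought[order_id].add(label(product))

    rows = {
        order_id: [int(col in bought[order_id]) for col in columns]
        for order_id in sorted(bought)
    }
    return columns, rows


columns, rows = pivot_orders(line_items)
print(columns)
for order_id, flags in rows.items():
    print(order_id, flags)
```

Replacing the flag with a sum of quantities would give the counter variant; the resulting rows can then be passed to whatever clustering technique is in use.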
There is typically a lot of information available about products. In addition to the product hierarchy, such information includes the color of clothes, whether food is low calorie, whether a poster includes a frame, and so on.
Such descriptions provide a wealth of information, and can lead to useful ad hoc questions:
■■ Do diet products tend to sell together?
■■ Are customers purchasing similar colors of clothing at the same time?
■■ Do customers who purchase framed posters also buy other products?
Being able to answer such questions is often more useful than trying to cluster products, since such directed questions often lead directly to marketing actions.
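A directed question of this kind can often be answered with a simple count once products carry descriptive attributes. The Python sketch below takes the first question, about diet products, and measures how many orders that contain a diet product contain two or more of them. The orders, the set of diet products, and the function name share_with_multiple are hypothetical.

```python
# Hypothetical orders, each a set of product names.
orders = [
    {"diet cola", "rice cakes", "bread"},
    {"diet cola", "chips"},
    {"bread", "butter"},
    {"diet yogurt", "diet cola"},
]

# Hypothetical product attribute: the products flagged as diet items.
diet_products = {"diet cola", "rice cakes", "diet yogurt"}


def share_with_multiple(orders, flagged):
    """Among orders with at least one flagged product, the share with two or more."""
    with_one = [order for order in orders if len(order & flagged) >= 1]
    if not with_one:
        return 0.0
    with_many = [order for order in with_one if len(order & flagged) >= 2]
    return len(with_many) / len(with_one)


print(share_with_multiple(orders, diet_products))
```

A high share is only suggestive. It should be compared against a baseline, such as the same measure for an arbitrary group of products of similar popularity, before it is used to justify a marketing action.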
[Diagram: LINE ITEM records (LINE ITEM ID, ORDER ID, PRODUCT ID, QUANTITY, UNIT PRICE, UNIT COST, GIFT WRAP FLAG, TAXABLE FLAG, etc.) are pivoted into an ORDER PIVOT table with one row per ORDER ID and the columns HAS PRODUCT A, HAS PRODUCT B, HAS PRODUCT C, HAS PRODUCT D, etc.]
Figure 9.6 Pivoting market basket data makes it possible to run clustering algorithms to find interesting groups of products.
470643 c09.qxd 3/8/04 11:15 AM Page 296
296 Chapter 9
Association Rules
One appeal of association rules is the clarity and utility of the results, which are in the form of rules about groups of products. There is an intuitive appeal to an association rule because it expresses how tangible products and services group together. A rule such as “if a customer purchases three-way calling, then that customer will also purchase call waiting” is clear. Even better, it might suggest a specific course of action, such as bundling three-way calling with call waiting into a single service package.
While association rules are easy to understand, they are not always useful. The following three rules were all generated from real data:
■■ Wal-Mart customers who purchase Barbie dolls have a 60 percent likelihood of also purchasing one of three types of candy bars.
■■ Customers who purchase maintenance agreements are very likely to purchase large appliances.
■■ When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners.
The last two are rules that we have actually seen in data. The first was quoted in Forbes on September 8, 1997. These three examples illustrate the three common types of rules produced by association rule analysis: the actionable, the trivial, and the inexplicable. In addition to these types, the sidebar “Famous Rules” discusses one other category.
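A figure such as the “60 percent likelihood” in the first rule is what is usually called the confidence of a rule: of the baskets that contain the left-hand side, the fraction that also contain the right-hand side. The Python sketch below computes it, together with the rule's support, on invented baskets. The data and the function name rule_stats are hypothetical, and the consequent is simplified to a single item rather than one of three types of candy bars.

```python
# Hypothetical baskets: five contain a doll, three of those also a candy bar.
baskets = [
    {"doll", "candy bar"},
    {"doll", "candy bar", "batteries"},
    {"doll", "candy bar"},
    {"doll"},
    {"doll", "wrapping paper"},
    {"candy bar"},
    {"batteries"},
    {"wrapping paper", "batteries"},
    {"soap"},
    {"soap", "candy bar"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all baskets containing both sides of the rule
    confidence = share of antecedent baskets that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for b in baskets if antecedent <= b)
    n_both = sum(1 for b in baskets if antecedent <= b and consequent <= b)
    support = n_both / len(baskets) if baskets else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, {"doll"}, {"candy bar"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Neither number says whether a rule is actionable, trivial, or inexplicable; that judgment depends on knowledge of the business.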
Actionable Rules
The useful rule contains high-quality, actionable information. Once the pattern is found, it is often not hard to justify, and telling a story can lead to insights and action. Barbie dolls preferring chocolate bars to other forms of food is not a likely story. Instead, imagine a family going shopping. The purpose: finding a gift for little Susie’s friend Emily, since her birthday is coming up. A Barbie doll is the perfect gift. At checkout, little Jacob starts crying. He wants something too—a candy bar fits the bill. Or perhaps Emily has a brother; he can’t be left out of the gift-giving festivities. Maybe the candy bar is for Mom, since buying Barbie dolls is a tiring activity and Mom needs some energy. These scenarios all suggest that the candy bar is an impulse purchase added onto that of the Barbie doll.
Whether Wal-Mart can make use of this information is not clear. This rule might suggest more prominent product placement, such as ensuring that customers must walk through candy aisles on their way back from Barbie-land. It might suggest product tie-ins and promotions offering candy bars and dolls together. It might suggest particular ways to advertise the products. Because the rule is easily understood, it suggests plausible causes and possible interventions.
Market Basket Analysis and Association Rules 297
Trivial Rules
Trivial results are already known by anyone at all familiar with the business. The second example (“Customers who purchase maintenance agreements are very likely to purchase large appliances”) is an example of a trivial rule. In fact, customers typically purchase maintenance agreements and large appliances at the same time. Why else would they purchase maintenance agreements? The two are advertised together, and rarely sold separately (although when sold separately, it is the large appliance that is sold without the agreement rather than the agreement sold without the appliance). This rule, though, was found after analyzing hundreds of thousands of point-of-sale transactions from Sears.
Although it is valid and well supported in the data, it is still useless. Similar results abound: People who buy 2-by-4s also purchase nails; customers who purchase paint buy paint brushes; oil and oil filters are purchased together, as are hamburgers and hamburger buns, and charcoal and lighter fluid.
A subtler problem falls into the same category. A seemingly interesting result—such as the fact that people who buy the three-way calling option on their local telephone service almost always buy call waiting—may be the result of past marketing programs and product bundles. In the case of telephone service options, three-way calling is typically bundled with call waiting, so it is difficult to order it separately. In this case, the analysis does not produce actionable results; it is producing already acted-upon results. Although it is a danger for any data mining technique, market basket analysis is particularly susceptible to reproducing the success of previous marketing campaigns because of its dependence on unsummarized point-of-sale data—exactly the same data that defines the success of the campaign. Results from market basket analysis may simply be measuring the success of previous marketing campaigns.
Trivial rules do have one use, although it is not directly a data mining use. When a rule should appear 100 percent of the time, the few cases where it does not hold provide a lot of information about data quality. That is, the exceptions to trivial rules point to areas where business operations, data collection, and processing may need to be further refined.
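One way to put a trivial rule to work as a data quality check is to list the transactions that violate it. The Python sketch below does this for the maintenance agreement rule, returning the orders that contain the agreement but no large appliance. The order IDs, item names, and the function name rule_exceptions are hypothetical.

```python
# Hypothetical orders keyed by order ID.
orders = {
    101: {"maintenance agreement", "large appliance"},
    102: {"maintenance agreement"},
    103: {"large appliance"},
    104: {"maintenance agreement", "large appliance", "paint"},
}


def rule_exceptions(orders, antecedent, consequent):
    """Return sorted IDs of orders that contain antecedent but lack consequent."""
    return sorted(
        order_id
        for order_id, items in orders.items()
        if antecedent in items and consequent not in items
    )


# Orders worth reviewing: an agreement was sold without an appliance.
print(rule_exceptions(orders, "maintenance agreement", "large appliance"))
```

Each exception is a candidate for review rather than proof of an error: it may reflect a miscoded product, a split order, or a legitimate but unusual sale.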
Inexplicable Rules
Inexplicable results seem to have no explanation and do not suggest a course of action. The third pattern (“When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners”) is intriguing, tempting us with a new fact but providing information that neither gives insight into consumer behavior or the merchandise nor suggests further actions. In this case, a large hardware company discovered the pattern for new store openings, but could not figure out how to profit from it. Many items are on sale during the store openings, but the toilet bowl cleaners stood out. More investigation might give some explanation: Is the discount on toilet bowl cleaners much larger than for other products? Are they consistently placed in a high-traffic area for store openings but hidden at other times? Is the result an anomaly from a handful of stores? Are they difficult to find at other times? Whatever the cause, it is doubtful that further analysis of just the market basket data can give a credible explanation.
WARNING When applying market basket analysis, many of the results are either trivial or inexplicable. Trivial rules reproduce common knowledge about the business, wasting the effort used to apply sophisticated analysis techniques. Inexplicable rules are flukes in the data and are not actionable.
FAMOUS RULES: BEER AND DIAPERS
Perhaps the most talked-about association rule ever “found” is the association between beer and diapers. This is a famous story from the late 1980s or early 1990s, when computers were just getting powerful enough to analyze large volumes of data. The setting is somewhere in the Midwest, where a retailer is analyzing point-of-sale data to find interesting patterns.
Lo and behold, lurking in all the transaction data is the fact that beer and diapers are selling together. This immediately sets marketing minds in motion to figure out what is happening. A flash of insight provides the explanation: beer drinkers do not want to interrupt their enjoyment of televised sports, so they buy diapers to reduce trips to the bathroom. No, that’s not it. The more likely story is that families with young children are preparing for the weekend, diapers for the kids and beer for Dad. Dad probably knows that after he has a couple of beers, Mom will change the diapers.