Table 8.15 applies weighted voting to the previous example. The “yes, customer will become inactive” vote is the first; the “no, this is a good customer”
vote is second.
Weighted voting has introduced enough variation to prevent ties. The confidence level can now be calculated as the ratio of winning votes to total votes (Table 8.16).
Table 8.15 Attrition Prediction with Weighted Voting

           K = 1           K = 2           K = 3           K = 4           K = 5
dsum       0.749 to 0      1.441 to 0      1.441 to 0.647  2.085 to 0.647  2.085 to 1.290
dEuclid    0.669 to 0      0.669 to 0.562  0.669 to 1.062  1.157 to 1.062  1.601 to 1.062
Table 8.16 Confidence with Weighted Voting

           K = 1       K = 2       K = 3       K = 4       K = 5
dsum       yes, 100%   yes, 100%   yes, 69%    yes, 76%    yes, 62%
dEuclid    yes, 100%   yes, 54%    no, 61%     yes, 52%    yes, 60%
In this case, weighting the votes has only a small effect on the results and the confidence. The effect of weighting is largest when some neighbors are considerably further away than others.
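A minimal sketch of weighted voting in Python follows. The weight function 1/(1 + distance) is an illustrative assumption, since the exact weighting behind Table 8.15 is not restated here; the confidence is the ratio of the winning class's weight to the total weight, as in Table 8.16.

from collections import defaultdict

def weighted_vote(neighbors, weight=lambda d: 1.0 / (1.0 + d)):
    """Classify by weighted voting over the k nearest neighbors.

    neighbors -- list of (distance, class_label) pairs.
    Returns (winning_label, confidence), where confidence is the ratio of
    the winning class's weight to the total weight of all votes.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        totals[label] += weight(distance)
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())

# Illustrative distances (not the ones from the earlier example):
# three neighbors vote "yes" and two vote "no".
print(weighted_vote([(0.2, "yes"), (0.5, "yes"), (0.8, "no"),
                     (1.1, "yes"), (1.4, "no")]))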
Weighting can also be applied to estimation by replacing the simple average of neighboring values with an average weighted by distance. This approach is used in collaborative filtering systems, as described in the following section.
Collaborative Filtering: A Nearest Neighbor
Approach to Making Recommendations
Neither of the authors considers himself a country music fan, but one of them is the proud owner of an autographed copy of an early Dixie Chicks CD. The Chicks, who did not yet have a major record label, were performing in a local bar one day and some friends who knew them from Texas made a very enthusiastic recommendation. The performance was truly memorable, featuring Martie Erwin’s impeccable Bluegrass fiddle, her sister Emily on a bewildering variety of other instruments (most, but not all, with strings), and the seductive vocals of Laura Lynch (who also played a stand-up electric bass). At the break, the band sold and autographed a self-produced CD that we still like better than the one that later won them a Grammy. What does this have to do with nearest neighbor techniques? Well, it is a human example of collaborative filtering. A recommendation from trusted friends will cause one to try something one otherwise might not try.
Collaborative filtering is a variant of memory-based reasoning particularly well suited to the application of providing personalized recommendations. A collaborative filtering system starts with a history of people’s preferences. The distance function determines similarity based on overlap of preferences—
people who like the same thing are close. In addition, votes are weighted by distances, so the votes of closer neighbors count more for the recommendation. In other words, it is a technique for finding music, books, wine, or anything else that fits into the existing preferences of a particular person by using the judgments of a peer group selected for their similar tastes. This approach is also called social information filtering.
Collaborative filtering automates the process of using word-of-mouth to decide whether you would like something. Knowing that lots of people liked something is not enough. Who liked it is also important. Everyone values some recommendations more highly than others. The recommendation of a close friend whose past recommendations have been right on target may be enough to get you to go see a new movie even if it is in a genre you generally dislike.
On the other hand, an enthusiastic recommendation from a friend who thinks Ace Ventura: Pet Detective is the funniest movie ever made might serve to warn you off one you might otherwise have gone to see.
Preparing recommendations for a new customer using an automated collaborative filtering system has three steps:
1. Building a customer profile by getting the new customer to rate a selection of items such as movies, songs, or restaurants.
2. Comparing the new customer’s profile with the profiles of other customers using some measure of similarity.
3. Using some combination of the ratings of customers with similar profiles to predict the rating that the new customer would give to items he or she has not yet rated.
The following sections examine each of these steps in a bit more detail.
Building Profiles
One challenge with collaborative filtering is that there are often far more items to be rated than any one person is likely to have experienced or be willing to rate. That is, profiles are usually sparse, so any two users are likely to have rated only a few items in common on which to base recommendations. Think of a user profile as a vector with one element per item in the universe of items to be rated. Each element of the vector represents the profile owner’s rating for the corresponding item on a scale of –5 to 5, with 0 indicating neutrality and null values for no opinion.
If there are thousands or tens of thousands of elements in the vector and each customer decides which ones to rate, any two customers’ profiles are likely to end up with few overlaps. On the other hand, forcing customers to rate a particular subset may miss interesting information because ratings of more obscure items may say more about the customer than ratings of common ones. A fondness for the Beatles is less revealing than a fondness for Mose Allison.
A reasonable approach is to have new customers rate a list of the twenty or so most frequently rated items (a list that might change over time) and then free them to rate as many additional items as they please.
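As a concrete sketch, a sparse profile can be stored as a Python dictionary keyed by item name; the titles and ratings below are made up for illustration, and unrated items are simply left out rather than stored as zeros.

# Ratings run from -5 (strongly dislike) to 5 (strongly like); 0 is neutral.
# Items the customer has not rated are absent from the dictionary, which
# plays the role of the null value for "no opinion."
nathaniel = {
    "Crouching Tiger": 4,
    "Osmosis Jones": -2,
    "American Pie 2": 1,
}

simon = {
    "Crouching Tiger": 3,
    "Apocalypse Now": 5,
    "Planet of the Apes": -1,
}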
Comparing Profiles
Once a customer profile has been built, the next step is to measure its distance from other profiles. The most obvious approach would be to treat the profile vectors as geometric points and calculate the Euclidean distance between them, but many other distance measures have been tried. Some give higher weight to agreement on positively rated items, especially when most users give negative ratings to most items. Still others apply statistical correlation tests to the ratings vectors.
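A sketch of the most obvious measure is shown below: Euclidean distance restricted to the items both customers have rated, using dictionary profiles like the ones sketched earlier. The agreement-weighted or correlation-based measures mentioned above would simply replace this function.

from math import sqrt

def profile_distance(a, b):
    """Euclidean distance over the items both customers have rated.

    Returns None when the two sparse profiles have no rated items in
    common, since the distance is undefined in that case.
    """
    shared = set(a) & set(b)
    if not shared:
        return None
    return sqrt(sum((a[item] - b[item]) ** 2 for item in shared))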
Making Predictions
The final step is to use some combination of nearby profiles in order to come up with estimated ratings for the items that the customer has not rated. One approach is to take a weighted average where the weight is inversely proportional to the distance. The example shown in Figure 8.7 illustrates estimating the rating that Nathaniel would give to Planet of the Apes based on the opinions of his neighbors, Simon and Amelia.
[Figure 8.7 shows Nathaniel’s neighborhood of customer profiles (Simon, Amelia, Michael, Peter, Stephanie, Jenny, and Alan) and their ratings for movies including Crouching Tiger, Apocalypse Now, Vertical Ray of Sun, Osmosis Jones, American Pie 2, Plan 9 From Outer Space, and Planet of the Apes; Simon rates Planet of the Apes –1 and Amelia rates it –4.]
Figure 8.7 The predicted rating for Planet of the Apes is –2.
Simon, who is distance 2 away, gave that movie a rating of –1. Amelia, who is distance 4 away, gave that movie a rating of –4. No one else’s profile is close enough to Nathaniel’s to be included in the vote. Because Amelia is twice as far away as Simon, her vote counts only half as much as his. The estimate for Nathaniel’s rating is weighted by the distance:
(½ × (–1) + ¼ × (–4)) / (½ + ¼) = –1.5 / 0.75 = –2
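The same calculation as a short Python sketch, weighting each neighbor’s rating by the inverse of its distance as in the example:

def predict_rating(neighbors):
    """Distance-weighted average of the neighbors' ratings.

    neighbors -- list of (distance, rating) pairs for neighbors who have
    rated the item; distances are assumed to be greater than zero.
    """
    total_weight = sum(1.0 / distance for distance, _ in neighbors)
    weighted_sum = sum(rating / distance for distance, rating in neighbors)
    return weighted_sum / total_weight

# Simon (distance 2, rating -1) and Amelia (distance 4, rating -4)
print(predict_rating([(2, -1), (4, -4)]))   # prints -2.0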
A good collaborative filtering system gives its users a chance to comment on the predictions and adjust the profile accordingly. In this example, if Nathaniel rents the video of Planet of the Apes despite the prediction that he will not like it, he can then enter an actual rating of his own. If it turns out that he really likes the movie and gives it a rating of 4, his new profile will be in a slightly different neighborhood and Simon’s and Amelia’s opinions will count less for Nathaniel’s next recommendation.
Lessons Learned
Memory-based reasoning is a powerful data mining technique that can be used to solve a wide variety of problems involving classification or estimation. Unlike other data mining techniques, which use a training set of preclassified data to create a model and then discard the training set, for MBR the training set essentially is the model.
Choosing the right training set is perhaps the most important step in MBR.
The training set needs to include sufficient numbers of examples of all possible classifications. This may mean enriching it by including a disproportionate number of instances of rare classifications in order to create a balanced training set with roughly the same number of instances for all categories. A training set that includes only instances of bad customers will predict that all customers are bad. In general, the training set should contain at least thousands, if not hundreds of thousands or millions, of examples.
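One simple way to build such a balanced training set is sketched below: oversampling the rare classifications by sampling with replacement. This is only one option; sampling down the common classifications is an equally valid alternative.

import random

def balance_training_set(records, label_of):
    """Oversample rare classes so each classification is equally represented.

    records  -- the raw training examples.
    label_of -- function that returns a record's classification.
    """
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = max(len(examples) for examples in by_class.values())
    balanced = []
    for examples in by_class.values():
        # Sample with replacement so rare classes reach the target count.
        balanced.extend(random.choices(examples, k=target))
    return balanced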