Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

For example, say that half the members of a cluster are male and half are female, and that male maps to –1.0 and female to +1.0. The average member for this cluster would have a value of 0.0 for this feature. In another cluster, there may be nine females for every male. For this cluster, the average member would have a value of 0.8. This averaging works very well with neural networks since all inputs have to be mapped into a numeric range.

TIP Self-organizing maps, a type of neural network, can identify clusters, but they do not identify what makes the members of a cluster similar to each other.

A powerful technique for comparing clusters is to determine the center or average member in each cluster. Using the test set, calculate the average value for each feature in the data. These average values can then be displayed in the same graph to determine the features that make a cluster unique.
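As a concrete sketch (not from the book), here is how these cluster centers might be computed and displayed on one graph in Python. The feature names, data, and cluster assignments are all made up for illustration:

```python
# A minimal sketch: compute per-cluster feature means ("centers") and
# display them on shared axes. All data here is synthetic.
import numpy as np
import matplotlib.pyplot as plt

features = ["Available Credit", "Age", "Marital Status",
            "Num Children", "Income", "Credit Balance"]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, len(features)))  # inputs already scaled to [-1, +1]
clusters = rng.integers(0, 5, size=500)                # hypothetical cluster label per record

# The "average member" of each cluster: the mean of each feature.
centers = np.array([X[clusters == k].mean(axis=0) for k in range(5)])

# One line per cluster center, one vertical axis per feature
# (the parallel coordinates idea described below).
for k, center in enumerate(centers):
    plt.plot(features, center, marker="o", label=f"cluster {k}")
plt.ylim(-1.0, 1.0)
plt.ylabel("average feature value")
plt.legend()
plt.show()
```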

These average values can then be plotted using parallel coordinates as in Figure 7.15, which shows the centers of the five clusters identified in the banking example. In this case, the bank noted that one of the clusters was particularly interesting, consisting of married customers in their forties with children.

A bit more investigation revealed that these customers also had children in their late teens. Members of this cluster had more home equity lines than members of other clusters.

[Figure 7.15 is a parallel coordinates plot with one vertical axis per feature (Available Credit, Age, Marital Status, Num Children, Income, and Credit Balance), each scaled from –1.0 to +1.0, and one line per cluster center. A callout reads: "This cluster looks interesting. High-income customers with children in the middle age group who are taking out large loans."]

Figure 7.15 The centers of five clusters are compared on the same graph. This simple visualization technique (called parallel coordinates) helps identify interesting clusters.


The story continues with the Marketing Department of the bank concluding that these people had taken out home equity loans to pay college tuition fees.

The department arranged a marketing program designed specifically for this market, selling home equity loans as a means to pay for college education. The results from this campaign were disappointing.

Since the marketing program failed, it may seem as though the clusters did not live up to their promise. In fact, the problem lay elsewhere. The bank had initially used only general customer information; it had not combined information from the many different systems servicing its customers. The bank returned to the problem of identifying customers, but this time it included more information—from the deposits system, the credit card system, and so on.

The basic methods remained the same, so we will not go into detail about the analysis. With the additional data, the bank discovered that the cluster of customers with college-age children did indeed exist, but an important fact had been overlooked: customers in this cluster also tended to have business accounts as well as personal accounts. This led to a new line of thinking. When the children leave home for college, the parents have an opportunity to start a new business by taking advantage of the equity in their home.

With this insight, the bank created a new marketing program targeted at the parents, about starting a new business in their empty nest. This program succeeded, and the bank saw improved performance from its home equity loans group. The lesson of this case study is that, although SOMs are powerful tools for finding clusters, neural networks really are only as good as the data that goes into them.

Lessons Learned

Neural networks are a versatile data mining tool. Across many industries and many applications, they have proven themselves over and over again. These results come in complicated domains, such as analyzing time series and detecting fraud, that are not easily amenable to other techniques. The largest neural network developed for production is probably the system that AT&T developed for reading numbers on checks. This network has hundreds of thousands of units organized into seven layers.

Their foundation is based on biological models of how brains work. Although the basic ideas predate digital computers, they have proven useful. In biology, neurons fire after their inputs reach a certain threshold. This model can be implemented on a computer as well. The field has really taken off since the 1980s, when statisticians started to use these networks and understand them better.

A neural network consists of artificial neurons connected together. Each neuron mimics its biological counterpart, taking various inputs, combining them, and producing an output. Since digital neurons process numbers, the activation function characterizes the neuron. In most cases, this function takes the weighted sum of its inputs and applies an S-shaped function to it. The result is a node that sometimes behaves in a linear fashion, and sometimes behaves in a nonlinear fashion—an improvement over standard statistical techniques.
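To make the weighted-sum-plus-squashing idea concrete, here is a minimal sketch of a single artificial unit in Python. The function name and sample values are illustrative, not from the book:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs followed by an S-shaped (tanh) activation.

    tanh squashes the sum into (-1, +1): nearly linear for small sums,
    saturating (nonlinear) for large ones.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)

# Small weighted sum -> roughly linear response
print(neuron([0.1, -0.2], [0.5, 0.3], 0.0))   # ~ -0.01
# Large weighted sum -> saturated, nonlinear response
print(neuron([5.0, 4.0], [1.0, 1.0], 0.0))    # ~ 1.0
```

This is exactly the "sometimes linear, sometimes nonlinear" behavior described above: near the middle of the S-curve the unit acts like a weighted average, while far from it the unit acts like a threshold.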

The most common network is the feed-forward network for predictive modeling. Although originally a breakthrough, the back propagation training method has been replaced by other methods, notably conjugate gradient.

These networks can be used for both categorical and continuous inputs. However, neural networks learn best when input fields have been mapped to the range between –1 and +1. This is a guideline to help train the network. Neural networks still work when a small amount of data falls outside the range and for more limited ranges, such as 0 to 1.
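As an illustration (not from the book), a linear mapping of a raw field onto the range –1 to +1 might look like this, with the range endpoints taken from the training data:

```python
def scale_to_range(value, lo, hi):
    """Linearly map a raw value from [lo, hi] onto [-1, +1].

    Values slightly outside [lo, hi] map slightly outside [-1, +1],
    which the network can usually tolerate in small amounts.
    """
    return 2.0 * (value - lo) / (hi - lo) - 1.0

# Example: ages observed between 18 and 80 in the training data
print(scale_to_range(18, 18, 80))   # -1.0
print(scale_to_range(49, 18, 80))   #  0.0 (midpoint)
print(scale_to_range(80, 18, 80))   # +1.0
print(scale_to_range(85, 18, 80))   # ~ +1.16, a little outside the range
```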

Neural networks do have several drawbacks. First, they work best when there are only a few input variables, and the technique itself does not help choose which variables to use; other techniques, such as decision trees, can come to the rescue. Also, when training a network, there is no guarantee that the resulting set of weights is optimal. To increase confidence in the result, build several networks and take the best one.
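The book does not give code for this advice, but one way it might look is sketched below using scikit-learn's MLPClassifier on synthetic data. The library choice, network size, and all parameters are assumptions:

```python
# A sketch of "build several networks and take the best one": train the
# same architecture from several random starting points and keep the one
# that scores best on held-out data.
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_model, best_score = None, -1.0
for seed in range(5):  # different random starting weights each time
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=seed)
    net.fit(X_train, y_train)
    score = net.score(X_val, y_val)  # accuracy on the validation split
    if score > best_score:
        best_model, best_score = net, score

print(f"best validation accuracy: {best_score:.3f}")
```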

Perhaps the biggest problem, though, is that a neural network cannot explain what it is doing. Decision trees are popular because they can provide a list of rules. There is no way to get an accurate set of rules from a neural network. A neural network is explained by its weights and a very complicated mathematical formula; unfortunately, making sense of these is beyond our human powers of comprehension.

Variations on neural networks, such as self-organizing maps, extend the technology to undirected clustering. Overall, neural networks are very powerful and can produce good models; they just can’t tell us how they do it.


Chapter 8

Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering

You hear someone speak and immediately guess that she is from Australia. Why? Because her accent reminds you of other Australians you have met. Or you try a new restaurant expecting to like it because a friend with good taste recommended it. Both cases are examples of decisions based on experience.

When faced with new situations, human beings are guided by memories of similar situations they have experienced in the past. That is the basis for the data mining techniques introduced in this chapter.

Nearest neighbor techniques are based on the concept of similarity. Memory-based reasoning (MBR) results are based on analogous situations in the past—much like deciding that a new friend is Australian based on past examples of Australian accents. Collaborative filtering adds more information, using not just the similarities among neighbors, but also their preferences. The restaurant recommendation is an example of collaborative filtering.

Central to all these techniques is the idea of similarity. What really makes situations in the past similar to a new situation? Along with finding the similar records from the past, there is the challenge of combining the information from the neighbors. These are the two key concepts for nearest neighbor approaches.
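To make these two concepts concrete, here is a toy nearest neighbor sketch in Python. The distance metric (Euclidean), the majority-vote combination, and all of the data are illustrative assumptions, not the book's method:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Distance between two records, treated as points in feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(known, new_record, k=3):
    """Nearest neighbor in two steps: (1) find the k most similar past
    records, (2) combine their answers, here by majority vote."""
    neighbors = sorted(known, key=lambda rec: euclidean(rec[0], new_record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Past situations: (features, outcome)
past = [((1.0, 2.0), "australian"), ((1.1, 1.9), "australian"),
        ((5.0, 5.0), "american"), ((5.2, 4.8), "american")]
print(knn_classify(past, (1.2, 2.1)))  # -> "australian"
```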

This chapter begins with an introduction to MBR and an explanation of how it works. Since measures of distance and similarity are important to nearest neighbor techniques, there is a section on distance metrics, including a discussion of the meaning of distance for data types, such as free text, that have no obvious geometric interpretation. The ideas of MBR are illustrated through a case study showing how MBR has been used to attach keywords to news stories. The chapter then looks at collaborative filtering, a popular approach to making recommendations, especially on the Web. Collaborative filtering is also based on nearest neighbors, but with a slight twist—instead of grouping restaurants or movies into neighborhoods, it groups the people recommending them.

Memory-Based Reasoning

