Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

When using a feed-forward, back propagation network, sensitivity analysis can take advantage of the error measures calculated during the learning phase instead of having to test each feature independently. The validation set is fed into the network to produce the output, and the output is compared to the desired output to calculate the error. The network then propagates the error back through the units, not to adjust any weights but to keep track of the sensitivity of each input. The error acts as a proxy for sensitivity, measuring how much each input affects the output of the network. Accumulating these sensitivities over the entire test set determines which inputs have the largest effect on the output. In our experience, though, the values produced in this fashion are not particularly useful for understanding the network.
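The accumulation described above can be sketched in a few lines. This is a minimal illustration only, assuming a one-hidden-layer tanh network; the weights and the "validation" data here are random placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward network: 3 inputs -> 4 hidden (tanh) units -> 1 linear output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def input_gradient(x, target):
    """Backpropagate the output error all the way to the inputs."""
    h = np.tanh(W1 @ x + b1)
    y = W2 @ h + b2
    d_y = y - target                      # dE/dy for squared error E = (y - t)^2 / 2
    d_h = (W2.T @ d_y) * (1 - h ** 2)     # back through the tanh units
    return W1.T @ d_h                     # dE/dx: sensitivity of the error to each input

# Accumulate absolute sensitivities over a (synthetic, illustrative) validation set.
sensitivity = np.zeros(3)
for _ in range(200):
    x, t = rng.normal(size=3), rng.normal(size=1)
    sensitivity += np.abs(input_gradient(x, t))

ranking = np.argsort(-sensitivity)  # inputs ordered from most to least influential
```

The ranking, not the raw numbers, is the useful product: as the text notes, the accumulated values themselves are hard to interpret.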

T I P Neural networks do not produce easily understood rules that explain how they arrive at a given result. Even so, it is possible to understand the relative importance of the inputs to the network by using sensitivity analysis. Sensitivity analysis can be a manual process, testing each feature one at a time relative to the other features. It can also be automated by using the sensitivity information generated by back propagation. In many situations, understanding the relative importance of inputs is almost as good as having explicit rules.

470643 c07.qxd 3/8/04 11:37 AM Page 249

Artificial Neural Networks 249

Self-Organizing Maps

Self-organizing maps (SOMs) are a variant of neural networks used for undirected data mining tasks such as cluster detection. The Finnish researcher Dr. Teuvo Kohonen invented self-organizing maps, which are also called Kohonen Networks. Although used originally for images and sounds, these networks can also recognize clusters in data. They are based on the same underlying units as feed-forward, back propagation networks, but SOMs are quite different in two respects. First, they have a different topology. Second, the back propagation method of learning is no longer applicable; SOMs have an entirely different method for training.

What Is a Self-Organizing Map?

The self-organizing map (SOM), an example of which is shown in Figure 7.13, is a neural network that can recognize unknown patterns in the data. Like the networks we’ve already looked at, the basic SOM has an input layer and an output layer. Each unit in the input layer is connected to one source, just as in the networks for predictive modeling. Also, like those networks, each unit in the SOM has an independent weight associated with each incoming connection (this is actually a property of all neural networks). However, the similarity between SOMs and feed-forward, back propagation networks ends here.

The output layer consists of many units instead of just a handful. Each of the units in the output layer is connected to all of the units in the input layer. The output layer is arranged in a grid, as if the units were in the squares on a checkerboard. Even though the units are not connected to each other in this layer, the grid-like structure plays an important role in the training of the SOM, as we will see shortly.

How does an SOM recognize patterns? Imagine one of the booths at a carnival where you throw balls at a wall filled with holes. If the ball lands in one of the holes, then you have your choice of prizes. Training an SOM is like being at this booth blindfolded, with a wall that initially has no holes at all; this is very similar to the situation when you start looking for patterns in large amounts of data and do not know where to start. Each time you throw the ball, it dents the wall a little bit. Eventually, when enough balls land in the same vicinity, the indentation breaks through the wall, forming a hole. Now, when another ball lands at that location, it goes through the hole. You get a prize. At the carnival, this is a cheap stuffed animal; with an SOM, it is an identifiable cluster.

Figure 7.14 shows how this works for a simple SOM. When a member of the training set is presented to the network, the values flow forward through the network to the units in the output layer. The units in the output layer compete with each other, and the one with the highest value "wins." The reward is to adjust the weights leading up to the winning unit to strengthen its response to the input pattern. This is like making a little dent in the network.
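The competition step can be sketched as follows. This is a minimal illustration with placeholder weights; the winner is selected here by smallest Euclidean distance between the input and a unit's weight vector, the most common implementation (equivalent in spirit to "highest value wins" when weight vectors are normalized).

```python
import numpy as np

rng = np.random.default_rng(1)

GRID = 4                       # a 4x4 grid of output units
N_INPUTS = 3
# One weight vector per output unit, matching the input dimension.
weights = rng.random((GRID, GRID, N_INPUTS))

def winning_unit(x):
    """The unit whose weight vector best matches the input 'wins'."""
    distances = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(distances), distances.shape)

def train_step(x, learning_rate=0.1):
    """Nudge the winner's weights toward the input: the 'dent' in the wall."""
    i, j = winning_unit(x)
    weights[i, j] += learning_rate * (x - weights[i, j])
    return (i, j)
```

Repeated presentations of similar inputs keep winning at the same unit, deepening the dent until it becomes a recognizable cluster.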


Figure 7.13 The self-organizing map is a special kind of neural network that can be used to detect clusters. (The figure's annotations: the input layer is connected to the inputs; the output layer is laid out like a grid, with each unit connected to all the input units but not to the other output units; the output units compete with each other for the output of the network.)

There is one more aspect to the training of the network. Not only are the weights for the winning unit adjusted, but the weights for units in its immediate neighborhood are also adjusted to strengthen their response to the inputs.

This adjustment is controlled by a neighborliness parameter that governs both the size of the neighborhood and the amount of adjustment. Initially, the neighborhood is rather large and the adjustments are large; as training continues, both decrease in size. Neighborliness has several practical effects. One is that the output layer behaves more like a connected fabric, even though the units are not directly connected to each other: clusters that are similar to each other end up closer together on the grid than dissimilar clusters. More importantly, though, neighborliness allows a group of units to represent a single cluster. Without it, the network would tend to find as many clusters in the data as there are units in the output layer, introducing bias into the cluster detection.


[Figure: a grid of output units with numeric values on the connections, highlighting the winning output unit and its path through the network.]

Figure 7.14 An SOM finds the output unit that does the best job of recognizing a particular input.

Typically, an SOM identifies fewer clusters than it has output units. This is inefficient when the network is used to assign new records to clusters, since some inputs are fed through the network only to land on unused units in the output layer. To determine which units are actually used, we apply the SOM to the validation set: the members of the validation set are fed through the network, keeping track of the winning unit in each case. Units with no hits, or with very few hits, are discarded. Eliminating these units improves the run-time performance of the network by reducing the number of calculations needed for new instances.
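The hit-counting pass can be sketched as follows. The SOM weights, the validation data, and the MIN_HITS threshold here are all illustrative placeholders (in practice the weights come from a trained network).

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
GRID, N_INPUTS = 5, 3
weights = rng.random((GRID, GRID, N_INPUTS))   # stands in for a trained SOM

def winning_unit(x):
    """Grid coordinates of the output unit that best matches the input."""
    d = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

# Feed the validation set through the network, tallying wins per unit.
validation = rng.random((500, N_INPUTS))
hits = Counter(winning_unit(x) for x in validation)

# Keep only units that win often enough; the rest are discarded.
MIN_HITS = 5
active_units = {unit for unit, n in hits.items() if n >= MIN_HITS}
```

Only the units in `active_units` need to be scored when a new instance arrives, which is where the run-time saving comes from.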

Once the final network is in place—with the output layer restricted only to the units that identify specific clusters—it can be applied to new instances. An unknown instance is fed into the network and is assigned to the cluster at the output unit with the strongest response. The network has identified clusters, but we do not know anything about them. We will return to the problem of identifying clusters a bit later.

The original SOMs used two-dimensional grids for the output layer. This was an artifact of earlier research into recognizing features in images composed of a two-dimensional array of pixel values. The output layer can really have any structure—with neighborhoods defined in three dimensions, as a network of hexagons, or laid out in some other fashion.

Example: Finding Clusters

A large bank's interest in increasing the number of home equity loans it sells provides an illustration of the practical use of clustering. The bank decides that it needs to understand customers who currently have home equity loans to determine the best strategy for increasing its market share. To start this process, demographics are gathered on 5,000 customers who have home equity loans and 5,000 customers who do not. Even though the proportion of customers with home equity loans is far less than 50 percent, it is a good idea to give the two groups equal representation in the training set.

The data that is gathered has fields like the following:


■■ Appraised value of house

■■ Amount of credit available

■■ Amount of credit granted

■■ Age

■■ Marital status

■■ Number of children

■■ Household income

This data forms a good training set for clustering. The input values are mapped so they all lie between –1 and +1; these are used to train an SOM. The network identifies five clusters in the data, but it does not give any information about the clusters. What do these clusters mean?
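The text does not say which mapping onto [-1, +1] is used; a simple linear (min-max) rescaling, sketched below with made-up income values, is one common choice. The function name and sample data are illustrative.

```python
import numpy as np

def scale_to_unit_interval(column):
    """Map a numeric feature linearly onto [-1, +1]."""
    lo, hi = column.min(), column.max()
    return 2 * (column - lo) / (hi - lo) - 1

# Illustrative household incomes; each input field is scaled the same way.
incomes = np.array([32_000.0, 45_000.0, 60_000.0, 150_000.0])
scaled = scale_to_unit_interval(incomes)
```

Scaling every field to the same range keeps fields with large natural units (like appraised value) from dominating fields with small ones (like number of children) during training.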

A common technique to compare different clusters that works particularly well with neural network techniques is the average member technique. Find the most average member of each of the clusters—the center of the cluster. This is similar to the approach used for sensitivity analysis. To do this, find the average value for each feature in each cluster. Since all the features are numbers, this is not a problem for neural networks.
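The average member technique amounts to a per-cluster mean over the scaled features. A minimal sketch, using hypothetical cluster assignments and random placeholder data rather than the bank's:

```python
import numpy as np

# Hypothetical data: 10 customers, 7 scaled features (as in the list above),
# and the cluster (0..4) that the SOM assigned to each customer.
rng = np.random.default_rng(4)
features = rng.random((10, 7)) * 2 - 1    # features already scaled to [-1, +1]
clusters = np.repeat(np.arange(5), 2)     # two customers per cluster, for illustration

# The "most average member" of each cluster: the mean of every feature
# over the customers assigned to that cluster.
centers = np.array([features[clusters == c].mean(axis=0) for c in range(5)])
```

Mapping each center's feature values back to their original units (for example, un-scaling average age or average household income) is what turns an anonymous cluster into a description the bank can act on.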

