What Is a Neural Net?
Neural networks consist of basic units that mimic, in a simplified fashion, the behavior of biological neurons found in nature, whether in the brain of a human or of a frog. It has been claimed, for example, that there is a unit within the visual system of a frog that fires in response to fly-like movements, and that there is another unit that fires in response to things about the size of a fly. These two units are connected to a neuron that fires when the combined value of these two inputs is high. This neuron is an input into yet another, which triggers tongue-flicking behavior.
The basic idea is that each neural unit, whether in a frog or a computer, has many inputs that the unit combines into a single output value. In brains, these units may be connected to specialized nerves. Computers, though, are a bit simpler; the units are simply connected together, as shown in Figure 7.3, so the outputs from some units are used as inputs into others. All the examples in Figure 7.3 are examples of feed-forward neural networks, meaning there is a one-way flow through the network from the inputs to the outputs and there are no cycles in the network.
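To make the one-way flow concrete, here is a minimal sketch of a forward pass in Python. The shapes, weights, and function names are illustrative assumptions rather than anything specified in the text, and the bias weights discussed later in this chapter are omitted for brevity.

```python
import numpy as np

def feed_forward(inputs, hidden_weights, output_weights):
    """One forward pass: values flow one way, inputs -> hidden -> output.
    The absence of cycles is what makes the network feed-forward."""
    hidden = np.tanh(hidden_weights @ inputs)   # hidden-layer activations
    return np.tanh(output_weights @ hidden)     # final output value(s)

# Four inputs, three hidden units, one output -- like the second network
# in Figure 7.3; the weights here are arbitrary illustrative values.
rng = np.random.default_rng(0)
x = np.array([0.2, -0.5, 0.1, 0.9])
w_hidden = rng.uniform(-1, 1, size=(3, 4))
w_output = rng.uniform(-1, 1, size=(1, 3))
print(feed_forward(x, w_hidden, w_output))
```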
[Figure 7.3 shows four feed-forward networks, each with four inputs, annotated as follows:
- A network with no hidden layer and a single output. The result of training this network is equivalent to the statistical technique called logistic regression.
- A network with a middle layer, called the hidden layer, which makes the network more powerful by enabling it to recognize more patterns.
- A network with a larger hidden layer. Increasing the size of the hidden layer makes the network more powerful but introduces the risk of overfitting. Usually, only one hidden layer is needed.
- A network with three outputs, showing that a neural network can produce multiple output values.]
Figure 7.3 Feed-forward neural networks take inputs on one end and transform them into outputs.
Feed-forward networks are the simplest and most useful type of network for directed modeling. There are three basic questions to ask about them:
■■ What are units and how do they behave? That is, what is the activation function?
■■ How are the units connected together? That is, what is the topology of a network?
■■ How does the network learn to recognize patterns? That is, what is back propagation and, more generally, how is the network trained?
The answers to these questions provide the background for understanding basic neural networks, an understanding that provides guidance for getting the best results from this powerful data mining technique.
What Is the Unit of a Neural Network?
Figure 7.4 shows the important features of the artificial neuron. The unit combines its inputs into a single value, which it then transforms to produce the output; these together are called the activation function. The most common activation functions are based on the biological model where the output remains very low until the combined inputs reach a threshold value. When the combined inputs reach the threshold, the unit is activated and the output is high.
Like its biological counterpart, the unit in a neural network has the property that small changes in the inputs, when the combined values are within some middle range, can have relatively large effects on the output. Conversely, large changes in the inputs may have little effect on the output when the combined inputs are far from the middle range. This property, where sometimes small changes matter and sometimes they do not, is an example of nonlinear behavior.
The power and complexity of neural networks arise from their nonlinear behavior, which in turn arises from the particular activation function used by the constituent neural units.
The activation function has two parts. The first part is the combination function that merges all the inputs into a single value. As shown in Figure 7.4, each input into the unit has its own weight. The most common combination function is the weighted sum, where each input is multiplied by its weight and these products are added together. Other combination functions are sometimes useful and include the maximum of the weighted inputs, the minimum, and the logical AND or OR of the values. Although there is a lot of flexibility in the choice of combination functions, the standard weighted sum works well in many situations. This element of choice is a common trait of neural networks. Their basic structure is quite flexible, but the defaults that correspond to the original biological models, such as the weighted sum for the combination function, work well in practice.
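As a minimal sketch of this default, the following Python mirrors the weighted sum plus bias described above; the function name and the sample numbers are illustrative assumptions.

```python
def combination(inputs, weights, bias):
    """Weighted-sum combination function: multiply each input by its
    weight, add the products together, then add the bias weight."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Example: three inputs, their weights, and a bias weight.
value = combination([0.5, -1.0, 0.25], [0.8, 0.3, -0.5], bias=0.1)
print(value)  # 0.5*0.8 + (-1.0)*0.3 + 0.25*(-0.5) + 0.1 = 0.075
```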
[Figure 7.4 shows a single unit, annotated as follows:
- Each input has its own weight, plus there is an additional weight called the bias.
- The combination function combines all the inputs into a single value, usually as a weighted summation.
- The transfer function calculates the output value from the result of the combination function.
- The combination function and the transfer function together constitute the activation function.
- The result is one output value, usually between -1 and 1.]
Figure 7.4 The unit of an artificial neural network is modeled on the biological neuron. The output of the unit is a nonlinear combination of its inputs.
The second part of the activation function is the transfer function, which gets its name from the fact that it transfers the value of the combination function to the output of the unit. Figure 7.5 compares three typical transfer functions: the sigmoid (logistic), linear, and hyperbolic tangent functions. The specific values that the transfer function takes on are not as important as the general form of the function. From our perspective, the linear transfer function is the least interesting. A feed-forward neural network consisting only of units with linear transfer functions and a weighted sum combination function is really just doing a linear regression. Sigmoid functions are S-shaped functions, of which the two most common for neural networks are the logistic and the hyperbolic tangent.
The major difference between them is the range of their outputs, between 0 and 1 for the logistic and between –1 and 1 for the hyperbolic tangent.
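For reference, here is a minimal Python sketch of the three transfer functions compared in Figure 7.5; the sample points in the loop are illustrative assumptions, not values from the text.

```python
import math

def logistic(x):
    """Sigmoid transfer function; outputs range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent transfer function; outputs range between -1 and 1."""
    return math.tanh(x)

def linear(x):
    """Passes the combined value straight through; a network built only
    from linear units reduces to linear regression."""
    return x

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:+.1f}  logistic={logistic(x):.3f}  tanh={tanh(x):+.3f}  linear={linear(x):+.1f}")
```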
The logistic and hyperbolic tangent transfer functions behave in a similar way. Even though they are not linear, their behavior is appealing to statisticians. When the weighted sum of all the inputs is near 0, then these functions are a close approximation of a linear function. Statisticians appreciate linear systems, and almost-linear systems are almost as well appreciated. As the magnitude of the weighted sum gets larger, these transfer functions gradually saturate (to 0 and 1 in the case of the logistic; to –1 and 1 in the case of the hyperbolic tangent). This behavior corresponds to a gradual movement from a linear model of the input to a nonlinear model. In short, neural networks have the ability to do a good job of modeling three types of problems: linear problems, near-linear problems, and nonlinear problems. There is also a relationship between the activation function and the range of input values, as discussed in the sidebar, “Sigmoid Functions and Ranges for Input Values.”
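A quick numeric check of this saturation, as a sketch using the logistic function and its well-known derivative logistic(x)·(1 − logistic(x)); the sample points are arbitrary:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# The slope shows the transition from near-linear behavior around 0
# (slope close to its maximum of 0.25) to saturation at large magnitudes
# (slope near 0, where changes in the input barely move the output).
for x in (0.0, 0.5, 2.0, 5.0, 10.0):
    slope = logistic(x) * (1.0 - logistic(x))  # derivative of the logistic
    print(f"x={x:5.1f}  output={logistic(x):.4f}  slope={slope:.4f}")
```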
A network can contain units with different transfer functions, a subject we’ll return to later when discussing network topology. Sophisticated tools sometimes allow experimentation with other combination and transfer functions. Other functions can have significantly different behavior from the standard ones. It may be fun and even helpful to play with different types of activation functions. If you do not want to bother, though, you can have confidence in the standard functions that have proven successful for many neural network applications.
[Figure 7.5 plots the three transfer functions on a vertical scale from -1.0 to 1.0: the sigmoid (logistic), the linear function, and the hyperbolic tangent (tanh).]
Figure 7.5 Three common transfer functions are the sigmoid, linear, and hyperbolic tangent functions.
SIGMOID FUNCTIONS AND RANGES FOR INPUT VALUES
The sigmoid activation functions are S-shaped curves that fall within bounds.
For instance, the logistic function produces values between 0 and 1, and the hyperbolic tangent produces values between –1 and 1, for all possible outputs of the summation function. The formulas for these functions are:

logistic(x) = 1 / (1 + e^(-x))
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

When used in a neural network, the x is the result of the combination function, typically the weighted sum of the inputs into the unit.
Since these functions are defined for all values of x, why do we recommend that the inputs to a network be in a small range, typically from –1 to 1? The reason has to do with how these functions behave near 0. In this range, they behave in an almost linear way.
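To make “almost linear” precise, here are the standard Taylor expansions of the two functions around 0, a textbook fact rather than something stated in this sidebar:

```latex
% Expansions around x = 0:
\[
  \operatorname{logistic}(x) = \tfrac{1}{2} + \tfrac{x}{4} - \tfrac{x^{3}}{48} + \cdots
  \qquad
  \tanh(x) = x - \tfrac{x^{3}}{3} + \cdots
\]
% For |x| well below 1, the cubic terms are negligible, so
% logistic(x) is approximately 1/2 + x/4 and tanh(x) is approximately x.
```

With inputs roughly between –1 and 1, the weighted sum tends to stay in this near-linear region.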