Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

Offering all three products to all the customers is expensive and, even worse, may confuse the recipients, reducing the response rate.

The carrier test-markets the products to a small subset of customers, each of whom receives all three offers but can respond to at most one of them. It intends to use this information to build a model for predicting customer affinity for each offer. The training set uses the data collected from the test marketing campaign and codes the propensity as follows: no response → –1.00, international → –0.33, national → +0.33, and specific numbers → +1.00. After training a neural network on information about the customers, the carrier starts applying the model.
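The coding amounts to a simple lookup, sketched below in Python (only the four values come from the text; the key names are shorthand for the offers, not the carrier's actual product names):

# Target coding for the single-output network (values from the text above;
# the key names are shorthand, not the carrier's product names).
propensity_code = {
    "no response":      -1.00,
    "international":    -0.33,
    "national":         +0.33,
    "specific numbers": +1.00,
}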

But applying the model does not go as well as planned. Many of the output scores cluster around the four values used for training the network. However, apart from the nonresponders (who are the majority), there are many instances where the network returns intermediate values such as 0.0 and 0.5. What can be done?

First, the carrier should use a validation set to understand the output values.

By interpreting the results of the network based on what happens in the validation set, it can find the right ranges to use for transforming the results of the network back into marketing segments. This is the same process shown in Figure 7.11.

Another observation in this case is that the network is really being used to predict three different things, whether a recipient will respond to each of the campaigns. This strongly suggests that a better structure for the network is to have three outputs: a propensity to respond to the international plan, to the long-distance plan, and to the specific numbers plan. The test set would then be used to determine where the cutoff is for nonrespondents. Alternatively, each outcome could be modeled separately, and the model results combined to select the appropriate campaign.
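As a rough illustration of that structure, the sketch below sets up a network with one output per campaign (a minimal Keras sketch; the number of customer features, the hidden-layer size, and the training settings are assumptions rather than details from the text):

# Minimal sketch of a three-output propensity network; sizes are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),              # assumed: 20 customer features
    tf.keras.layers.Dense(10, activation="tanh"),    # single hidden layer
    tf.keras.layers.Dense(3, activation="sigmoid"),  # one propensity per campaign:
                                                     # international, long-distance,
                                                     # specific numbers
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# Each training target is a vector of three 0/1 response flags; the test set is
# then used to choose a cutoff on each output that separates nonrespondents.

Scoring a customer yields three separate propensities, and the largest one above its cutoff selects the campaign to offer.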


[Figure 7.11 plot: the 10 network outputs, labeled A and B, plotted on a scale from –1.0 to 1.0.]

Figure 7.11 Running a neural network on 10 examples from the validation set can help determine how to interpret results.

Neural Networks for Time Series

In many business problems, the data naturally falls into a time series. Examples of such series are the closing price of IBM stock, the daily value of the Swiss franc to U.S. dollar exchange rate, and the number of customers who will be active on any given date in the future. For financial time series, someone who is able to predict the next value, or even whether the series is heading up or down, has a tremendous advantage over other investors. Although predominant in the financial industry, time series appear in other areas, such as forecasting and process control. Financial time series, though, are the most studied, since a small advantage in predictive power translates into big profits.

Neural networks are easily adapted for time-series analysis, as shown in Figure 7.12. The network is trained on the time-series data, starting at the oldest point in the data. The training then moves to the second oldest point, and the oldest point goes to the next set of units in the input layer, and so on.

The network trains like a feed-forward, back propagation network trying to predict the next value in the series at each step.

[Figure 7.12 diagram: historical input units hold value 1 and value 2 at times t, t-1, and t-2 (the time lag); they feed a hidden layer whose output is the prediction of value 1 at time t+1.]

Figure 7.12 A time-delay neural network remembers the previous few training examples and uses them as input into the network. The network then works like a feed-forward, back propagation network.


Notice that the time-series network is not limited to data from just a single time series. It can take multiple inputs. For instance, to predict the value of the Swiss franc to U.S. dollar exchange rate, other time-series information might be included, such as the volume of the previous day’s transactions, the U.S. dollar to Japanese yen exchange rate, the closing value of the stock exchange, and the day of the week. In addition, non-time-series data, such as the reported inflation rate in the countries over the period of time under investigation, might also be candidate features.

The number of historical units controls the length of the patterns that the network can recognize. For instance, keeping 10 historical units on a network predicting the closing price of a favorite stock allows the network to recognize patterns that occur within 2-week periods (since prices are recorded only on weekdays). Relying on such a network to predict the value 3 months in the future is not recommended.

Actually, by modifying the inputs, a feed-forward network can be made to work like a time-delay neural network. Consider the time series with 10 days of history shown in Table 7.5. The network will include two features: the day of the week and the closing price.

Creating a time series with a time lag of three requires adding new features for the historical, lagged values. (Day-of-the-week does not need to be copied, since it does not really change.) The result is Table 7.6; a short code sketch of this transformation follows the table. This data can now be fed into a feed-forward, back propagation network without any special support for time series.

Table 7.5 Time Series

DATA ELEMENT    DAY-OF-WEEK    CLOSING PRICE
1               1              $40.25
2               2              $41.00
3               3              $39.25
4               4              $39.75
5               5              $40.50
6               1              $40.50
7               2              $40.75
8               3              $41.25
9               4              $42.00
10              5              $41.50


Table 7.6 Time Series with Time Lag

DATA ELEMENT    DAY-OF-WEEK    CLOSING PRICE    PREVIOUS CLOSING PRICE    PREVIOUS-1 CLOSING PRICE
1               1              $40.25
2               2              $41.00           $40.25
3               3              $39.25           $41.00                    $40.25
4               4              $39.75           $39.25                    $41.00
5               5              $40.50           $39.75                    $39.25
6               1              $40.50           $40.50                    $39.75
7               2              $40.75           $40.50                    $40.50
8               3              $41.25           $40.75                    $40.50
9               4              $42.00           $41.25                    $40.75
10              5              $41.50           $42.00                    $41.25
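As a concrete sketch of the transformation from Table 7.5 to Table 7.6 (plain Python; the prices and the lag of three come from the tables, while the variable names are only illustrative):

# Build the lagged rows of Table 7.6 from the raw rows of Table 7.5 by adding
# the previous and previous-1 closing prices as extra features.
rows = [  # (data element, day-of-week, closing price) from Table 7.5
    (1, 1, 40.25), (2, 2, 41.00), (3, 3, 39.25), (4, 4, 39.75), (5, 5, 40.50),
    (6, 1, 40.50), (7, 2, 40.75), (8, 3, 41.25), (9, 4, 42.00), (10, 5, 41.50),
]

lagged = []
for i, (element, day, price) in enumerate(rows):
    previous = rows[i - 1][2] if i >= 1 else None    # previous closing price
    previous_1 = rows[i - 2][2] if i >= 2 else None  # previous-1 closing price
    lagged.append((element, day, price, previous, previous_1))

# The first two rows have missing lagged values and would typically be dropped
# before the data is fed to a feed-forward, back propagation network.
complete_rows = [r for r in lagged if None not in r]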

How to Know What Is Going on Inside a Neural Network

Neural networks are opaque. Even knowing all the weights on all the nodes throughout the network does not give much insight into why the network produces the results that it produces. This lack of understanding has some philosophical appeal—after all, we do not understand how human consciousness arises from the neurons in our brains. As a practical matter, though, opaqueness impairs our ability to understand the results produced by a network.

If only we could ask it to tell us how it is making its decision in the form of rules. Unfortunately, the same nonlinear characteristics of neural network nodes that make them so powerful also make them unable to produce simple rules. Eventually, research into rule extraction from networks may bring unequivocally good results. Until then, the trained network itself is the rule, and other methods are needed to peer inside to understand what is going on.

A technique called sensitivity analysis can be used to get an idea of how opaque models work. Sensitivity analysis does not provide explicit rules, but it does indicate the relative importance of the inputs to the result of the network. Sensitivity analysis uses the test set to determine how sensitive the output of the network is to each input. The following are the basic steps:

1. Find the average value for each input. We can think of this average value as the center of the test set.


2. Measure the output of the network when all inputs are at their average value.

3. Measure the output of the network when each input is modified, one at a time, to be at its minimum and maximum values (usually –1 and 1, respectively).

For some inputs, the output of the network changes very little for the three values (minimum, average, and maximum). The network is not sensitive to these inputs (at least when all other inputs are at their average value). Other inputs have a large effect on the output of the network. The network is sensitive to these inputs. The amount of change in the output measures the sensitivity of the network for each input. Using these measures for all the inputs creates a relative measure of the importance of each feature. Of course, this method is entirely empirical and is looking only at each variable independently. Neural networks are interesting precisely because they can take interactions between variables into account.
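The three steps can be written down in a few lines (a sketch in Python with NumPy; the predict function stands in for the trained network and, along with the array layout, is an assumption for illustration):

# One-at-a-time sensitivity analysis: hold every input at its average value,
# swing each input to its minimum and maximum, and record how far the output moves.
import numpy as np

def sensitivity(predict, test_inputs):
    """predict maps one input vector to the network's output;
    test_inputs is a 2-D array with one row per test-set example."""
    center = test_inputs.mean(axis=0)           # step 1: center of the test set
    baseline = predict(center)                  # step 2: output at the center
    scores = []
    for i in range(test_inputs.shape[1]):       # step 3: vary one input at a time
        low, high = center.copy(), center.copy()
        low[i] = test_inputs[:, i].min()        # usually -1 after scaling
        high[i] = test_inputs[:, i].max()       # usually +1 after scaling
        scores.append(max(abs(predict(low) - baseline),
                          abs(predict(high) - baseline)))
    return scores                               # larger score = more sensitive input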

There are variations on this procedure. It is possible to modify the values of two or three features at the same time to see if combinations of features have a particular importance. Sometimes, it is useful to start from a location other than the center of the test set. For instance, the analysis might be repeated for the minimum and maximum values of the features to see how sensitive the network is at the extremes. If sensitivity analysis produces significantly different results for these three situations, then there are higher order effects in the network that are taking advantage of combinations of features.

