Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

The 95 percent confidence interval around a 5.0 percent response rate, as the sample size varies:

RESPONSE   SIZE        SEP       95% CONF   LOWER   HIGH    WIDTH
5.0%       1,000       0.6892%   1.96       3.65%   6.35%   2.70%
5.0%       5,000       0.3082%   1.96       4.40%   5.60%   1.21%
5.0%       10,000      0.2179%   1.96       4.57%   5.43%   0.85%
5.0%       20,000      0.1541%   1.96       4.70%   5.30%   0.60%
5.0%       40,000      0.1090%   1.96       4.79%   5.21%   0.43%
5.0%       60,000      0.0890%   1.96       4.83%   5.17%   0.35%
5.0%       80,000      0.0771%   1.96       4.85%   5.15%   0.30%
5.0%       100,000     0.0689%   1.96       4.86%   5.14%   0.27%
5.0%       120,000     0.0629%   1.96       4.88%   5.12%   0.25%
5.0%       140,000     0.0582%   1.96       4.89%   5.11%   0.23%
5.0%       160,000     0.0545%   1.96       4.89%   5.11%   0.21%
5.0%       180,000     0.0514%   1.96       4.90%   5.10%   0.20%
5.0%       200,000     0.0487%   1.96       4.90%   5.10%   0.19%
5.0%       500,000     0.0308%   1.96       4.94%   5.06%   0.12%
5.0%       1,000,000   0.0218%   1.96       4.96%   5.04%   0.09%
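These bounds follow from the standard error of the proportion, SEP = sqrt(p × (1 − p) / N), with the interval running from p − 1.96 × SEP to p + 1.96 × SEP. As a rough cross-check (a sketch of mine, not from the book), a few lines of Python reproduce the rows above:

```python
import math

def confidence_interval(p, n, z=1.96):
    """95% confidence interval for a response rate p observed on a sample of n customers."""
    sep = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
    return sep, p - z * sep, p + z * sep, 2 * z * sep

for n in [1_000, 5_000, 10_000, 100_000, 1_000_000]:
    sep, lower, high, width = confidence_interval(0.05, n)
    print(f"{n:>9,}  SEP={sep:.4%}  [{lower:.2%}, {high:.2%}]  width={width:.2%}")
```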


What the Confidence Interval Really Means

The confidence interval is a measure of only one thing, the statistical dispersion of the result. Assuming that everything else remains the same, it measures the amount of inaccuracy introduced by the process of sampling. It also assumes that the sampling process itself is random—that is, that any of the one million customers could have been offered the challenger offer with an equal likelihood. Random means random. The following are examples of what not to do:

■■ Use customers in California for the challenger and everyone else for the champion.

■■ Use the 5 percent lowest and 5 percent highest value customers for the challenger, and everyone else for the champion.

■■ Use the 10 percent most recent customers for the challenger, and everyone else for the champion.

■■ Use the customers with telephone numbers for the telemarketing campaign; everyone else for the direct mail campaign.

All of these are biased ways of splitting the population into groups. The previous results all assume that there is no such systematic bias. When there is systematic bias, the formulas for the confidence intervals are not correct.
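By contrast, an unbiased split gives every customer the same chance of landing in either group, regardless of geography, value, tenure, or contact channel. A minimal sketch of such an assignment (the customer IDs, the 10 percent challenger share, and the use of Python's random module are illustrative assumptions, not the book's):

```python
import random

random.seed(42)  # fixed seed only so a dry run is repeatable; drop it for a live campaign

customers = [f"cust_{i}" for i in range(1_000_000)]  # hypothetical customer IDs
challenger_share = 0.10                              # illustrative: 10% receive the challenger offer

challenger, champion = [], []
for cust in customers:
    # Every customer has the same probability of each assignment.
    (challenger if random.random() < challenger_share else champion).append(cust)
```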

Using the formula for the confidence interval assumes that there is no systematic bias in deciding whether a particular customer receives the champion or the challenger message. For instance, perhaps there is a champion model that predicts the likelihood of customers responding to the champion offer. If this model were used to select the champion group, then the challenger sample would no longer be a random sample. It would consist of the customers left over by the champion model, which introduces another form of bias.

Or, perhaps the challenger model is only available to customers in certain markets or with certain products. This introduces other forms of bias. In such a case, these customers should be compared to the set of customers receiving the champion offer with the same constraints.

Another form of bias might come from the method of response. The challenger may only accept responses via telephone, but the champion may accept them by telephone or on the Web. In such a case, the challenger response may be dampened because of the lack of a Web channel. Or, there might need to be special training for the inbound telephone service reps to handle the challenger offer. At certain times, this might mean that wait times are longer, another form of bias.

The confidence interval is simply a statement about statistics and dispersion. It does not address all the other forms of bias that might affect results, and these forms of bias are often more important to results than sample variation. The next section talks about setting up a test and control experiment in marketing, diving into these issues in more detail.


Size of Test and Control for an Experiment

The champion-challenger model is an example of a two-way test, where a new method (the challenger) is compared to business-as-usual activity (the champion). This section talks about ensuring that the test and control are large enough for the purposes at hand. The previous section talked about determining the confidence interval for the sample response rate. Here, we turn this logic inside out. Instead of starting with the size of the groups, let's consider sizes from the perspective of test design. This requires several items of information:

■■ Estimated response rate for one of the groups, which we call p

■■ Difference in response rates that we want to consider significant (acuity of the test), which we call d

■■ Confidence interval (say 95 percent)

This provides enough information to determine the size of the samples needed for the test and control. For instance, suppose that business as usual has a response rate of 5 percent and we want to measure, with 95 percent confidence, a difference of 0.2 percent. This means that if the response rate of the test group is greater than 5.2 percent, then the experiment can detect the difference with a 95 percent confidence level.

For a problem of this type, the first step is to determine the value of SEDP, the standard error of the difference of proportions. That is, if we are willing to accept a difference of 0.2 percent with a confidence of 95 percent, then what is the corresponding standard error? A confidence of 95 percent means that we are 1.96 standard deviations from the mean, so the answer is to divide the difference by 1.96, which yields 0.102 percent.

More generally, the process is to convert the confidence level (95 percent) to a z-value (which can be done using the Excel function NORMSINV) and then divide the desired difference by this value.
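The same step in code might look like the following sketch (the variable names are mine; scipy's norm.ppf stands in for NORMSINV, with the 0.975 argument reflecting the two-sided 95 percent interval behind the 1.96 used throughout this chapter):

```python
from scipy.stats import norm

confidence = 0.95
d = 0.002                               # the 0.2 percent difference we want to detect

z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided 95% -> about 1.96 (the NORMSINV step)
sedp_target = d / z                     # about 0.00102, i.e. 0.102 percent
print(z, sedp_target)
```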

The next step is to plug these values into the formula for SEDP. For this, let’s assume that the test and control are the same size:

$$\frac{0.2\%}{1.96} = \sqrt{\frac{p(1-p)}{N} + \frac{(p+d)(1-p-d)}{N}}$$

Plugging in the values just described ( p is 5% and d is 0.2%) results in:

$$0.102\% = \sqrt{\frac{5\% \times 95\%}{N} + \frac{5.2\% \times 94.8\%}{N}} = \sqrt{\frac{0.0963}{N}}$$

$$N = \frac{0.0963}{(0.00102)^2} \approx 92{,}561$$

So, having equal-sized groups of 92,561 makes it possible to measure a 0.2 percent difference in response rates with 95 percent confidence. Of course, this does not guarantee that the results will differ by at least 0.2 percent. It merely says that with control and test groups of at least this size, a difference in response rates of 0.2 percent should be measurable and statistically significant.

The size of the test and control groups affects how the results can be interpreted. However, this effect can be determined in advance, before the test. It is worthwhile determining the acuity of the test and control groups before running the test, to be sure that the test can produce useful results.

T I P Before running a marketing test, determine the acuity of the test by calculating the difference in response rates that can be measured with a high confidence (such as 95 percent).
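One way to fold this calculation into reusable form is sketched below; the helper names and the small-d approximation in acuity() are my own choices, not the book's. required_group_size() solves the SEDP equation above for N, and acuity() runs it in reverse for a given group size:

```python
import math

def required_group_size(p, d, z=1.96):
    """Equal-sized test and control groups needed to detect a difference d
    around a base response rate p, at the confidence level implied by z."""
    sedp_target = d / z
    variance = p * (1 - p) + (p + d) * (1 - p - d)
    return math.ceil(variance / sedp_target ** 2)

def acuity(p, n, z=1.96):
    """Smallest measurable difference with equal groups of size n
    (approximating the second group's variance by the first's)."""
    return z * math.sqrt(2 * p * (1 - p) / n)

print(required_group_size(0.05, 0.002))  # ~93,000; the book's hand-rounded figures give 92,561
print(f"{acuity(0.05, 100_000):.3%}")    # ~0.191% difference detectable with 100,000 per group
```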

Multiple Comparisons

The discussion has so far used examples with only one comparison, such as the difference between two presidential candidates or between a test and control group. Often, we are running multiple tests at the same time. For instance, we might try out three different challenger messages to determine if one of these produces better results than the business-as-usual message. Because handling multiple tests does affect the underlying statistics, it is important to understand what happens.

The Confidence Level with Multiple Comparisons

Consider that there are two groups that have been tested, and you are told that the difference between the responses in the two groups is 95 percent certain to be due to factors other than sampling variation. A reasonable conclusion is that there is a difference between the two groups. In a well-designed test, the most likely reason would be the difference in message, offer, or treatment.

Occam’s Razor says that we should take the simplest explanation, and not add anything extra. The simplest hypothesis for the difference in response rates is that the difference is not significant, that the response rates are really approximations of the same number. If the difference is significant, then we need to search for the reason why.

Now consider the same situation, except that you are now told that there were actually 20 groups being tested, and you were shown only one pair. Now you might reach a very different conclusion. If 20 groups are being tested, then you should expect one of them to exceed the 95 percent confidence bound due only to chance, since 95 percent means 19 times out of 20. You can no longer conclude that the difference is due to the testing parameters. Instead, the simplest hypothesis is that the difference is due to sampling variation.
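To put a rough number on this (assuming, purely for illustration, that the comparisons are independent): the chance that at least one of k comparisons crosses the 95 percent bound by luck alone is 1 − 0.95^k, which is already about 64 percent for k = 20. A one-line sketch:

```python
# Chance of at least one spurious "95 percent significant" result among k
# independent comparisons (illustrative assumption of independence).
for k in (1, 5, 20):
    print(k, round(1 - 0.95 ** k, 3))   # 0.05, 0.226, 0.642
```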


The confidence level is based on only one comparison. When there are multiple comparisons, that condition is not true, so the confidence as calculated previously is not quite sufficient.
