Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

This data can be combined to get an overall fitness score for each tract. Note that everyone in the tract gets the same score. The score represents the proportion of the population in that tract that fits the profile.

Chapter 4

DATA BY CENSUS TRACT

The U.S. government is constitutionally mandated to carry out an enumeration of the population every 10 years. The primary purpose of the census is to allocate seats in the House of Representatives to each state. In the process of satisfying this mandate, the census also provides a wealth of information about the American population.

The U.S. Census Bureau (www.census.gov) surveys the American population using two questionnaires, the short form and the long form (not counting special-purpose questionnaires, such as the one for military personnel). Most people get the short form, which asks a few basic questions about gender, age, ethnicity, and household size. Approximately 2 percent of the population gets the long form, which asks much more detailed questions about income, occupation, commuting habits, spending patterns, and more. The responses to these questionnaires provide the basis for demographic profiles.

The Census Bureau strives to keep this information up to date between each decennial census. The Census Bureau does not release information about individuals. Instead, it aggregates the information by small geographic areas. The most commonly used is the census tract, consisting of about 4,000 individuals.

Although census tracts do vary in size, they are much more consistent in population than other geographic units, such as counties and postal codes.

The census does have smaller geographic units, blocks and block groups; however, in order to protect the privacy of residents, some data is not made available below the level of census tracts. From these units, it is possible to aggregate information by county, state, metropolitan statistical area (MSA), legislative district, and so on. The following figure shows some census tracts in the center of Manhattan:

Census Tract    Edu College+    Occ Prof+Exec    HHI $75K+    HHI $100K+
189             19.2%           17.8%             5.0%         2.4%
122             66.7%           45.0%            58.0%        50.2%
129             44.8%           36.5%            14.8%         7.2%

Data Mining Applications


One philosophy of marketing is based on the old proverb “birds of a feather flock together.” That is, people with similar interests and tastes live in similar areas (whether voluntarily or because of historical patterns of discrimination).

According to this philosophy, it is a good idea to market to people where you already have customers and in similar areas. Census information can be valuable, both for understanding where concentrations of customers are located and for determining the profile of similar areas.

Tract 189          Goal     Tract    Fitness
Edu College+       61.3%    19.2%    0.31
Occ Prof+Exec      45.5%    17.8%    0.39
HHI $75K+          22.6%     5.0%    0.22
HHI $100K+          7.4%     2.4%    0.32
Overall Advertising Fitness          0.31

Tract 122          Goal     Tract    Fitness
Edu College+       61.3%    66.7%    1.00
Occ Prof+Exec      45.5%    45.0%    0.99
HHI $75K+          22.6%    58.0%    1.00
HHI $100K+          7.4%    50.2%    1.00
Overall Advertising Fitness          1.00

Tract 129          Goal     Tract    Fitness
Edu College+       61.3%    44.8%    0.73
Occ Prof+Exec      45.5%    36.5%    0.80
HHI $75K+          22.6%    14.8%    0.65
HHI $100K+          7.4%     7.2%    0.97
Overall Advertising Fitness          0.79

Figure 4.1 Example of calculating readership fitness for three census tracts in Manhattan.
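The numbers in Figure 4.1 are consistent with a simple rule: divide each tract percentage by the corresponding goal percentage, cap the result at 1, and average the four values. The Python sketch below reproduces the figure under that assumption. The rule is inferred from the published numbers rather than quoted from the text, and the function and variable names are illustrative only.

```python
# Goal (readership) profile and tract percentages, copied from Figure 4.1.
GOAL = {
    "Edu College+": 61.3,
    "Occ Prof+Exec": 45.5,
    "HHI $75K+": 22.6,
    "HHI $100K+": 7.4,
}

TRACTS = {
    "189": {"Edu College+": 19.2, "Occ Prof+Exec": 17.8,
            "HHI $75K+": 5.0, "HHI $100K+": 2.4},
    "122": {"Edu College+": 66.7, "Occ Prof+Exec": 45.0,
            "HHI $75K+": 58.0, "HHI $100K+": 50.2},
    "129": {"Edu College+": 44.8, "Occ Prof+Exec": 36.5,
            "HHI $75K+": 14.8, "HHI $100K+": 7.2},
}


def attribute_fitness(tract_pct, goal_pct):
    """Assumed rule: ratio of tract to goal, capped at 1."""
    return min(tract_pct / goal_pct, 1.0)


def overall_fitness(tract):
    """Assumed rule: unweighted mean of the per-attribute fitness values."""
    scores = [attribute_fitness(tract[name], GOAL[name]) for name in GOAL]
    return sum(scores) / len(scores)


for tract_id, profile in TRACTS.items():
    print(f"Tract {tract_id}: {overall_fitness(profile):.2f}")
```

Capping each ratio at 1 means a tract that exceeds the goal on one attribute cannot use the surplus to offset a shortfall on another.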

Data Mining to Improve Direct Marketing Campaigns

Advertising can be used to reach prospects about whom nothing is known as individuals. Direct marketing requires at least a tiny bit of additional information, such as a name and address, a phone number, or an email address.

Where there is more information, there are also more opportunities for data mining. At the most basic level, data mining can be used to improve targeting by selecting which people to contact.

Actually, the first level of targeting does not require data mining, only data.

In the United States, and to a lesser extent in many other countries, there is quite a bit of data available about a large proportion of the population. In many countries, there are companies that compile and sell household-level data on all sorts of things including income, number of children, education level, and even hobbies. Some of this data is collected from public records.

Home purchases, marriages, births, and deaths are matters of public record that can be gathered from county courthouses and registries of deeds. Other data is gathered from product registration forms. Some is imputed using models. The rules governing the use of this data for marketing purposes vary from country to country. In some, data can be sold by address, but not by name. In others, data may be used only for certain approved purposes. In some countries, data may be used with few restrictions, but only a limited number of households are covered. In the United States, some data, such as medical records, is completely off limits. Some data, such as credit history, can only be used for certain approved purposes. Much of the rest is unrestricted.

WARNING The United States is unusual in both the extent of commercially available household data and the relatively few restrictions on its use. Although household data is available in many countries, the rules governing its use differ. There are especially strict rules governing transborder transfers of personal data. Before planning to use household data for marketing, look into its availability in your market and the legal restrictions on making use of it.

Household-level data can be used directly for a first rough cut at segmentation based on such things as income, car ownership, or presence of children.

The problem is that even after the obvious filters have been applied, the remaining pool can be very large relative to the number of prospects likely to respond.

Thus, a principal application of data mining to prospects is targeting—finding the prospects most likely to actually respond to an offer.

Response Modeling

Direct marketing campaigns typically have response rates measured in the single digits. Response models are used to improve response rates by identifying prospects who are more likely to respond to a direct solicitation. The most useful response models provide an actual estimate of the likelihood of response, but this is not a strict requirement. Any model that allows prospects to be ranked by likelihood of response is sufficient. Given a ranked list, direct marketers can increase the percentage of responders reached by campaigns by mailing or calling people near the top of the list.
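To illustrate why a ranking alone is enough, the Python sketch below compares the response rate among the top-ranked prospects with the rate across the whole list. The scored list is invented for illustration (its response rate is far higher than a real campaign would see), and the function names are hypothetical.

```python
# Hypothetical prospects as (model_score, responded) pairs.
# The data is made up purely to illustrate the calculation.
prospects = [
    (0.90, True), (0.80, False), (0.70, True), (0.60, False), (0.50, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]


def response_rate(group):
    """Fraction of the group that responded."""
    return sum(1 for _, responded in group if responded) / len(group)


def top_fraction(scored, fraction):
    """Return the highest-scoring share of the list (at least one prospect)."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff]


overall = response_rate(prospects)
top_30 = response_rate(top_fraction(prospects, 0.3))

print(f"Whole list: {overall:.0%} response")
print(f"Top 30% by score: {top_30:.0%} response")
```

Only the order of the scores is used here, so any model that ranks prospects the same way would select the same people, whether or not its scores are calibrated probabilities.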

The following sections describe several ways that model scores can be used to improve direct marketing. This discussion is independent of the data mining techniques used to generate the scores. It is worth noting, however, that many of the data mining techniques in this book can be, and have been, applied to response modeling.

According to the Direct Marketing Association, an industry group, a typical mailing of 100,000 pieces costs about $100,000, although the price can vary considerably depending on the complexity of the mailing. Of that, some of the costs, such as developing the creative content, preparing the artwork, and initial setup for printing, are independent of the size of the mailing. The rest of the cost varies directly with the number of pieces mailed. Mailing lists of known mail order responders or active magazine subscribers can be purchased on a price-per-thousand-names basis. Mail shop production costs and postage are charged on a similar basis. The larger the mailing, the less important the fixed costs become. For ease of calculation, the examples in this book assume that it costs one dollar to reach one person with a direct mail campaign. This is not an unreasonable estimate, although simple mailings cost less and very fancy mailings cost more.
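The effect of fixed costs on the per-piece price can be sketched in a few lines of Python. The split between fixed and variable cost is not given in the text; the figures of $20,000 fixed and $0.80 per piece are assumptions chosen only so that a 100,000-piece mailing totals $100,000.

```python
# Hypothetical cost split, chosen so that 100,000 pieces cost $100,000 in total.
FIXED_COST = 20_000.0      # creative, artwork, print setup (assumed figure)
COST_PER_PIECE = 0.80      # list rental, production, postage (assumed figure)


def mailing_cost(pieces):
    """Total cost: a fixed part plus a part proportional to the mailing size."""
    return FIXED_COST + COST_PER_PIECE * pieces


def cost_per_piece(pieces):
    """Average cost of reaching one person at a given mailing size."""
    return mailing_cost(pieces) / pieces


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} pieces: ${mailing_cost(n):>10,.0f} total, "
          f"${cost_per_piece(n):.2f} per piece")
```

Under these assumed figures the average cost per piece falls toward the variable cost as the mailing grows, which is why a flat one dollar per contact is a workable simplification for large mailings.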

Optimizing Response for a Fixed Budget

The simplest way to make use of model scores is to use them to assign ranks.

Once prospects have been ranked by a propensity-to-respond score, the prospect list can be sorted so that those most likely to respond are at the top of the list and those least likely to respond are at the bottom. Many modeling techniques can be used to generate response scores including regression models, decision trees, and neural networks.
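As a minimal sketch of this ranking step, the Python function below sorts prospects by score and keeps as many as a fixed budget allows, using the one-dollar-per-contact assumption adopted for the examples in this book. The prospect identifiers, scores, and function name are hypothetical.

```python
def select_for_budget(scores, budget, cost_per_contact=1.0):
    """Return the IDs of the highest-scoring prospects the budget can reach.

    scores maps a prospect ID to a propensity-to-respond score; any scale
    works because only the ordering is used.
    """
    affordable = int(budget // cost_per_contact)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [prospect_id for prospect_id, _ in ranked[:affordable]]


# Hypothetical scores for four prospects and a budget that covers two contacts.
example_scores = {"a": 0.02, "b": 0.08, "c": 0.05, "d": 0.01}
print(select_for_budget(example_scores, budget=2))
```

If the budget covers the whole list, the function returns every prospect, and the ranking makes no difference to who is contacted.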

Sorting a list makes sense whenever there is neither time nor budget to reach all prospects. If some people must be left out, it makes sense to leave out the ones who are least likely to respond. Not all businesses feel the need to leave out prospects. A local cable company may consider every household in its town to be a prospect, and it may have the capacity to write or call every one of those households several times a year. When the marketing plan calls for making identical offers to every prospect, there is not much need for response modeling! However, data mining may still be useful for selecting the proper messages and predicting how prospects are likely to behave as customers.
