X

Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management

From the point of view of the user, this hybrid technique has more in common with neural-network variants than it does with decision-tree variants because, in common with other neural-network techniques, it is not capable of explaining its decisions. The tree still produces rules, but these are of the form F( w 1 x 1, w 2 x 2, w 3 x 3, . . .) ≤ N, where F is the combining function used by the neural network. Such rules make more sense to neural network software than to people.

Piecewise Regression Using Trees

Another example of combining trees with other modeling methods is a form of piecewise linear regression in which each split in a decision tree is chosen so as to minimize the error of a simple regression model on the data at that node.

The same method can be applied to logistic regression for categorical target variables.

Alternate Representations for Decision Trees

The traditional tree diagram is a very effective way of representing the actual structure of a decision tree. Other representations are sometimes more useful when the focus is more on the relative sizes and concentrations of the nodes.

Box Diagrams

While the tree diagram and Twenty Questions analogy are helpful in visualizing certain properties of decision-tree methods, in some cases, a box diagram is more revealing. Figure 6.13 shows the box diagram representation of a decision tree that tries to classify people as male or female based on their ages and the movies they have seen recently. The diagram may be viewed as a sort of nested collection of two-dimensional scatter plots.

At the root node of a decision tree, the first three-way split is based on which of three groups the survey respondent’s most recently seen movie falls. In the outermost box of the diagram, the horizontal axis represents that field. The outermost box is divided into sections, one for each node at the next level of the tree.

The size of each section is proportional to the number of records that fall into it.

Next, the vertical axis of each box is used to represent the field that is used as the next splitter for that node. In general, this will be a different field for each box.

470643 c06.qxd 3/8/04 11:12 AM Page 200

200 Chapter 6

Last Movie in Group

Last Movie in Group

Last Movie in Group

1

2

3

age > 27

age > 41

Last Movie Last Movie

in Group

in Group

3

3

age ≤ 41

age ≤ 41

age > 27

age ≤ 27

Last Movie

in Group

1

age

Page: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154

Categories: Economics, finance
Oleg: