Berry M.J.A. – Data Mining Techniques For Marketing, Sales & Customer Relationship Management – Page 106 – Library. Read online. Free books read online. Read books without registering

He started with a population of 100 randomly generated assignments, none of

470643 c13.qxd 3/8/04 11:18 AM Page 435

Genetic Algorithms 435

which were very good. After 80 generations, the package lowered the score to 21—considerably better than he was able to do by hand.

This example gives a good feeling for optimization problems where genetic algorithms are applicable. They differ from most data mining problems because they are more rule-oriented than data-oriented. The key to solving these problems is to incorporate the constraints into a single fitness function to be optimized (either by finding a maximum or a minimum). The resulting fitness function might be highly nonlinear, making it difficult to optimize using other techniques. As we will see, the same techniques adapt to situations featuring larger amounts of data.

T I P Genetic algorithms are a good tool when there are more rules than data in the problem (although they are useful in other areas as well). These types of scheduling problems often involve competition for limited resources subject to complex relationships that describe resources and their users.

Schemata: Why Genetic Algorithms Work

At first sight, there is nothing sacrosanct in the selection, crossover, and mutation operators introduced earlier in this chapter. Why, for instance, does crossover choose only one intermediate point instead of two or more? Why do low mutation rates produce better results? The fact that nature behaves this way is not sufficient justification if multiple crossover points would produce better results more quickly or if a high mutation rate would work better.

For solving problems that yield actionable results, the fact that genetic algorithms have worked well in practice may be sufficient justification for continuing to use them as they are. However, it is still comforting to know that the technique has a theoretical foundation. Prof. Holland developed his theory of schemata processing in the early 1970s to explain why selection, crossover, and mutation work so well in practice. Readers interested in using genetic algorithms for some of their problems are particularly well advised to understand schemata, even if the genetic algorithms are buried inside tools they are using, since this understanding explains both the power and the limits of the technique.

A schema, which comes from the Greek word meaning “form” or “figure,” is simply a representation of the patterns present in a genome. Schemata (the plural is formed from the Greek root) are represented as sequences of symbols.

The 1 s and 0 s (called the fixed positions) of genomes are augmented by an asterisk, *, that matches either a 0 or a 1. The relationship between a schema and a genome is simple. A genome matches a schema when the fixed positions in the

470643 c13.qxd 3/8/04 11:18 AM Page 436

436 Chapter 13

schema match the corresponding positions in the genome. An example should make this quite clear; the following schema:

1 0 * *

matches all of the following four genomes because they all have four symbols, beginning with a 1 followed by a 0:

1 0 0 0

1 0 0 1

1 0 1 1

1 0 1 0

The order of a schema is the number of fixed positions that it contains. For instance, the order of 1*10111 is 6, of *** 1010**1 is 5, and of 0************** is 1.

The defining length of a schema is the distance between the outermost fixed positions. So, the defining length of 1*10111 is 6 (counting from the left, 7 – 1), of *** 1010**1 is 6 (10 – 4) and of 0************** is 0 (1 – 1).

Now, let us look at fitness functions in terms of schemata. If the genome 000

survives from one generation to the next, then the schema 0** has also survived, as have * 0* , ** 0, * 00, 0*0, 00* , and ***. The fitness of a particular schema, then, is the average fitness of all the genomes that match the schema in a given population. For instance, the fitness of the schema 0** is the average fitness of the genomes 000, 001, 010, and 011 since the schema survives when these genomes survive, at least considering only the selection operator. Consider two schemata from the previous example using the fitness function 31 p – p , 2

10*** and 00***. One genome in the initial population matches 10***, so its fitness is 176. The two genomes matching 00*** have fitness values of 87 and 58.

The first schema is fitter than the second. And, in fact, in the next generation there is only one genome matching 00*** and there are two matching 10***. The fitter schema has survived and proliferated; the less fit is disappearing.

A geometric view of schemata is sometimes helpful for understanding them. Consider the eight possible genomes of length three: 000, 001, 010, 011, 100, 101, 110, and 111. These lie at the corners of a unit cube, as shown in Figure 13.3. Schemata then correspond to the edges and faces of the cube. The edges are the schemata of order 2 and the faces of order 1. As genetic algorithms are processing different genomes; they are also processing schemata, visualized by these features on a cube. The population covers pieces of the cube trying to find the corners with the best fitness, and the schemata provide information about large regions of the possible solutions. This geometric perspective generalizes to higher dimensions, where the selection, crossover, and mutation operators correspond to cuts through hypercubes in some higher-dimension space that is a bit harder to visualize

470643 c13.qxd 3/8/04 11:18 AM Page 437

Genetic Algorithms 437

011

111

*11

01*

11*

010

*10

110

Schema for

this face is

1*1

**0

Schema for

0*0

1*0

this face is

1**

101

10*

*00

000

100

Schema for

this edge

Figure 13.3 A cube is a useful representation of schemata on 3 bits. The corners represent the genomes, the edges represent the schemata of order 2, the faces, the schemata of order 1, and the entire cube, the schema of order 0.

Consider the schema, 1***1. This is also quite fit in the original population, with a fitness of 150. There is one genome that matches it in the original population and the same one in the next generation. This schema has survived only because the genome containing it did not cross over with another genome. A crossover would likely have destroyed it. Compare this to 10*** that survived a crossover. The shorter the defining length of a schema, the more likely it will be to survive from one generation to another. So, even longer schemata that are very fit are likely to be replaced by shorter, but fit, cousins. Using more complicated crossover techniques, such as making two cuts, changes the behavior entirely. With more complicated techniques, the defining length is no longer useful, and Holland’s results on schemata do not necessarily hold.

Holland rigorously proved these two observations and summed them up in the Schema Theorem (also called the Fundamental Theorem of Genetic Algorithms): short, low-order schemata with above-average fitness increase in population from one generation to the next. In other words, short, low-order schemata are the building blocks that genetic algorithms are working on. From one generation to the next, the fittest building blocks survive and mix with each other to produce fitter and fitter genomes.

470643 c13.qxd 3/8/04 11:18 AM Page 438

438 Chapter 13

The Schema Theorem explains that genetic algorithms are really searching through the possible schemata to find fit building blocks that survive from one generation to the next. A natural question is how many building blocks are typically being processed? We will spare the reader the details, but Holland showed that the number of schemata being processed by a population of n genomes is proportional to n 3. This means that each generation is really evaluating n 3 different schemata, even though it is only doing work on n different genomes.

Holland calls this property implicit parallelism. The computational effort for a genetic algorithm is proportional to the size of the population, and in this effort, the algorithm is usefully processing a number of schemata proportional to n 3. The property of implicit parallelism should not be confused with explicit parallelism that is available when running genetic algorithms on a distributed network of workstations or on a computer with multiple processors.

The Schema Theorem gives us insight into why genomes work better when there are only two symbols ( 0 s and 1 s) in the representation. Finding the best building blocks requires processing as many schemata as possible from one generation to the next. For two symbols, the number of different genomes of a given length is 2length and the number of different schemata is 3length. Roughly, the number of unique schemata being processed by a single genome is about 1.5length. Now, what happens if there are more symbols in the alphabet, say by adding 2 and 3? Now the number of genomes of a given length is 4length, and the number of different schemata is 5length (because the asterisk adds one more symbol). Although there are more schemata, the number of schemata corresponding to a given genome is only 1.25length. As the number of symbols increases, the relative number of schemata decreases. Another way of looking at this is to consider the schema * 00. If there are only two letters in the alphabet, then only two genomes process this schema, 000 and 100. If there are four letters, then there are four genomes: 000, 100, 200, and 300. Since genetic algorithms are trying to find the best schemata using a given population size, the additional genomes do not help the search.