“U”) defined by 31 p – p 2 where p varies between 0 and 31 (see Figure 13.1). The parameter p is expressed as a string of 5 bits to represent the numbers from 0
to 31; this bit string is the genetic material, called a genome. The fitness function peaks at the values 15 and 16, represented as 01111 and 10000, respectively.
This example shows that genetic algorithms are applicable even when there are multiple, dissimilar peaks.
300
250
200
150
Fitness Value
100
50
0
0
5
10
15
20
25
30
Parameter p
Figure 13.1 Finding the maximum of this simple function helps illustrate genetic algorithms.
470643 c13.qxd 3/8/04 11:18 AM Page 425
Genetic Algorithms 425
GAs work by evolving successive generations of genomes that get progressively more and more fit; that is, they provide better solutions to the problem.
In nature, fitness is simply the ability of an organism to survive and reproduce.
On a computer, evolution is simulated with the following steps:
1. Identify the genome and fitness function.
2. Create an initial generation of genomes.
3. Modify the initial population by applying the operators of genetic algorithms.
4. Repeat Step 3 until the fitness of the population no longer improves.
The first step is setting up the problem. In this simple example, the genome consists of a single, 5-bit gene for the parameter p. The fitness function is the parabola. Over the course of generations, the fitness function is going to be maximized.
For this example, shown in Table 13.1, the initial generation consists of four genomes, randomly produced. A real problem would typically have a population of hundreds or thousands of genomes, but that is impractical for illustrative purposes. Notice that in this population, the average fitness is 122.5—pretty good, since the actual maximum is 240, but evolution can improve it.
The basic algorithm modifies the initial population using three operators—
selection, then crossover, then mutation—as illustrated in Figure 13.2. These operators are explained in the next three sections.
Table 13.1 Four Randomly Generated Genomes
GENOME
P
FITNESS
10110
22
198
00011
3
84
00010
2
58
11001
25
150
470643 c13.qxd 3/8/04 11:18 AM Page 426
426
Chapter 13
generation n
generation n + 1
this genome
this genome
survives
Selection keeps the size of the population constant but
dies off
this genome
increases the fitness of the next generation. Genomes
multiplies
with a higher fitness (darker shading) proliferate and
genomes with lighter shading die off.
crossover
position
Crossover is a way of
combining two genomes.
A crossover position
determines where the
genomes “break” and
are recombined.
Mutation makes an occasional
mutation
random change to a random position
in a genome. This allows features to
appear that may not have been in the
original population.
Figure 13.2 The basic operators in genetic algorithms are selection, crossover, and mutation.
470643 c13.qxd 3/8/04 11:18 AM Page 427
Genetic Algorithms 427
SIMPLE OVERVIEW OF GENETICS
Life depends on proteins, which consist of sequences of 20 basic units called amino acids. The chromosomes in the nucleus of a cell are strands of DNA (deoxyribonucleic acid) that carry the blueprints for the proteins needed by the cell. The 23 chromosomes in each human cell together are the genome for that individual. In general, the genomes of different individuals in a species are very similar to each other; however, there are some individual differences.
The DNA in the genome encodes these blueprints for the amino acids sequences using strands of nucleotides. These nucleotides constitute the four letters of the genetic alphabet:
◆ A, adenine
◆ C, cytosine
◆ G, guanine
◆ T, thymine
Triplets of nucleotides represent the 20 amino acids. For instance, the amino acid called methionine corresponds to the triplet ATG. Another amino acid, lysine, has two “spellings”: AAA and AAG. So, if a strand of DNA contains the following letters:
ATGAAGATGCGA
then it decodes into a protein containing four amino acids: methionine, ATG; lysine, AAG; methionine, ATG; followed by arginine, CGA (see figure). This description intentionally glosses over the details of the actual biochemical mechanism that turns the blueprints into proteins, but it provides a high-level outline of the mapping from genetic information in DNA to the building blocks of proteins.
A biological example of encoding is the mapping from nucleotides in DNA to amino acids in protein.
In this simplified model, the process of evolution works as follows. The proteins produced by the representations in the DNA express themselves as features of the living organism, such as blue eyes, five fingers, the structure of the brain, a long trunk, and so on. Genes can express themselves in damaging ways, causing the resulting organism to die. Healthy organisms survive to produce offspring and pass their DNA to the next generation. In higher-level animals, the DNA is actually combined with the DNA from another survivor during sexual replication, using a technique called crossover. Sometimes, mistakes are made in passing genes from one generation to the next—these are mutations. The combination of all these processes over the course of many generations results in organisms highly adapted to their environment: the process of evolution.
(continued)
470643 c13.qxd 3/8/04 11:18 AM Page 428
428 Chapter 13
SIMPLE OVERVIEW OF GENETICS (continued)
DNA with
Amino Acids
Nucleotides
for Proteins
Adenine
Methionine
Thymine
Guanine
Adenine
A
Lysine
denine
Guanine
Adenine
T
Methionine
hymine
Guanine
Cytosine
G
Arginine
uanine
Adenine
470643 c13.qxd 3/8/04 11:18 AM Page 429
Genetic Algorithms 429
Selection
The selection step is analogous to the process of natural selection where only the fittest individuals in the population survive to pass their genetic material on to the next generation. Unlike nature, though, the size of the population remains constant from one generation to the next, so there is no chance of the population becoming extinct (which would clearly not be the optimum solution!). The chance of a genome surviving to the next generation is proportional to its fitness value—the better the fitness value relative to other genomes, the more copies that survive to the next generation. Table 13.2 shows the ratio of the fitness of the four genomes to the population fitness. This ratio determines the number of copies of each genome expected in the next generation.
The expected number of copies is a fraction, but the number of genomes in the population is never fractional. Survival is based on choosing the genomes in a random way proportional to their fitness. A random number is generated between 0 and 1, and this random number is used to determine whether a copy of a genome survives or not. Using the example from Table 13.2, if the first random number is less than 0.404, then genome 10110 would be chosen; if it is between 0.404 and 0.576 (40.4% + 17.1%), the genome 00011 would be chosen, and so on. More random numbers are generated until the next generation has the right number of genomes. Using a random number generator converts the fractional probabilities to whole number approximations, and it also allows some genomes with low fitness to survive.
Applying selection to the original four genomes yields the survivors shown in Table 13.3. Notice that in general this procedure produces more copies of the fitter genomes and fewer of the less fit. One of the less fit, 00011, has not survived this round of selection, but there are two copies of 10110, the fittest. And, the average fitness of the population has increased from 122.5 to 151.0.
Table 13.2 Using Fitness for Selection
% OF TOTAL
POPULATION
EXPECTED
GENOME
FITNESS
FITNESS
COPIES
10110
198
40.4%
1.62
00011
84
17.1%
0.69
00010
58
11.8%
0.47
11001
150
30.6%
1.22
470643 c13.qxd 3/8/04 11:18 AM Page 430
430 Chapter 13
Table 13.3 The Population after Selection
GENOME
P
FITNESS
10110
22
198
11001
25
150
00010
2
58
10110
22
198
Crossover
The next operator applied to the surviving genomes is crossover. Crossover, which occurs in nature, creates two new genomes from two existing ones by gluing together pieces of each one. As shown in Figure 13.2, crossover starts with two genomes and a random position. The first part of one genome swaps places with the first part of the second. For instance, starting with the two genomes 10110 and 00010 and using a crossover position between the second and third position works as follows:
1 0 1 1 0
0 0 0 1 0
The result of crossover is (the genes from the second genome are underlined): 1 0 0 1 0
0 0 1 1 0
The resulting genomes, called children, each have a piece of their chromosomes inherited from each of their parents. Applying crossover to the population proceeds by selecting pairs of genomes and flipping a coin to determine whether they split and swap. This probability is the crossover probability, denoted by pc. If they do cross over, then a random position is chosen and the children of the original genomes replace them in the next generation. A value of 0.5 for the crossover probability (corresponding to a coin toss) generally produces good results. In the example, the two genomes 10110 and 00010 are chosen for crossover, and the position is between the second and third genes (Table 13.4). Notice that after selection and crossover, the average fitness of the population has gone from 122.5 to 183.0. This is a significant improvement after only one generation.