Genetic Algorithms for Search

6/27/2001 Mark Rich, CS540
Reading: How to Solve It: Modern Heuristics, Ch. 6-7


Population of Solutions

Our approaches so far have involved some element of local search. We keep track of one current node, and search its neighborhood for a new, hopefully better, location in the space. But why are we limiting ourselves to just one node? To increase our chances of finding the optimal solution, we could run a population of algorithms in parallel, and reap the best result from these individual runs. However, if we have multiple solutions and runs in different locations of the space, then why not use them to inform each other of better locations? This is the approach of evolutionary algorithms. We continue our approach of modeling nature, first with a temperature as in Simulated Annealing, and then with a memory as in Tabu Search, but now we attempt to model the process of evolution and natural selection.

Basic Evolutionary Algorithm

In evolutionary algorithms, the individual is irrelevant. It is the system as a whole that we focus on. Adding evolutionary pressures of selection and reproduction will produce on average better and better solutions each generation. We will implement competition and interaction between our solutions, helping them learn from each other and gravitate towards the optimal solution.

procedure evolutionary algorithm
begin
  t <- 0
  initialize Population(t)
  evaluate Population(t)
  while (not termination-condition) do
  begin
    t <- t + 1
    select Population(t) from Population(t - 1)
    alter Population(t)
    evaluate Population(t)
  end
end

The important elements are the initialization of the population, evaluation of each individual in the population, selection of those individuals to pass on to the next generation, and alteration of the selected individuals to generate new, hopefully better, solutions. We will examine each of these elements in turn. Keep in mind, however, that decisions made in one portion of the algorithm can affect the choices available elsewhere.
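The loop above can be sketched in Python with the four elements passed in as functions. This is only an illustration: the name `evolve`, the fixed generation budget used as the termination condition, and the toy bit-counting problem with its hooks are all assumptions made here, not part of the lecture.

```python
import random

def evolve(init, evaluate, select, alter, generations, rng):
    """Run the generic loop; each of the four hooks is supplied by the caller."""
    population = init(rng)
    scores = [evaluate(ind) for ind in population]
    for _ in range(generations):  # assumed termination condition: fixed budget
        population = select(population, scores, rng)
        population = alter(population, rng)
        scores = [evaluate(ind) for ind in population]
    best = max(range(len(population)), key=lambda i: scores[i])
    return population[best], scores[best]

# Hypothetical toy problem: maximize the number of 1s in a 10-bit vector.
def init(rng):
    return [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]

def evaluate(bits):
    return sum(bits)

def select(population, scores, rng):
    # Fill each slot of the next generation with the winner of a random pair.
    picks = []
    for _ in population:
        i = rng.randrange(len(population))
        j = rng.randrange(len(population))
        winner = population[i] if scores[i] >= scores[j] else population[j]
        picks.append(list(winner))
    return picks

def alter(population, rng):
    # Flip each bit independently with probability 0.05.
    return [[1 - b if rng.random() < 0.05 else b for b in ind]
            for ind in population]

best, score = evolve(init, evaluate, select, alter,
                     generations=30, rng=random.Random(0))
```

Keeping the hooks separate makes it easy to swap in a different selection or alteration scheme without touching the loop itself.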

Initialization

First, we must determine how large to make the population. Again, it is a matter of choice that should be tuned to the specific problem at hand. We wish to initialize the population with diverse individuals. Why? Because they will be learning from each other. One of the issues evolutionary algorithms have to deal with is premature convergence to sub-optimal solutions due to a lack of diversity in the population. There are many ways to arrange this initial diversity.
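One simple way to get a diverse starting population, sketched here under the assumption that individuals are bit vectors, is to draw them uniformly at random and reject duplicates. The function name and parameters are illustrative only; the lecture does not prescribe a particular scheme.

```python
import random

def random_population(size, length, rng):
    """Draw `size` distinct bit vectors of the given length uniformly at random."""
    if size > 2 ** length:
        raise ValueError("not enough distinct bit vectors of this length")
    population, seen = [], set()
    while len(population) < size:
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        if candidate not in seen:  # reject duplicates to keep diversity
            seen.add(candidate)
            population.append(list(candidate))
    return population
```

Rejecting duplicates only guarantees that no two individuals are identical; it does not guarantee that the population is spread evenly over the search space.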

Evaluation Functions

Selecting which individuals can reproduce is based on the evaluation and comparison of solutions. Therefore we must take care when designing an evaluation function, such that it discriminates between better and worse solutions.

For example, let's say our task is to find the string of text "HAPPY" and our possible solutions are any arrangement of 5 alphabetic characters. Taking the alphabet to be the 26 uppercase letters, the size of this search space is 26^5, or about 11.9 million strings. How can we distinguish between good and bad strings? One possible, albeit very poor, function assigns the word "HAPPY" a score of 1, and all other strings a score of 0. You can see how hard it would be to search using this evaluation function: it gives no hint of which strings are close to the goal.

To develop a good evaluation function, one thing we can do is look at our operators. Suppose we have an operator that lets us replace a particular character with another one, similar to 1-flip in SAT. A useful evaluation function would be based on how many letters of "HAPPY" are in the correct position. For instance, "HXOPN" receives a score of 2, while "OOGIE" receives a score of 0.

But under this evaluation function, the string "APPYH" would receive a score of only 1 (for the P in the third position), even though all the letters of the goal string are present. With only our 1-flip operator, this is reasonable, but suppose we add another operator that allows us to rotate the string either left or right. Now we would like "APPYH" to receive a much better score. A more complex evaluation function we could derive would be 1 point for each letter of the string that is a part of the goal string, and an additional point for each letter in the correct position. Then "HXOPN" would receive a score of 4 (two goal letters, both in the correct position), while "APPYH" is now evaluated to be 6, and "HAPXY" would have a score of 8.
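The three evaluation functions discussed in this section can be written out as a short Python sketch. The function names are made up for illustration, and the goal string is fixed at "HAPPY" as in the example.

```python
GOAL = "HAPPY"

def score_exact(s, goal=GOAL):
    """1 for the goal string itself, 0 for everything else."""
    return 1 if s == goal else 0

def score_position(s, goal=GOAL):
    """Number of characters that match the goal at the same position."""
    return sum(a == b for a, b in zip(s, goal))

def score_letters_and_position(s, goal=GOAL):
    """1 point per character that occurs anywhere in the goal,
    plus 1 more point per character in the correct position."""
    present = sum(ch in goal for ch in s)
    return present + score_position(s, goal)

print(score_position("HXOPN"))              # 2
print(score_letters_and_position("APPYH"))  # 6
print(score_letters_and_position("HAPXY"))  # 8
```

Note that `score_exact` returns the same value for almost every string, which is exactly why it gives a search nothing to climb.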

Selection

Selection of individuals for the next generation, either to reproduce or to live on, relies heavily on the evaluation function above. How heavily is dependent on which selection technique you use. We wish to apply some pressure so that good solutions survive, and weak solutions die; too much, and we converge to less than optimal solutions, too little and we never make progress towards the solution. Again, it is a balancing act to find the right selection technique for the problem at hand.

Deterministic

In deterministic selection, only the best survive. This leads to very fast convergence. Two deterministic selection techniques are common: one that includes parents in determining the best solutions, and one that replaces all parents with children.

We can represent the size of the population as "mu", and the number of children generated as "lambda". (mu + lambda) selection chooses the best "mu" to continue to the next generation, and the competition is between both parents and children. (mu, lambda) selection again chooses the best "mu", however it is only the children that factor into who's the best. Parents are thrown away to fight early convergence. Deterministic selection relies very heavily on the evaluation function, and converges the fastest of all methods we will discuss.
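Both deterministic schemes reduce to sorting by score and keeping the top "mu". The sketch below assumes higher scores are better and that `evaluate` is whatever evaluation function the problem uses; the function names are illustrative.

```python
def mu_plus_lambda(parents, children, mu, evaluate):
    """Keep the best mu individuals out of parents and children combined."""
    pool = list(parents) + list(children)
    return sorted(pool, key=evaluate, reverse=True)[:mu]

def mu_comma_lambda(parents, children, mu, evaluate):
    """Keep the best mu children; the parents are discarded regardless of score."""
    if len(children) < mu:
        raise ValueError("need at least mu children for (mu, lambda) selection")
    return sorted(children, key=evaluate, reverse=True)[:mu]

# Toy usage where an individual is its own score.
parents, children = [9, 3], [5, 4, 1]
print(mu_plus_lambda(parents, children, 2, lambda x: x))   # [9, 5]
print(mu_comma_lambda(parents, children, 2, lambda x: x))  # [5, 4]
```

In the toy usage the strong parent 9 survives under (mu + lambda) but is thrown away under (mu, lambda), which is the trade-off between fast convergence and diversity described above.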

Proportional Fitness

Instead of taking the best "mu", each individual can be selected proportionally to their evaluation score. Suppose we have the following population:

Individual   Score
A            4
B            10
C            14
D            7
E            9
F            6

The sum of their scores is 50. This gives individual A an 8% chance of being selected, individual B 20%, etc. We usually implement this with what is called "roulette wheel selection". Select a random number between 0 and 1. Then progressively add up the probabilities of each individual in order, stopping at the first individual for which this running sum is greater than the random number. For example, suppose I randomly choose .77. This selects individual E, since the running sum first exceeds 77% at E: 8 + 20 + 28 + 14 + 18 is 88%. With a random choice of .34, we select C, since 8 + 20 = 28% is still too small but 8 + 20 + 28 = 56%. To select the next generation, we would need to choose "mu" random numbers.

We see that in proportional fitness, even the worst individual, A, has a chance to reproduce, albeit only 8%. This will help prevent stagnation in the population.
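A minimal sketch of roulette wheel selection on the example population follows. The function names are illustrative. To avoid accumulating floating-point error, the sketch scales the random number by the total score and compares it against running score sums, which is equivalent to comparing against cumulative probabilities.

```python
import random

SCORES = {"A": 4, "B": 10, "C": 14, "D": 7, "E": 9, "F": 6}

def roulette_pick(scores, r):
    """Return the first individual whose cumulative share exceeds r (0 <= r < 1)."""
    total = sum(scores.values())
    target = r * total
    running = 0
    for name, score in scores.items():
        running += score
        if target < running:
            return name
    return name  # only reached if r is not below 1

def select_generation(scores, mu, rng):
    """Spin the wheel mu times, once per slot in the next generation."""
    return [roulette_pick(scores, rng.random()) for _ in range(mu)]

print(roulette_pick(SCORES, 0.77))  # E
print(roulette_pick(SCORES, 0.34))  # C
```

Because every spin is independent, the same individual can be picked several times, and a low scorer such as A is still picked occasionally.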

Tournament Selection

In this scheme, two individuals are selected at random with replacement from the population, and the one with the best score gets selected to reproduce. Using the above example, one round of tournament selection could choose B and D for competition. B would then be selected since its score of 10 is larger than D's score of 7. Repeat this "mu" times to get the next population.

So how is this different from proportional fitness? Now, A has a 1/36 chance of reproducing (the chance of choosing A for both sides of the competition), about 2.8%, while C, the most fit individual, has an 11/36 chance, or 30.6%. Tournament selection does not care about the spread of the scores, only the ranking. Assuming no ties, the nth ranked individual in a population of size mu wins whenever both draws are ranked n or worse and at least one of them is n itself, so it has a ((mu - n + 1)^2 - (mu - n)^2) / mu^2 = (2mu - 2n + 1) / mu^2 chance of reproducing. This puts an upper bound of (2mu - 1) / mu^2 and a lower bound of 1 / mu^2 on the chances of any individual to reproduce for the next generation. Tournament selection can be generalized to include more than 2 individuals being chosen for competition, and selecting the best from this group.
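The probabilities quoted above can be checked by enumerating all 36 ordered pairs for the example population. The sketch below does that and compares the result with the rank formula; it assumes all scores are distinct, and the function names are illustrative.

```python
import random
from fractions import Fraction
from itertools import product

SCORES = {"A": 4, "B": 10, "C": 14, "D": 7, "E": 9, "F": 6}

def tournament_pick(scores, rng, k=2):
    """Draw k contestants with replacement and return the highest scorer."""
    names = list(scores)
    contestants = [rng.choice(names) for _ in range(k)]
    return max(contestants, key=scores.get)

def rank_probability(n, mu):
    """Chance that the nth ranked of mu individuals wins a binary tournament."""
    return Fraction(2 * mu - 2 * n + 1, mu * mu)

def enumerated_probabilities(scores):
    """Count the winner of every ordered pair (binary tournament only)."""
    names = list(scores)
    wins = dict.fromkeys(names, 0)
    for a, b in product(names, repeat=2):
        wins[max(a, b, key=scores.get)] += 1
    return {name: Fraction(w, len(names) ** 2) for name, w in wins.items()}

probs = enumerated_probabilities(SCORES)
print(probs["A"], probs["C"])  # 1/36 11/36
```

Raising `k` in `tournament_pick` increases the selection pressure, since the best individual in a larger group is more likely to be one of the top-ranked individuals overall.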

Alteration

Mutation

The first and most basic way to alter a solution for the next generation is to use mutation. We can use the operators from our local search techniques to slightly twiddle with the solution and introduce new, random information. The two-swap and two-interchange methods are good mutation methods for the TSP.
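Two simple mutation operators are sketched below: one exchanges two positions of a tour, and one flips bits of a vector, in the spirit of 1-flip for SAT. The names `swap_mutation` and `flip_mutation` and the `rate` parameter are assumptions made for this sketch, and the swap shown here may not match the exact definition of two-swap used elsewhere in the course.

```python
import random

def swap_mutation(tour, rng):
    """Return a copy of the tour with two randomly chosen positions exchanged."""
    i, j = rng.sample(range(len(tour)), 2)  # two distinct indices
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child

def flip_mutation(bits, rng, rate):
    """Return a copy of the bit vector with each bit flipped with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(3)
tour = [0, 1, 2, 3, 4]
mutated = swap_mutation(tour, rng)  # still visits every city exactly once
```

Both operators return a new list rather than modifying the parent, so the parent can still compete in schemes such as (mu + lambda) selection.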

Crossover for vectors

But the interesting behavior of genetic algorithms arises from the ability of solutions to learn from each other. Solutions can combine to form offspring for the next generation. Sometimes they will pass on their worst information, but if we do crossover in combination with a forceful selection technique, then we should see better solutions result. Since there are many details to crossover with permutations, as in the TSP, today we will cover only the basic crossover techniques, known as "cut and splice" techniques, for vectors such as SAT assignments. These work well for vectors, but not for permutations: applied to, say, a path representation of a tour, they will almost always create illegal solutions. More thought and time need to be devoted to gain intuition with different representations.
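As a rough illustration, the sketch below cuts two parents at the same point and swaps their tails. Using a single cut point shared by both parents is an assumption about what the basic technique looks like, and the function names are illustrative. The same operator is then applied to two tours to show how it produces illegal permutations.

```python
import random

def one_point_crossover(a, b, cut):
    """Swap the tails of two equal-length parents after position `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def random_cut(length, rng):
    """Pick a cut point that leaves at least one gene on each side."""
    return rng.randrange(1, length)

def is_permutation(tour, n):
    """True if the tour visits each of the cities 0..n-1 exactly once."""
    return sorted(tour) == list(range(n))

# On bit vectors every child is a legal assignment.
print(one_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], 1))
# ([1, 0, 0, 0], [0, 1, 1, 1])

# On tours the children repeat some cities and skip others.
child1, child2 = one_point_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0], 2)
print(child1, is_permutation(child1, 5))  # [0, 1, 2, 1, 0] False
```

The first child visits cities 0 and 1 twice and never reaches 3 or 4, which is why permutation representations need crossover operators designed specifically for them.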