procedure evolutionary algorithm
begin
    t <- 0
    initialize Population(t)
    evaluate Population(t)
    while (not termination-condition) do
    begin
        t <- t + 1
        select Population(t) from Population(t - 1)
        alter Population(t)
        evaluate Population(t)
    end
end
The important elements are the initialization of the population, evaluation of each individual in the population, selection of the individuals that pass on to the next generation, and alteration of the selected individuals to generate new, hopefully better, solutions. We will examine each of these elements in turn; however, decisions made in one part of the algorithm can affect the choices available elsewhere.
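The loop above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `evolve`, its four callable parameters, the fixed generation budget used as the termination condition, and the toy bit-counting problem are all assumptions made for illustration.

```python
import random

def evolve(initialize, evaluate, select, alter, generations=50, seed=0):
    """Generic loop mirroring the pseudocode; the four callables are supplied by the caller."""
    rng = random.Random(seed)
    population = initialize(rng)
    scores = [evaluate(ind) for ind in population]
    for _ in range(generations):  # assumed termination condition: a fixed budget
        population = select(population, scores, rng)
        population = alter(population, rng)
        scores = [evaluate(ind) for ind in population]
    best = max(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]

# Toy problem (hypothetical): maximize the number of 1 bits in a 20-bit list.
def init(rng):
    return [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]

def keep_best_half_twice(pop, scores, rng):
    # Rank by score (higher is better) and copy the top half twice.
    ranked = [ind for _, ind in sorted(zip(scores, pop), key=lambda p: p[0], reverse=True)]
    half = ranked[: len(pop) // 2]
    return [list(ind) for ind in half + half]

def flip_one_bit(pop, rng):
    # Alter only the second set of copies so the first set survives unchanged.
    for ind in pop[len(pop) // 2:]:
        i = rng.randrange(len(ind))
        ind[i] ^= 1
    return pop

best, score = evolve(init, sum, keep_best_half_twice, flip_one_bit)
print(best, score)
```

Passing the four steps in as functions keeps the loop itself problem independent, which matches the way the elements are discussed separately below.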
This is the initialization technique we have been using all along with SA and Tabu. We create tours randomly from the search space with a uniform distribution. But this is not the only way.
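As a small illustration of uniform random initialization for TSP, the sketch below builds each tour by shuffling the list of city indices. The names `random_tour` and `random_population` and the representation of a tour as a list of integers are assumptions for this example.

```python
import random

def random_tour(n_cities, rng):
    """Return a random permutation of the cities 0 .. n_cities - 1."""
    tour = list(range(n_cities))
    rng.shuffle(tour)
    return tour

def random_population(pop_size, n_cities, seed=None):
    """Build pop_size independent random tours."""
    rng = random.Random(seed)
    return [random_tour(n_cities, rng) for _ in range(pop_size)]

population = random_population(pop_size=10, n_cities=8, seed=1)
print(population[0])
```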
Here, we seed the population with selections from regular intervals in the search space. The size of the intervals, and exactly what makes an interval, is generally problem dependent.
Another restriction we can place on our population to ensure diversity is a non-clustering rule. Each newly generated individual must be a predefined distance away from all previously added individuals. For TSP, we could say that all new individuals must be at least 2 applications of 2-swap away from each other.
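The non-clustering rule can be sketched as rejection sampling: keep drawing candidates and accept one only if it is far enough from everything accepted so far. Computing the number of 2-swap moves between two tours is more involved, so this sketch uses Hamming distance on bit strings as a stand-in; the function names, the `max_tries` guard, and the bit-string representation are all assumptions.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def spaced_population(pop_size, new_individual, distance, min_distance, rng, max_tries=10_000):
    """Accept a candidate only if it is at least min_distance from all accepted ones."""
    population = []
    tries = 0
    while len(population) < pop_size:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough well-separated individuals")
        candidate = new_individual(rng)
        if all(distance(candidate, other) >= min_distance for other in population):
            population.append(candidate)
    return population

rng = random.Random(0)
pop = spaced_population(
    pop_size=8,
    new_individual=lambda r: tuple(r.randint(0, 1) for _ in range(16)),
    distance=hamming,
    min_distance=4,
    rng=rng,
)
print(len(pop))
```

The guard on the number of tries matters in practice: if the minimum distance is too large for the search space, the loop would otherwise never finish.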
A final method for initialization is to use the solutions found by other search techniques, such as hill-climbing or SA. While this does not encourage diversity, we can guarantee that our genetic algorithm will do at least as well as the initial seed algorithm, as long as the best individual found so far is never discarded, and this can help reassure some skeptics.
For example, let's say our task is to find the string of text "HAPPY" and our possible solutions are any arrangement of 5 alphabetic characters. The size of this search space is 26^5, or 11,881,376 strings, if we restrict ourselves to uppercase letters. How can we distinguish between good and bad strings? One possible, albeit very poor, function assigns the word "HAPPY" a score of 1, and all other strings a score of 0. You can see how hard it would be to search using this evaluation function.
To develop a good evaluation function, one thing we can do is look at our operators. Suppose we have an operator that lets us replace a particular character with another one, similar to 1-flip in SAT. A useful evaluation function would be based on how many letters of "HAPPY" are in the correct position. For instance, "HXOPN" receives a score of 2, while "OOGIE" receives a score of 0.
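The position-based evaluation function is short enough to write out directly. The name `position_score` is an assumption; the scores checked are the ones given in the text.

```python
def position_score(candidate, goal="HAPPY"):
    """One point for each character that matches the goal at the same position."""
    return sum(c == g for c, g in zip(candidate, goal))

print(position_score("HXOPN"))  # H and the second P line up with the goal
print(position_score("OOGIE"))  # no position matches
```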
But under this evaluation function, the string "APPYH" would receive a score of only 1, even though all the letters of the goal string are present. With only our 1-flip operator, this is reasonable, but suppose we add another operator that allows us to rotate the string either left or right. Now we would like "APPYH" to receive a much better score. A more complex evaluation function we could derive would be 1 point for each letter of the string that is a part of the goal string, and an additional point for each letter in the correct position. Then "HXOPN" would receive a score of 4 (two letters present, both in the correct position), while "APPYH" is now evaluated to be 6, and "HAPXY" would have a score of 8.
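A literal reading of the two-part scoring rule can be checked in a few lines. The name `combined_score` is an assumption, and so is the choice to count every candidate letter that appears anywhere in the goal without tracking how many times it appears. Under this reading "HXOPN" scores 4, since H and P are each present and each in place.

```python
def combined_score(candidate, goal="HAPPY"):
    """One point per letter found in the goal, plus one per letter in the right place."""
    present = sum(c in goal for c in candidate)
    in_place = sum(c == g for c, g in zip(candidate, goal))
    return present + in_place

for word in ("HXOPN", "APPYH", "HAPXY"):
    print(word, combined_score(word))
```

One consequence of ignoring multiplicity is that a string such as "PPPPP" scores fairly well; a stricter function could cap the count of each letter at its count in the goal.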
We can represent the size of the population as "mu", and the number of children generated as "lambda". (mu + lambda) selection chooses the best "mu" to continue to the next generation, and the competition is between both parents and children. (mu, lambda) selection again chooses the best "mu"; however, only the children factor into who's the best, so "lambda" must be at least "mu". Parents are thrown away to fight early convergence. Deterministic selection relies very heavily on the evaluation function, and converges the fastest of all methods we will discuss.
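The two deterministic schemes differ only in which pool is ranked. In this sketch the function names are assumptions, individuals are plain numbers, and a higher score is assumed to be better.

```python
def mu_plus_lambda(parents, children, evaluate, mu):
    """Rank parents and children together and keep the best mu."""
    return sorted(parents + children, key=evaluate, reverse=True)[:mu]

def mu_comma_lambda(parents, children, evaluate, mu):
    """Rank only the children and keep the best mu; parents are discarded."""
    if len(children) < mu:
        raise ValueError("(mu, lambda) selection needs at least mu children")
    return sorted(children, key=evaluate, reverse=True)[:mu]

parents = [9, 7, 5]
children = [8, 6, 4, 2]
identity = lambda x: x  # toy evaluation: the individual is its own score
print(mu_plus_lambda(parents, children, identity, mu=3))
print(mu_comma_lambda(parents, children, identity, mu=3))
```

In the toy run the best parent (9) survives under (mu + lambda) but is lost under (mu, lambda), which is the trade-off between keeping good solutions and resisting early convergence.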
| Individual | Score |
| --- | --- |
| A | 4 |
| B | 10 |
| C | 14 |
| D | 7 |
| E | 9 |
| F | 6 |
The sum of their scores is 50. This gives individual A an 8% chance of being selected, individual B 20%, etc. We usually implement this with what is called "roulette wheel selection". Select a random number between 0 and 1. Then progressively add on the probabilities of each individual in order, until this sum is greater than the random number. For example, suppose we randomly choose 0.77. This selects individual E, since the running total is only 70% after D and 8 + 20 + 28 + 14 + 18 = 88%. With a random choice of 0.34, we select C, since the running total is 28% after B and 8 + 20 + 28 = 56%. To select the next generation, we would need to choose "mu" random numbers.
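The roulette wheel procedure can be sketched as a walk over running totals. The function name `roulette_select` and the optional `r` argument (which lets the random draw be fixed for the worked example) are assumptions; the scores are the ones in the table. The sketch compares against `r * total` using the raw scores, which is equivalent to summing probabilities and avoids some rounding.

```python
import random
from itertools import accumulate

SCORES = {"A": 4, "B": 10, "C": 14, "D": 7, "E": 9, "F": 6}

def roulette_select(scores, r=None, rng=random):
    """Return the first individual whose running score total exceeds r * total."""
    names = list(scores)
    cumulative = list(accumulate(scores[n] for n in names))
    total = cumulative[-1]
    if r is None:
        r = rng.random()  # uniform in [0, 1)
    threshold = r * total
    for name, running in zip(names, cumulative):
        if running > threshold:
            return name
    return names[-1]  # only reached if r is passed as 1.0 or more

print(roulette_select(SCORES, 0.77))
print(roulette_select(SCORES, 0.34))
```

Selecting a whole generation amounts to calling the function "mu" times with fresh random draws.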
We see that in proportional fitness, even the worst individual, A, has a chance to reproduce, albeit only 8%. This will help prevent stagnation in the population.
So how is this different from proportional fitness? Now, A has a 1/36 chance of reproducing (the chance of choosing A for both sides of the competition), about 2.8%, while C, the most fit individual, has an 11/36 chance, or 30.6%. Tournament selection does not care about the spread of the scores, only the ranking. The nth ranked individual in a population of size mu will have a (2 mu - 2n + 1) / mu^2 chance of reproducing. This puts an upper and lower bound on the chances of any individual to reproduce for the next generation. Tournament selection can be generalized to include more than 2 individuals being chosen for competition, and selecting the best from this group.
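The rank formula can be checked by enumerating every possible two-way tournament. This sketch assumes both contestants are drawn uniformly with replacement (which is what the 1/36 figure for A implies) and that rank 1 is the best individual; the function names are assumptions.

```python
import random
from fractions import Fraction
from itertools import product

def tournament_win_probability(rank, mu):
    """Closed form from the text; rank 1 is the best individual."""
    return Fraction(2 * mu - 2 * rank + 1, mu * mu)

def enumerate_tournaments(mu):
    """Count, over all mu * mu ordered pairs, how often each rank wins."""
    wins = {rank: 0 for rank in range(1, mu + 1)}
    for a, b in product(range(1, mu + 1), repeat=2):
        wins[min(a, b)] += 1  # the better (lower) rank wins the pairing
    return {rank: Fraction(count, mu * mu) for rank, count in wins.items()}

def tournament_select(scores, rng, size=2):
    """Draw `size` contestants with replacement and return the best one."""
    names = list(scores)
    contestants = [rng.choice(names) for _ in range(size)]
    return max(contestants, key=scores.get)

print(enumerate_tournaments(6))
print(tournament_select({"A": 4, "B": 10, "C": 14, "D": 7, "E": 9, "F": 6}, random.Random(0)))
```

Raising `size` above 2 gives the generalized tournament mentioned in the text and increases the pressure toward the top-ranked individuals.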
We select two individuals to be parents for the next generation, and choose some point along the vector, between 0 and the length of the vector. This will be our crossover point between the two parents. We swap information after the crossover point to make our two new children. For example, we have
1101101101 and 0001001000
as our two parents, and choose the crossover point to be after the 5th digit. Our two new children will be
11011 + 01000 and 00010 + 01101.
We can see that large chunks of each parent will survive to the next generation.
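One-point crossover on strings comes down to two slices. The function name `one_point_crossover` is an assumption; the parents and the crossover point are the ones from the example above.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap everything after `point` between the two parents."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

children = one_point_crossover("1101101101", "0001001000", point=5)
print(children)
```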
110 + 100 + 11 + 00 and 000 + 110 + 10 + 01
for the next generation.
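Crossover with several cut points can be sketched by alternating which parent supplies each segment. This assumes the same two parents as in the one-point example and cut points after the 3rd, 6th and 8th digits, which is what the segment lengths 3, 3, 2, 2 suggest; the function name `multi_point_crossover` is also an assumption.

```python
def multi_point_crossover(parent_a, parent_b, points):
    """Alternate the source parent at each cut point; `points` must be sorted."""
    child_a, child_b = [], []
    source_a, source_b = parent_a, parent_b
    start = 0
    for end in list(points) + [len(parent_a)]:
        child_a.append(source_a[start:end])
        child_b.append(source_b[start:end])
        source_a, source_b = source_b, source_a  # switch parents for the next segment
        start = end
    return "".join(child_a), "".join(child_b)

children = multi_point_crossover("1101101101", "0001001000", points=[3, 6, 8])
print(children)
```

With a single cut point the function reduces to ordinary one-point crossover.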
Each new variable for the offspring is chosen randomly from one of the two parent vectors, position by position. This works best when the variables are independent and therefore no relationship needs to survive to the next generation, only the values of the variables.
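A minimal sketch of this per-variable choice follows. It assumes each position is taken from either parent with equal probability and that a single child is produced per call; the function name `uniform_crossover` is an assumption.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """For each position, copy the value from one parent chosen at random."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

rng = random.Random(0)
parent_a = [1, 2, 3, 4, 5, 6]
parent_b = [10, 20, 30, 40, 50, 60]
child = uniform_crossover(parent_a, parent_b, rng)
print(child)
```

Because each position is decided independently, no contiguous block of a parent is favored, which is why the method suits problems where the variables do not interact.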