A Genetic Algorithm For Cache Scheduling:
Can Machines Design Machines?



Submitted to OS-DIE 1
by
Michael Molla
May 12, 2000




Part 1

Abstract

The goal of this project is to find a genetic algorithm that will successfully evolve a useful cache scheduling algorithm. In doing so I hope to show that a genetic algorithm can do a good job of designing a part of an operating system or, more generally, that a machine can design part of another machine or even modify its own design.

In this paper, I will describe three experiments with genetic algorithms that attempt to do just that. I will also describe the results and show that these results suggest a possible viable scheme for computer evolution. This paper represents one step toward the ultimate goal of hands-off computer design.


Part 2

Introduction and Problem Description

The goal of this project is to get a Genetic Algorithm (GA) to evolve a reasonable cache scheduling algorithm. In order to be successful in this task, one first needs to understand both genetic algorithms and the problem of cache scheduling.

A GA consists of three things:

  1. a language with which to describe any policy in terms of a finite genome
  2. a way to evaluate the performance of these policies
  3. a way to modify the genome through mutation and recombination to allow evolution towards the goal from generation to generation

The language with which to describe the policies is usually the most complicated part of designing a GA. It is virtually problem specific and is almost always the difference between a working algorithm and no algorithm at all. Part 3 of this paper is devoted to the different languages I tried.

The way to evaluate performance is rather straight forward in this problem domain. A generation consists of one thousand cache requests. These requests come from a cache simulation that generates requests in loops, clustered accesses, one-pass sequential accesses and completely random accesses from a simulated 1000 page memory. These requests are randomly interleaved on a per-request basis. Each member of the population holds its own 100 frame cache and the performance of a member of the population is completely decided by its cache miss rate.

Genome modification is also rather straight forward. In each approach (as will be described in part 3) the genome is represented by one or more lists of numbers. Recombination, or mating, between two members of the population is achieved by randomly picking integers from the two parents to produce offspring. Mutation relies on two terms that I arrived at through trial and error. These are the mutation penetrance (25%) and the mutation rate (2%). The mutation penetrance indicates how likely an element in a genome is to change and the mutation rate indicates the possible magnitude of this change. The reason for the penetrance rate is that, without it, a low mutation rate caused a great deal of inbreeding and premature convergence and a high mutation rate caused no convergence at all.

In general, a cache scheduling policy is a way to decide, in the case that a process requests a page which is not presently in the cache, which cache frame should have its contents replaced. The optimal policy would be something like "longest time until next use gets replaced". In other words, the cache frame holding the page whose next use is farthest off should be replaced. Unfortunately, in the absence of knowledge of the future, this is impossible. So, decisions about cache replacement are based on the following two things:

  1. a memory of the behavior of the machine with respect to cache requests and
  2. a policy mapping memory states to cache replacements.
This policy is what the genetic algorithm will by trying to evolve.


Part 3

Methodologies

In approaching the problem, I made a number of simplifying assumptions that I know are, at best, approximations of the real world. Most of these are incorporated in the design of the process simulator that generates the page requests. For example, the fact that the simulator is running exactly one sequential, loop and clustered access at any given time is clearly not realistic. The underlying assumption is that this is not significantly less difficult than real world work. So, if it can learn the right thing to do in this system, it can learn the right thing to do in the real world.

The cache simulator assumes a, though configurable, static number of random, sequential, loop and clustered accesses randomly interspersed. The cache and data sizes are also configurable but all of the results presented are from a cache size of 100 and a total page space of 1000. I chose a ratio of 1 to 10 cache to memory for two reasons. It causes a large number of cache misses in even the best policy while keeping the math easy. I specifically chose 100 and 1000 because this is about as large as it could get and still be computationally tractable. The size of the history that I used to remember previous cache requests is 155 pages. It seems very arbitrary, but I got the number by observing LRU for a long time and observing the approximate greatest age of any page during that time. I wanted the algorithm to have at least enough information to make it possible to learn LRU.


Part 3.1

Approach #1

This was my first attempt at a genetic algorithm. The genome is array of 155 integers. Each one represents a position in the cache request history. Each cache frame finds all occurrences of the page he is currently holding in the request history. Then, if he finds any occurrences, he uses the time since each of these occurrences as indices into the genome and sums the entries in the genome that reside at these indices. This sum is the frame's "importance". The fame with the lowest importance has his contents replaced.

For example, imagine that page 44 is currently in the 18th cache frame when cache request comes in that cannot be filled by pages currently in the cache. Frame 18 goes to the history and finds that page 44 was requested 20 references ago and 32 references ago. So, frame 18 sums the 20th and 32nd elements from the genome. Say these are 162 and 212. So frame 18 has an importance of 162 + 212 = 374. Every frame in the cache does the same thing for their individual pages. The frame with the lowest importance has his contents replaced.

This method had good results which approached the performance of LRU and will be discussed in part 4 of this paper. One weakness I saw in this method was that the semantics were not powerful enough to encode anything but the process behavior with respect to the specific page in a given frame. It had no way to express a relation ship like "close to in memory".


Part 3.2

Approach #2

In order to solve the semantics problem that I saw with approach #1, I created approach #2. It is exactly the same as approach #1 with one addition. It evolves another array with length = total size (in pages) of the memory that the cache is caching for and containing numbers between 0 and 1.

Values from this array are multiplied by values in the first array and added to the frame totals based on the distance, in memory, between the page in the cache frame and the pages that have been requested according to the history.

If were were using Approach #2, in the previous example, we would have done exactly the same thing until the point where we did the sum to find the importance of the frame. Say we found that, in addition to the two reference that we found for page 44, page 41 was requested 12 references ago. Frame 18 would then get the 12th value from the first array, say 119, and multiply it by the 3rd element in the new array, say .627. This represents the multiplier for a page a distance of 3 pages away from the frame's current page in memory. This product would be added to the sum from the last example. So the total importance for the frame according to this approach is 374 + (119 * .627) = 449.

This approach also gave good results, but not significantly better than those of approach #1. The problem here seemed to be that the number of degrees of freedom was two high. The first array was still doing most of the work and the more successful policies' second arrays generally stayed out of the way. What I decided was that I needed a proximity factor that was more directed, not just a randomly evolved arbitrarily shaped distribution.


Part 3.3

Approach #3

The idea behind this approach is that, instead of trying to decide the importance of a previous cache request with respect to a nearby (distance in memory) current cache request based on an evolving set of essentially random values, maybe we can make a simplifying assumption about the importance of proximal cache requests. Specifically, this approach makes the assumption that the influence of a previous cache request on the occurrence of future proximal cache requests is normally distributed. So, very nearby requests are given a large weight and more distal requests are given smaller weights depending on how far away they are.

In this scheme, the scale of this normal curve and the influence of temporally distal requests are the values that evolve. The mathematical description of this procedure follows.

The formula for the influence of a single previous cache request is just the formula for the normal curve as follows:

f(w) = (e^(-(((w-mu)^2)/(2*(sigma^2)))))/ (sigma*((2*pi)^(1/2)))

where:

If there is a memory of "n" cache requests, representing memory locations c[1] through c[n] (note that some of these can be identical), then the "value" of the contents of the cache frame containing the page that came from the page m in memory is represented by g(n) where:

g(0) = 0 and
g(z) = g(z-1) * m[z-1] + f(c[z])

where m[] is an array of n numbers between 0 and 1 representing the "importance" of a cache request based on how long ago it happened. You can also think of this at the "rate of decay". This also evolves.

So, we simply run through the entire cache and whichever entry scores lowest by this scale gets replaced. In other words, this function represents an "importance landscape" with respect to the cache frames. The peaks are the most important frames according to this policy and the valleys are the least important. The cache frame at the bottom of the deepest valley is the one that will be replaced.

This process of building an importance landscape with respect to the cache frames is illustrated by the following example.

For simplicity, say we record only the previous 10 cache requests and use that information to make our determination. So n=10. We will also imagine that we have a memory of size = 100.

Imagine that n cache requests ago, there was a request for page number 44. So, from the last equation we know that

g(1) = g(0) * m[0] + f(c[1])

and, since q(0) = 0 and c[1] = 44, we get q(1) = f(44) or a normal curve around 44.


g(1)

x = page# in memory, y = importance.

Now say that the evolved value in the policy's m[1] is .51 and the next cache request after the one for page 44 was for page 17. So

g(2) = g(1) * m[1] + f(17)
= (the preceeding normal curve) * .51 + (another normal curve centered at 17)



g(2)

x = page# in memory, y = importance.

If we continue this process until we reach the most recent cache request, we produce the following landscape of importances of pages based on their location in memory:


g(n)

x = page# in memory, y = importance.

Each cache frame performs this function on the memory location of the page he is holding to decide how valuable this page is. The lowest scoring cache frame looses (is replaced).


part 4

results

I compared all three approaches to MRU, LRU and a policy of random replacement. I ran them on data whose loops, clusters and sequential accesses were significantly shorter than the cache size (up to 30 pages vs. 100 pages, respectively) and significantly longer (up to 150 pages vs. 100 pages, respectively). I also ran them on data comprised entirely of the four types of accesses described earlier as well as data that randomly interleaved the types.

In most cases, all three approaches tended to approach LRU and significantly outperform MRU and the random replacement policy (to see graphs of all of the runs, see the appendix (unless that counts toward the 12 page limit in which case please pretend that the appendix does not exits)).


25% random, 25% clustered, 25% sequential, 25% loops with clusters, sequences and loops up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3


x = generation# from 1 to 30, y = %cache misses (30% DOWN to 100%)


Not surprisingly, exceptions included the runs comprised entirely of random accesses and the runs comprised entirely of sequential accesses. All of the algorithms performed badly in these cases because they had no repetition from which to work.


100% one-pass sequential with sequences up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3


x = generation# from 1 to 24, y = %cache misses (30% DOWN to 100%)


The one very odd case was that of 100% large loops.


100% loops with loops up to 150 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3


x = generation# from 1 to 32, y = %cache misses (0% DOWN to 100%)


The reason that this one came out so strangely, I realized, is that it is possible for a single set of loops to overwhelm multiple generations. When the loop is in progress, it makes life easy for certain policies, but when it ends, it amounts to catastrophic change.

I also did some performance testing of the actual software that I wrote to perform these genetic algorithms. The goal of this testing was to help guide the decision of whether to deploy just the finished algorithm; deploy a constantly evolving algorithm; or something in between like an algorithm that would evolve in the background on stored-up data rather than real time.

I measured the real time taken by each algorithm running with the process simulator for a fixed number of cache requests, subtracted the time it takes for the cache simulator to generate that number of requests and divided this difference by the number of requests to get a per request time for each algorithm.

Not surprisingly, approach #3 took the longest at .0090 seconds per cache request. The fact that approach #2 took the second longest was not a surprise, but how close it was to the time taken by approach #3 was. At .0088 seconds per cache request, the performance of the two is not significantly different. It seems obvious that neither of these is a viable solution as they are presently written. Approach #1 does take a slightly more reasonable amount of time. At .0017 seconds per cache request however, it too is many optimizations from actually being affordable.

Obviously, the feasibility of each of the deployment depends on the speed of memory I/O versus the speed of the processor. On some machines the processor cost of any of these approaches is insignificant compared to the subsequent load of the page from memory. On others which are already CPU bound, any added CPU cost would be deadly.


Part 5

Conclusions

Over all, these are encouraging results. The fact that these GAs approach a known useful scheduling policy is a very good sign. Simple though LRU might be, the GA does not know that it is simple. It was told to find a good scheduling policy and it found one based only on environmental punishment and reward.

Of course this result does not, in itself, revolutionize OS design, cache management or the fundamental relationship between man and technology. It does, however, amount to a confirming instance of the theory that machines may someday take a more active role in their own design.


Part 6

Related Work

Though I have seen a great deal of work in the area of genetic algorithms that design OS policies; most have to do more with process management than with cache management. The language that I devise is also, to my knowledge, unique. These are all however, attempts at the same goal of using computers to do some of what are traditionally considered the inventive parts designing computers.


Part 7

Future Directions

One obvious next step would be to add states to this system. It would be very nice for the algorithm to be able to have some explicit concept of whether or not it is in a loop, for example. It would also be useful if the scheduler could be aware that it is having a streak of cache misses and that its idea of what the system is doing is no longer valid. If there were states, this could be accomplished by moving closer and closer to a "don't know" state with each consecutive cache miss.

Another idea is to attempt to make a simplified version of #3. I could approximate a normal curve, for example. I could also replace the "m[]" array with a single decay value "m". Another approach might be a hybrid between approaches 2 and 3 where the shape of a cache request's proximal influence does evolve but is either biased toward a curved shape or constrained to force closer values to be worth more and farther values worth less and only evolve within this constraint.

I would also like to take a much closer look at what it is that makes one representation converge more quickly or to a more accurate policy than another. As a "proof of concept" the measurements and trial and error in this paper are appropriate, but since these values should be tractable using the right statistical techniques, it would behoove one intending to deploy a system like this to find a mathematical basis to support conjectures about the performance before the fact of different representations.

I would also like to compare these genetic algorithms to other cache management algorithms such as Clock and DbMin[1].


References

Hong-Tai Chou, David J. DeWitt:"An Evaluation of Buffer Management Strategies for Relational Database Systems". VLDB 1985: 127-141

Mahesh C. Gupta, Yash P. Gupta and Anup Kumar; "Genetic algorithm application in a machine scheduling problem"; Proceedings of the conference on 1993 ACM computer science February 16 - 18, 1993, Indianapolis, IN USA

Steven J. Beaty; Proceedings of the 24th annual "Genetic algorithms and instruction scheduling; international symposium on Microarchitecture" , 1991, Pages 206 - 211 [ Find Related Articles ]

Yair Bartal, Amos Fiat, Howard Karloff and Rakesh Vohra; "New algorithms for an ancient scheduling problem"; Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, 1992, Pages 51 - 58

Arthur L. Corcoran and Roger L. Wainwright; "A genetic algorithm for packing in three dimensions"; Proceedings of the 1992 ACM/SIGAPP symposium on Applied computing (vol. II): technological challenges of the 1990's ,

Allan Borodin, Nathan Linial and Michael E. Saks; "An optimal on-line algorithm for metrical task system"; J. ACM 39, 4 (Oct. 1992), Pages 745 -

Sajal K. Das, Sanjoy K. Sen and Rajeev Jayaram; "A dynamic load balancing strategy for channel assignment using selective borrowing in cellular mobile environment"; Proceedings of the second annual international conference on Mobile computing and networking , 1996, Pages 73 - 84

T. G. Bailey, K. Bauer and A. B. Marsh; "Response surface analysis of stochastic network performance"; Proceedings of the conference on Winter Simulation Conference , 1989, Pages 437 - 443

Uriel Feige and Christian Scheideler; "Improved bounds for acyclic job shop scheduling" (extended abstract) ; Proceedings of the thirtieth annual ACM symposium on Theory of computing , 1998, Pages 624 - 633

Bala Kalyanasundaram and Kirk R. Pruhs; "Fault-tolerant scheduling"; Proceedings of the twenty-sixth annual ACM symposium on Theory of computing , 1994, Pages 115 - 124


Appendix

More Graphs


Long term performance of approach #1:
blue = random policy
green = LRU policy
red = my genetic algorithm

x = generation# from 1 to 200, y = %cache misses (30% DOWN to 100%)
This graph represents the performance of the genetic algorithm versus the LRU and random algorithm.
25% random, 25% clustered, 25% sequential, 25% loops with clusters, sequences and loops up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 30, y = %cache misses (30% DOWN to 100%)

100% random

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 25, y = %cache misses (30% DOWN to 100%)

100% clustered with clusters up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 133, y = %cache misses (30% DOWN to 100%)

100% sequential with sequences up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 24, y = %cache misses (30% DOWN to 100%)

100% loops with loops up to 30 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 200, y = %cache misses (30% DOWN to 100%)

25% random, 25% clustered, 25% sequential, 25% loops with clusters, sequences and loops up to 150 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 27, y = %cache misses (30% DOWN to 100%)

100% random

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 25, y = %cache misses (30% DOWN to 100%)

100% clustered with clusters 150 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 46, y = %cache misses (30% DOWN to 100%)

100% sequential with sequences up to 150 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 26, y = %cache misses (30% DOWN to 100%)

100% loops with loops up to 150 pages long

red = random policy
yellow = LRU policy
black = MRU policy
white = attempt #1
green = attempt #2
blue = attempt #3

x = generation# from 1 to 32, y = %cache misses (0% DOWN to 100%)