The client process varies the server_name, port_number and filenum of each request so that the request stream has a particular inherent hit ratio and follows the temporal locality pattern observed in most proxy traces. The client process sends requests in two stages. During the first stage, the client sends N requests, where N is the command line argument specifying the number of requests to send. For each request, the client picks a random server, picks a random port at that server, and sends an HTTP request whose filenum increases from 1 to N over the course of the stage. Because every file number is used exactly once, there are no cache hits in the first-stage request stream. These requests serve to populate the cache and also stress the cache replacement mechanisms in the proxy. All of the requests are recorded in an array that is used in the second stage.
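The first stage can be sketched roughly as follows. This is an illustrative sketch, not the benchmark's actual client code: the function name `first_stage`, the `servers` list, and the `ports_per_server` mapping are hypothetical names introduced here, and no HTTP request is actually sent.

```python
import random


def first_stage(n, servers, ports_per_server, rng=None):
    """Build the N first-stage requests and return them as a list.

    Each request is a (server_name, port_number, filenum) tuple.
    `servers` and `ports_per_server` are hypothetical inputs standing in
    for whatever server list the real client is configured with.
    """
    rng = rng or random.Random()
    history = []
    for filenum in range(1, n + 1):
        server = rng.choice(servers)                 # random server
        port = rng.choice(ports_per_server[server])  # random port at that server
        # filenum is never repeated, so no request can hit in the cache.
        history.append((server, port, filenum))
    return history
```

Because filenum is part of every request and never repeats, all N requests are distinct, which is what makes the first stage free of cache hits.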
During the second stage, the client also sends N requests, but for each request it picks a random number and takes one of two actions depending on its value. If the number is higher than a certain constant, a new request is issued. If the number is lower than the constant, the client re-issues a request that it has issued before. Thus, the constant is the inherent hit ratio of the request stream. When the client needs to re-issue an old request, it chooses the request it issued t requests ago with probability proportional to $1/t$. More specifically, the client program maintains the sum of $1/t$ for t from 1 to the number of requests issued so far (call it S). Every time it has to issue an old request, it picks a random number from 0 to 1 (call it r), calculates r*S, and chooses the t for which $\sum_{i=1}^{t-1} 1/i < r \cdot S \le \sum_{i=1}^{t} 1/i$. In essence, t is chosen with probability $1/(S \cdot t)$.
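The selection procedure of the second stage can be sketched as below, taking the weight of the request issued t requests ago to be 1/t. This is a sketch under stated assumptions rather than the benchmark's own code: `pick_age`, `second_stage`, and the `new_request` callback are hypothetical names, the random number is assumed to be uniform on [0, 1), and re-issued requests are assumed to be appended to the history like any other request.

```python
import bisect
import random


def pick_age(prefix_sums, rng):
    """Return t in 1..len(prefix_sums) with probability (1/t) / S.

    prefix_sums[k] holds 1/1 + 1/2 + ... + 1/(k+1), so the last entry is S.
    """
    target = rng.random() * prefix_sums[-1]          # r * S
    # First index whose partial sum reaches r*S; that index + 1 is t.
    idx = bisect.bisect_left(prefix_sums, target)
    return min(idx, len(prefix_sums) - 1) + 1


def second_stage(history, n, hit_ratio=0.5, new_request=None, rng=None):
    """Issue n requests, re-issuing an old one with probability hit_ratio.

    `history` is the list of requests issued so far (first stage included)
    and is extended in place. `new_request` is a hypothetical callback
    that returns a request not issued before.
    """
    rng = rng or random.Random()
    prefix_sums, total = [], 0.0
    for t in range(1, len(history) + 1):
        total += 1.0 / t
        prefix_sums.append(total)

    issued = []
    for _ in range(n):
        if history and rng.random() < hit_ratio:
            t = pick_age(prefix_sums, rng)
            request = history[-t]                    # issued t requests ago
        else:
            request = new_request()
        history.append(request)
        total += 1.0 / len(history)                  # keep S up to date
        prefix_sums.append(total)
        issued.append(request)
    return issued
```

Keeping the partial sums in a list lets each lookup use binary search instead of a linear scan, which matters when the history holds many thousands of requests. Small values of t carry the largest weights, so recently issued requests are re-issued most often.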
The above temporal locality pattern is chosen based on a number of studies of the locality in Web access streams seen by the proxy. (We have inspected the locality curves of the requests generated by our code and found them to be similar to those obtained from traces.) Note that we capture only temporal locality and do not model spatial locality at all. We plan to include spatial locality models when we have more information.
Finally, the inherent hit ratio in the second stage of requests can be specified in the configuration file. The default value is 50%.