Traces | DEC | UCB | UPisa | Questnet | NLANR | |||
Time | 8/29-9/4, 1996 | 9/14-9/19, 1996 | Jan-March, 1997 | 1/15-1/21, 1998 | 12/22, 1997 | |||
Requests | 3543968 | 1907762 | 2833624 | 2885285 | 1766409 | |||
Infinite Cache Size | 2.88e+10 | 1.80e+10 | 2.07e+10 | 2.33e+10 | 1.37e+10 | |||
Maximum Hit Ratio | 0.49 | 0.30 | 0.40 | 0.30 | 0.36 | |||
Maximum ByteHit Ratio | 0.36 | 0.14 | 0.27 | 0.15 | 0.27 | |||
Client Population | 10089 | 5780 | 2203 | 12 | 4 | |||
Client Groups | 16 | 8 | 8 | 12 | 4 |
For our study we have collected five sets of traces of HTTP requests. The number of requests in each trace, the number of clients, and other statistics are listed in Table 1. In particular, Table 1 lists the ``infinite'' cache size for each trace, which is the total size in bytes of unique documents in the trace (i.e. the size of the ``infinite'' cache which incurs no cache replacement).
In our simulation of cache sharing, we partition the clients in DEC, UCB and UPisa into groups, assume that each group has its own proxy, and simulate the cache sharing among the proxies. This roughly corresponds to the scenario where each branch of a company or each department in a university has its own proxy cache, and the caches collaborate. We set the number of groups in DEC, UCB and UPisa traces to 16, 8 and 8, respectively. A client is put in a group if its clientID mod the group size equals the group ID. Though the simulation does not exactly correspond to reality, we believe it does bring insight on cache sharing protocols. Questnet traces contain HTTP requests coming from a set of child proxies in the regional network to the parent proxy. We assume that these are the requests going into the child proxies (since the child proxies send their cache misses to the parent proxy), and simulate cache sharing among the child proxies. Finally, NLANR traces contains actual HTTP requests going to the four major proxies, and we simulate the cache sharing among them.
In all our simulations, we use LRU as the cache replacement algorithm, with the restriction that documents larger than 250KB is not cached. The policy is similar to what are used in actual proxies. We do not simulate expiring documents based in age or time-to-live. Rather, most of our traces come with the last-modified time of a document for every request, and if a user request hit on a document whose last-modified time is changed, we count it as a cache miss. In other words, we assume that cache consistency mechanism is perfect. In practice, there are a variety of protocols [12,34,28] for Web cache consistency.
Most of our simulations assume a cache size that is 10% of the ``infinite'' cache size. Studies have shown that 10% of the ``infinite'' cache size typically achieves about 90% of the maximum cache hit ratio [49,8,35]. We also performed simulations with cache sizes being 5% of the infinite cache size and the results are very similar.