Next: Benefits of Cache Sharing Up: Summary Cache: A Scalable Previous: Introduction

Traces and Simulations

**Table 1:** Statistics about the traces. The maximum hit ratio and byte hit ratio are those achieved with an infinite cache.
Traces	DEC	UCB	UPisa	Questnet	NLANR
Time	8/29-9/4, 1996	11/14-11/19, 1996	Jan-March, 1997	1/15-1/21, 1998	12/22, 1997
Requests	3543968	1907762	2833624	2885285	1766409
Infinite Cache Size (bytes)	2.88e+10	1.80e+10	2.07e+10	2.33e+10	1.37e+10
Maximum Hit Ratio	0.49	0.30	0.40	0.30	0.36
Maximum Byte Hit Ratio	0.36	0.14	0.27	0.15	0.27
Client Population	10089	5780	2203	12	4
Client Groups	16	8	8	12	4

For our study we collected five sets of traces of HTTP requests. The number of requests in each trace, the number of clients, and other statistics are listed in Table 1. In particular, Table 1 lists the "infinite" cache size for each trace, which is the total size in bytes of the unique documents in the trace (i.e., the size of a cache large enough that it never needs to replace a document).
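The "infinite" cache size and the maximum hit ratios in Table 1 can be derived in a single pass over a trace. The following is a minimal sketch, not the simulator used in the paper: it assumes each request is a hypothetical `(url, size_in_bytes)` pair, and it ignores document modifications, so a repeated URL always counts as a hit.

```python
from typing import Dict, Iterable, Tuple


def infinite_cache_stats(
    requests: Iterable[Tuple[str, int]]
) -> Tuple[int, float, float]:
    """Return (infinite_cache_size, hit_ratio, byte_hit_ratio) for a trace.

    Assumption: a request is a (url, size_in_bytes) pair, and a document
    never changes once it has been seen.
    """
    first_seen_size: Dict[str, int] = {}  # url -> size at first occurrence
    hits = 0
    hit_bytes = 0
    total = 0
    total_bytes = 0

    for url, size in requests:
        total += 1
        total_bytes += size
        if url in first_seen_size:
            # With an infinite cache, every repeated URL is a hit.
            hits += 1
            hit_bytes += size
        else:
            first_seen_size[url] = size

    # Total bytes of unique documents: the cache never has to evict.
    infinite_size = sum(first_seen_size.values())
    hit_ratio = hits / total if total else 0.0
    byte_hit_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return infinite_size, hit_ratio, byte_hit_ratio
```

For example, the four-request trace `[("a", 100), ("b", 200), ("a", 100), ("c", 50)]` has an infinite cache size of 350 bytes and a maximum hit ratio of 0.25.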

DEC traces [32]: Digital Equipment Corporation Web proxy server traces, serving about 17,000 workstations. The trace covers a period of 25 days (Aug. 29 to Sep. 21, 1996). We partitioned the trace into three one-week subtraces and one half-week subtrace, because swap-space limitations prevent our simulator from handling the full trace at once. In this paper, we present the results for the week of Aug. 29 to Sep. 4, 1996. Results on the other subtraces are very similar.
UCB traces [24]: traces of HTTP requests gathered from the Home IP service offered by UC Berkeley to its students, faculty, and staff. The total trace covers a period of 18 days from Nov. 1 to Nov. 19, 1996, and is partitioned into four subtraces, each covering four or five days. We present the results on the subtrace from Nov. 14 to Nov. 19. Though the trace originally records 2,468,890 requests, many of them have response data sizes of 0 or 1, and we ignore those requests. We have also run the simulations on the other subtraces in the UCB collection, and the results are similar to those presented here.
UPisa traces [43]: traces of HTTP requests made by users in the Computer Science Department at the Università di Pisa, Italy, over a period of three months from January to March, 1997. We simulate only GET requests, and only those whose URLs do not include query strings, since most proxies do not cache query requests.
Questnet traces [47]: seven days' worth of logs of HTTP requests seen by the parent proxies at Questnet, a regional network in Australia, from Jan. 15 to Jan. 21, 1998. The parent proxies serve about 12 child proxies in the regional network. We extract the successful GET requests seen by the parent proxies. Thus, the trace is only a subset of the user requests going to the child proxies. Unfortunately, the full set of user requests to the child proxies is not available.
NLANR traces [40]: a one-day log (Dec. 22, 1997) of HTTP requests to four major parent proxy caches in the national caching hierarchy run by NLANR (National Laboratory for Applied Network Research). There are about eight proxies in the national caching hierarchy, but only four of them ("bo", "pb", "sd", and "uc") handle documents from servers in .com, .net, .edu, and other major domains. Thus, we simulate requests to those four proxies only.
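The filtering rules mentioned in the trace descriptions above (GET requests only, no query strings, no responses of size 0 or 1) can be sketched as a single predicate. This is an illustration only: the `Request` record and its field names are hypothetical, and the paper applies different rules to different traces rather than all of them at once.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical trace record; the real trace formats differ per source.
    method: str
    url: str
    size: int  # response data size as logged


def keep_request(req: Request) -> bool:
    """Return True if the request should be fed to the simulator."""
    if req.method != "GET":
        return False  # UPisa and Questnet: only GET requests are simulated
    if "?" in req.url:
        return False  # UPisa: URLs with query strings are skipped
    if req.size <= 1:
        return False  # UCB: response data sizes of 0 or 1 are ignored
    return True
```

Applying such a predicate before simulation, for instance with `filter(keep_request, trace)`, keeps the cache logic free of trace-specific special cases.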

In our simulation of cache sharing, we partition the clients in the DEC, UCB and UPisa traces into groups, assume that each group has its own proxy, and simulate cache sharing among the proxies. This roughly corresponds to the scenario where each branch of a company or each department in a university has its own proxy cache, and the caches collaborate. We set the number of groups in the DEC, UCB and UPisa traces to 16, 8 and 8, respectively. A client is put in a group if its clientID mod the number of groups equals the group ID. Though the simulation does not exactly correspond to reality, we believe it provides insight into cache sharing protocols. The Questnet traces contain HTTP requests coming from a set of child proxies in the regional network to the parent proxy. We assume that these are the requests going into the child proxies (since the child proxies send their cache misses to the parent proxy), and simulate cache sharing among the child proxies. Finally, the NLANR traces contain actual HTTP requests going to the four major proxies, and we simulate cache sharing among them.
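The grouping rule for the DEC, UCB and UPisa traces amounts to a modulo over the client identifier. A minimal sketch follows, assuming client IDs are non-negative integers and each request is a hypothetical `(client_id, url)` pair; the function names are ours, not the paper's.

```python
from typing import Iterable, List, Tuple


def group_of(client_id: int, num_groups: int) -> int:
    """Map a client to a proxy group: clientID mod the number of groups."""
    return client_id % num_groups


def partition_requests(
    requests: Iterable[Tuple[int, str]], num_groups: int
) -> List[List[Tuple[int, str]]]:
    """Split a trace into one request list per simulated proxy."""
    groups: List[List[Tuple[int, str]]] = [[] for _ in range(num_groups)]
    for client_id, url in requests:
        groups[group_of(client_id, num_groups)].append((client_id, url))
    return groups
```

With 16 groups, as used for the DEC trace, clients 0, 16 and 32 all land in group 0, so the requests of any one client always go to the same simulated proxy.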

In all our simulations, we use LRU as the cache replacement algorithm, with the restriction that documents larger than 250KB are not cached. The policy is similar to those used in actual proxies. We do not simulate expiring documents based on age or time-to-live. Rather, most of our traces include the last-modified time of the document for every request, and if a user request hits a document whose last-modified time has changed, we count it as a cache miss. In other words, we assume that the cache consistency mechanism is perfect. In practice, there are a variety of protocols [12,34,28] for Web cache consistency.
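The replacement and consistency policy described above can be sketched as a byte-capacity LRU cache. This is a minimal illustration under stated assumptions, not the authors' simulator: the class and method names are hypothetical, 250KB is taken to mean 250 * 1024 bytes, and each request is assumed to carry the document's size and last-modified time.

```python
from collections import OrderedDict

MAX_DOC_SIZE = 250 * 1024  # assumption: 250KB interpreted as 250 * 1024 bytes


class LRUCache:
    """Byte-capacity LRU cache with a perfect-consistency check."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        # url -> (size, last_modified); least recently used entry first.
        self.docs: "OrderedDict[str, tuple]" = OrderedDict()

    def access(self, url: str, size: int, last_modified: int) -> bool:
        """Process one request; return True on a hit, False on a miss."""
        entry = self.docs.get(url)
        if entry is not None and entry[1] == last_modified:
            self.docs.move_to_end(url)  # mark as most recently used
            return True

        # Miss: the document is absent, or its last-modified time changed.
        if entry is not None:
            self.used -= entry[0]
            del self.docs[url]  # drop the stale copy

        # Oversized documents are never cached.
        if size > MAX_DOC_SIZE or size > self.capacity:
            return False

        # Evict least recently used documents until the new one fits.
        while self.used + size > self.capacity:
            _, (old_size, _) = self.docs.popitem(last=False)
            self.used -= old_size

        self.docs[url] = (size, last_modified)
        self.used += size
        return False
```

Counting the `True` results of `access` over a trace, divided by the number of requests, gives the hit ratio for a given capacity.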

Most of our simulations assume a cache size that is 10% of the "infinite" cache size. Studies have shown that 10% of the "infinite" cache size typically achieves about 90% of the maximum cache hit ratio [49,8,35]. We also performed simulations with cache sizes set to 5% of the infinite cache size, and the results are very similar.
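As a worked example of the sizing rule, the snippet below computes the simulated cache size for each trace from the infinite cache sizes in Table 1. It is a sketch only: the table values are rounded to three significant digits, the names are ours, and how the resulting budget is divided among the proxies in a group is not specified in this section.

```python
# Infinite cache sizes in bytes, as listed in Table 1 (rounded values).
INFINITE_CACHE_SIZE = {
    "DEC": 28_800_000_000,
    "UCB": 18_000_000_000,
    "UPisa": 20_700_000_000,
    "Questnet": 23_300_000_000,
    "NLANR": 13_700_000_000,
}


def simulated_cache_size(trace: str, percent: int = 10) -> int:
    """Cache size in bytes as a percentage of the infinite cache size."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return INFINITE_CACHE_SIZE[trace] * percent // 100


for name in INFINITE_CACHE_SIZE:
    print(name, simulated_cache_size(name), simulated_cache_size(name, 5))
```

For the DEC trace this gives 2.88e+9 bytes at 10% and 1.44e+9 bytes at 5%.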


Pei Cao
7/5/1998