
Related Work

Web caching is an active research area. There are many studies on Web client access characteristics [10,3,14,33,23], Web caching algorithms [49,35,8], and Web cache consistency [28,31,34,13]. Our study does not address caching algorithms or cache consistency maintenance, but it overlaps with some of the client traffic studies in its investigation of the benefits of Web cache sharing.

Recently, a number of new cache sharing approaches have been proposed in the literature. The Cache Array Routing Protocol [46] divides the URL space among an array of loosely coupled proxy servers and lets each proxy cache only the documents whose URLs hash to it. An advantage of this approach is that it eliminates duplicate copies of documents. However, it is not clear how well the approach performs for wide-area cache sharing, where proxies are distributed over a regional network. The Relais project [27] also proposes using local directories to find documents in other caches and updating the directories asynchronously; the idea is similar to summary cache. However, the project does not appear to address the issue of memory demands, and from the publications on Relais available to us [4] it is also not clear whether it addresses the issue of directory update frequencies. Proxies built out of tightly coupled clustered workstations also use various hashing and partitioning approaches to utilize the memory and disks in the cluster [20], but these approaches are not appropriate for wide-area networks.
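To make the hash-partitioning idea behind CARP (mentioned at the start of this paragraph) concrete, the following Python sketch maps each URL to a single proxy in an array. This is an illustration only: CARP's actual routing function differs (it combines per-proxy and per-URL hash values so that array members can be weighted), and the proxy names are hypothetical.

```python
import hashlib

def assign_proxy(url, proxies):
    """Map a URL to exactly one proxy in the array.

    Illustrative hash partitioning only, not CARP's actual routing
    function.  The effect is the same in spirit: each document is
    cached by a single proxy, so no duplicate copies are kept
    across the array.
    """
    h = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
    return proxies[h % len(proxies)]

# Example usage with hypothetical proxy names.
proxies = ["proxy-a.example.net", "proxy-b.example.net", "proxy-c.example.net"]
print(assign_proxy("http://www.example.com/index.html", proxies))
```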

Our study is partially motivated by an existing proposal called directory server [21]. In that approach, a central server keeps track of the cache directories of all proxies, and every proxy queries the server to locate cache hits in other proxies. The drawback of the approach is that the central server can easily become a bottleneck; the advantage is that little communication is needed between sibling proxies except for remote hits.

There have also been many studies on Web cache hierarchies and cache sharing. Hierarchical Web caching was first proposed in the Harvest project [26,12], which also introduced the ICP protocol. Currently, the Squid proxy server implements version 2 of the ICP protocol [48], upon which our Summary-Cache enhanced ICP is based. Adaptive Web caching [50] proposes a multicast-based adaptive caching infrastructure for document dissemination in the Web. In particular, the scheme seeks to position documents at the right caches along the routes to the servers. Our study does not address this positioning issue; rather, it is complementary in the sense that the summary cache approach can be used as a mechanism for communicating caches' contents.

Though we did not simulate the scenario, Summary-Cache enhanced ICP can also be used between parent and child proxies. Hierarchical Web caching includes not only cooperation among neighboring (sibling) proxies but also cooperation between parent and child proxies; the difference is that a proxy cannot ask a sibling to fetch a document from the origin server, but can ask a parent to do so. Although our simulations only involve cooperation among sibling proxies, the summary cache approach can be used to propagate information about the parent cache's contents to the child proxies and thus eliminate the ICP queries from the children to the parent. Our inspection of the Questnet traces shows that child-to-parent ICP queries can account for a significant portion (over 2/3) of the messages that the parent has to process.
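As a hypothetical sketch of this use (the function and parameter names below are ours, not Squid's), a child proxy that holds a summary of the parent's cache can decide locally whether forwarding a request to the parent is likely to yield a remote hit, instead of sending an ICP query:

```python
def handle_child_miss(url, parent_summary, request_from_parent, fetch_from_server):
    """Handle a local cache miss at a child proxy.

    Hypothetical sketch: instead of sending an ICP query to the parent,
    the child consults its locally stored summary of the parent's cache.
    A summary hit may be a false positive, but a summary miss means the
    parent almost certainly does not have the document.
    """
    if parent_summary.may_contain(url):
        # Probable hit in the parent's cache: forward the request there.
        return request_from_parent(url)
    # Probable miss: skip the parent lookup and have the document fetched
    # from the origin server (possibly via the parent, since a parent --
    # unlike a sibling -- can be asked to fetch from the server).
    return fetch_from_server(url)
```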

In the operating system context, there have been many studies on cooperative file caching [11,2] and the global memory system (GMS) [17]. The underlying assumption in these systems is that high-speed local area networks are faster than disks, so workstations should use each other's idle memory to cache file pages or virtual memory pages and avoid traffic to disks. In this respect, the problem is quite different from Web cache sharing. On the other hand, in both contexts there is the question of how tightly coordinated the caches should be. Most cooperative file caching and GMS systems try to emulate a global LRU replacement algorithm, sometimes using hints to do so [45]. It is interesting to note that we arrive at quite different conclusions on whether a global replacement algorithm is necessary [17]. The reason is that in the OS context, the global replacement algorithm is used for stealing memory from idle workstations (i.e., load-balancing the caches), while in Web cache sharing, every proxy is busy all the time. Thus, while simple cache sharing performs poorly in the OS context, it suffices for Web proxy cache sharing as long as each proxy's resource configuration is appropriate for its load. Finally, note that the technique of Bloom-filter-based summary cache is not restricted to the Web proxy caching context, but can be used wherever knowledge of other caches' contents is beneficial, for example, in caching and load balancing in clustered servers.
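For instance, a cluster front-end could keep a Bloom-filter summary of each back-end server's cache and route requests accordingly. The minimal sketch below shows the membership test such a summary provides; the bit-array size, the number of hash functions, and the way hash values are derived are simplified for illustration, and the counters the cache owner would need to handle deletions are omitted.

```python
import hashlib

class BloomSummary:
    """Minimal Bloom-filter summary of a cache's contents (illustrative)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive the hash values from one MD5 digest, 4 bytes per hash.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = int.from_bytes(digest[4 * i:4 * (i + 1)], "big")
            yield chunk % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        # False positives are possible; false negatives are not,
        # assuming the summary is up to date.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Example usage with hypothetical URLs.
summary = BloomSummary()
summary.add("http://www.example.com/a.html")
print(summary.may_contain("http://www.example.com/a.html"))   # True
print(summary.may_contain("http://www.example.com/b.html"))   # almost surely False
```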

