Due to the explosive growth of the Web, Web proxy caching has recently received considerable attention. In particular, several studies have analyzed Web proxy traces and investigated the relationship between cache hit-ratio and cache sizes, the relationship between hit-ratio and the number of requests and the temporal locality of request streams [CI97,RV98,DMF97,GB97,KLM97]. Although various studies have used different set of traces, the following three properties have been identified:
An explanation for these phenomena, however, has eluded researchers. It is not clear whether these properties follow certain inherent characteristics of Web accesses or are simply an artifact of the collection of traces studied. The properties are useful for designing caching algorithms, configuring proxy caches, etc., and therefore it is important to understand them.
In this paper, we show that if one assumes that Web page requests are independent and the probability that a page is accessed follows Zipf's law then the three properties listed above all follow . Although the assumption that the requests are independent is traditionally considered an over-simplification, our results show that the model is powerful enough to explain the three properties as observed in real proxy traces.
The rest of the paper is organized as follows. Section 3 describes the Zipf model that is used in sections 4, 5 and 6 to develop expressions for cache hit-ratio and page request interarrival times. With the Zipf model in mind, we revisit Web cache replacement algorithms in section 7 where we compare the LRU and LFU page replacement policies. Finally, in section 8 we investigate how the hot set of a Web cache changes from day to day.