Lee Breslau
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304 USA
breslau@parc.xerox.com
-
Pei Cao
University of Wisconsin, Madison
1210 West Dayton Street
Madison, WI 53706 USA
cao@cs.wisc.edu
-
Li Fan
University of Wisconsin, Madison
1210 West Dayton Street
Madison, WI 53706 USA
lfan@cs.wisc.edu
-
Graham Phillips
University of Southern California
University Park
Los Angeles, CA 90089 USA
graham@catarina.usc.edu
-
Scott Shenker
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304 USA
shenker@parc.xerox.com
Recently, a number of studies on characteristics of Web proxy traces have shown that the hit-ratios of the traces exhibit certain properties that are uniform across the different sets of the traces [CI97,RV98,DMF97,GB97,KLM97]. An explanation for these phenomena has eluded researchers and it is not clear whether the properties are inherent to Web accesses or particular to the set of traces studied.
In this paper, we show that if one assumes that the references in the Web access stream are independent and the reference probability of the documents follow Zipf's law then the observed properties follow from Zipf's law. We revisit Web cache replacement algorithms and show that the algorithm that is suggested by Zipf's law performs best. Finally, we investigate the drift in the cache's hot set as a function of time.