
Trace Characteristics

A number of existing publications point out that Zipf's law applies to Web accesses. Early research includes [Gla94] and [ABCdO96]. [Gla94] studied the Web accesses of about 60 users over a period of weeks and found that the accesses closely follow a 1/i distribution. [ABCdO96] studied the Web accesses of about 200 users in a department, collecting accesses that were not filtered by the browser cache. They found that the accesses follow 1/i^0.97, which is close to 1/i.
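For reference, the "1/i" and "1/i^0.97" forms above are instances of a Zipf-like distribution. A minimal statement of it, where N (the number of distinct documents), i (a document's popularity rank) and the normalizing constant Ω are notation introduced here for illustration, is that the i-th most popular document receives a fraction of requests proportional to 1/i^α. Setting α = 1 gives the pure 1/i law, and the [ABCdO96] fit corresponds to α = 0.97:

```latex
P(i) = \frac{\Omega}{i^{\alpha}}, \qquad
\Omega = \left( \sum_{j=1}^{N} \frac{1}{j^{\alpha}} \right)^{-1}, \qquad 1 \le i \le N
```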

More recently, [Kim98] reports that the Web accesses in a large proxy trace also follow a Zipf-like distribution, but with 1/i^0.67. That is, the exponent α is about 0.67 instead of 1.
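To make the comparison between α values such as 1, 0.97 and 0.67 concrete, the sketch below shows one simple way to estimate α from a trace. It assumes the trace is available as a plain list of requested URLs, and it uses an ordinary least-squares fit of log(request count) against log(popularity rank). The function name `estimate_zipf_alpha` is hypothetical, and this fitting method is an illustrative choice, not necessarily the one used in the studies cited above.

```python
import math
from collections import Counter


def estimate_zipf_alpha(requests):
    """Estimate the Zipf exponent alpha from a list of requested URLs.

    Hypothetical helper: ranks documents by request count and fits
    log(count) = c - alpha * log(rank) with ordinary least squares.
    """
    # Request count per distinct document, most popular first (rank 1).
    counts = sorted(Counter(requests).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct documents to fit alpha")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)

    # The fitted slope on log-log axes is -alpha, so negate it.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic trace: document i is requested about 1000 / i times (alpha = 1).
    trace = [f"doc{i}" for i in range(1, 51) for _ in range(round(1000 / i))]
    print(f"estimated alpha: {estimate_zipf_alpha(trace):.2f}")
```

A log-log least-squares fit can be sensitive to the long tail of documents requested only once or twice, so in practice one may want to restrict the fit to the more popular ranks before comparing α across traces.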

Since we also have access to five large proxy traces, we studied whether they follow Zipf's distribution as well. Our conclusions are as follows:

These observations have implications for Web caching algorithms and Web cache consistency maintenance. The fact that hot documents are not necessarily small means that an algorithm that always replaces the largest document may not work well. The fact that hot documents can change often means that a cache consistency scheme should not assume that the more popular a document is, the more stable it is. Finally, the fact that hot documents are spread more or less evenly across the hot Web servers implies that no single Web server dominates the Web; rather, a collection of Web servers absorbs a large percentage of the Web traffic.

Graphs supporting the claims.


Pei Cao
6/2/1998