So far we have assumed that the page request distribution is stationary, that is, that it does not change with time. In practice this assumption is an oversimplification, because the reference probabilities of some documents decrease while those of others increase. The model indicates that a small percentage of documents (the hot set) is responsible for a large percentage of Web requests. In this section we analyze proxy traces to study how the hot set in the request stream changes with time.
We took a portion of the DEC traces and looked at the 600 most popular URLs in each day. The one-week DEC trace from September 16 to September 22, 1996, has 4.5 million requests. In each day, the 600 most popular URLs account for over 10% of the total requests. After obtaining the hottest 600 URLs for each day, we determined how many of those documents were also in the hot sets of the preceding days. In other words, for each day N, we looked at the size of the intersection between the hot set of day N and the hot sets of days N-1, N-2 and N-3. The results are shown as three bars in Figure 5.
The figure shows that about two thirds of the hot set remains unchanged from one day to the next. The exceptions are September 21 and September 22, which are a Saturday and a Sunday; their hot sets differ by more than half from those of the working days. Looking at the intersection of the hot sets of day N and day N-2, and of day N and day N-3, we see that the overlap of hot documents decreases as the time interval increases. However, even after three days, about 60% of the popular documents continue to be requested during the working days, and about 30% of the popular documents remain popular on weekends. Thus, a significant portion of the hot documents appears quite stable.
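As an illustration, the overlap measurement described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the scripts used for the study: the names `hot_set` and `hot_set_overlap` are hypothetical, the trace is assumed to be already split into one list of requested URLs per day, and the overlap is reported as a fraction of the day-N hot set rather than as a raw intersection size.

```python
from collections import Counter


def hot_set(urls, k=600):
    """Return the k most frequently requested URLs of one day as a set."""
    return {url for url, _ in Counter(urls).most_common(k)}


def hot_set_overlap(daily_urls, k=600, max_lag=3):
    """Compare each day's hot set with those of the preceding days.

    daily_urls is an assumed input format: a list with one entry per day,
    each entry being the list of URLs requested on that day.
    Returns a dict mapping (day index N, lag) to the fraction of day N's
    hot set that also appears in the hot set of day N - lag.
    """
    hot = [hot_set(urls, k) for urls in daily_urls]
    overlap = {}
    for n in range(len(hot)):
        for lag in range(1, max_lag + 1):
            # Skip lags that reach before the first day of the trace.
            if n - lag >= 0 and hot[n]:
                shared = hot[n] & hot[n - lag]
                overlap[(n, lag)] = len(shared) / len(hot[n])
    return overlap
```

Called on the seven daily request lists of a one-week trace with the default `k=600` and `max_lag=3`, this would yield one value per day and lag, corresponding to the three bars per day described above.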
We are still studying other traces and investigating how their hot sets change with time. We are also trying to understand the implications of hot set drift for Web proxies, in particular for prefetching and multicast delivery of Web pages.