Combining the above results, we recommend the following configuration
for the summary cache approach.
The update threshold, that is, the percentage of cached documents that are
new since the last summary update, should be between 1% and 10% to avoid a
significant reduction in the total cache hit ratio.
If a time-based update approach is chosen, the time interval should
be set so that the percentage of new documents falls between 1% and 10%.
The proxy can either broadcast the changes (or the entire bit array, if that
is smaller) or let other proxies fetch the updates from it.
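The update policy can be sketched as follows. This is a minimal illustration, not the paper's implementation. `SummaryPublisher` and its methods are hypothetical names. The threshold is treated as the fraction of cached documents that are new since the last update. A delta is assumed to cost 4 bytes per changed bit index, and the full array costs one bit per position. The proxy publishes when the threshold is reached and sends whichever form is smaller.

```python
import math


class SummaryPublisher:
    """Hypothetical helper: decides when and in what form to publish a summary."""

    # Assumption: a delta is encoded as one 4-byte index per changed bit.
    BYTES_PER_CHANGED_BIT = 4

    def __init__(self, num_bits: int, threshold: float = 0.01):
        self.num_bits = num_bits
        self.threshold = threshold            # recommended range: 0.01 to 0.10
        self.bits = bytearray(math.ceil(num_bits / 8))
        self.changed = set()                  # positions differing from the last published copy
        self.new_docs = 0                     # documents cached since the last update

    def set_bit(self, pos: int, value: bool) -> None:
        byte, mask = pos // 8, 1 << (pos % 8)
        if bool(self.bits[byte] & mask) != value:
            self.bits[byte] ^= mask
            # Flipping a bit back to its published value cancels the change.
            self.changed ^= {pos}

    def note_new_document(self) -> None:
        self.new_docs += 1

    def should_publish(self, cached_docs: int) -> bool:
        """True once the fraction of new documents reaches the threshold."""
        return cached_docs > 0 and self.new_docs / cached_docs >= self.threshold

    def build_update(self):
        """Return ('delta', positions) or ('full', bytes), whichever is smaller."""
        delta_size = len(self.changed) * self.BYTES_PER_CHANGED_BIT
        if delta_size < len(self.bits):
            update = ("delta", sorted(self.changed))
        else:
            update = ("full", bytes(self.bits))
        self.changed.clear()
        self.new_docs = 0
        return update
```

With a few changed bits, the delta wins. Once many bits have flipped, shipping the whole array becomes cheaper. The same object works whether other proxies receive a broadcast or pull the update on demand.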
The summary should be in the form of a Bloom filter. A load factor (bits in
the array per cached document) between 8 and 16 works well, though proxies
can lower or raise it depending on their memory and network traffic
concerns. With a load factor in this range, four or more hash functions
should be used. The data provided here and in [16] can serve as a reference
when making these decisions.
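The trade-off between load factor and hash-function count can be explored with the standard Bloom filter approximation. With m bits, n documents and k hash functions, the false positive probability is about (1 - e^(-kn/m))^k, and k = (m/n) ln 2 minimizes it. This is a generic textbook estimate, not the measured data referred to above. The function names are illustrative.

```python
import math


def false_positive_rate(load_factor: float, k: int) -> float:
    """Approximate Bloom filter false positive rate, load_factor = m / n."""
    return (1.0 - math.exp(-k / load_factor)) ** k


def optimal_hash_count(load_factor: float) -> int:
    """k = (m/n) * ln 2 minimizes the approximation above (rounded)."""
    return max(1, round(load_factor * math.log(2)))


for lf in (8, 16):
    for k in (4, optimal_hash_count(lf)):
        print(f"load factor {lf:2d}, k={k:2d}: {false_positive_rate(lf, k):.4f}")
```

By this approximation, four hash functions give roughly 2.4% false positives at load factor 8 and roughly 0.24% at load factor 16. Adding more hash functions lowers the rate further, at the cost of more bits touched per lookup.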
For hash functions, we recommend taking disjoint groups of bits from the
128-bit MD5 signature of the URL. If more bits are needed, one can calculate
the MD5 signature of the URL concatenated with itself.
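The MD5-based hashing scheme can be sketched as follows. This sketch rests on several assumptions. Each hash function takes a disjoint 32-bit group of the signature, so one 128-bit digest yields four positions. Each group is reduced modulo the array size. When more groups are needed, the code hashes the URL concatenated with itself. Going beyond two digests by repeating the URL further is an extrapolation for illustration. `bloom_positions`, `add` and `might_contain` are hypothetical names.

```python
import hashlib

GROUP_BYTES = 4  # assumption: each hash function uses a disjoint 32-bit group


def bloom_positions(url: str, k: int, num_bits: int) -> list[int]:
    """Derive k bit positions from disjoint groups of MD5 signatures.

    Digest 0 is MD5(url) and digest 1 is MD5(url + url). Later digests
    repeat the URL further (an illustrative extrapolation).
    """
    positions = []
    repeat = 1
    while len(positions) < k:
        digest = hashlib.md5((url * repeat).encode("utf-8")).digest()
        for start in range(0, len(digest), GROUP_BYTES):
            if len(positions) == k:
                break
            group = int.from_bytes(digest[start:start + GROUP_BYTES], "big")
            positions.append(group % num_bits)
        repeat += 1
    return positions


def add(bits: bytearray, url: str, k: int) -> None:
    """Set the k bits for url in the Bloom filter bit array."""
    for pos in bloom_positions(url, k, len(bits) * 8):
        bits[pos // 8] |= 1 << (pos % 8)


def might_contain(bits: bytearray, url: str, k: int) -> bool:
    """True if all k bits are set (the URL is probably cached)."""
    return all(bits[pos // 8] & (1 << (pos % 8))
               for pos in bloom_positions(url, k, len(bits) * 8))
```

A usage example: `bits = bytearray(1024)`, then `add(bits, "http://example.com/", 4)`. After that, `might_contain(bits, "http://example.com/", 4)` returns `True`. A URL that was never added usually returns `False`, with false positives occurring at the rate the load factor allows.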
In practice, the computational overhead of MD5 is negligible compared with the
user and system CPU overhead incurred by caching documents (see
Section 7).
Pei Cao
7/5/1998