Overhead of ICP

**Table 2:** Overhead of ICP in the four-proxy case. The SC-ICP protocol is introduced and explained in Section 6. Each experiment is run three times, and the variance of each measurement is listed in parentheses. The overhead rows list the percentage increase over no-ICP for each measurement. Note that in the synthetic experiments there are no inter-proxy cache hits.

| Exp 1 | Hit Ratio | Client Latency | User CPU | System CPU | UDP Msgs | TCP Msgs | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 25% | 2.75 (5%) | 94.42 (5%) | 133.65 (6%) | 615 (28%) | 334K (8%) | 355K (7%) |
| ICP | 25% | 3.07 (0.7%) | 116.87 (5%) | 146.50 (5%) | 54774 (0%) | 328K (4%) | 402K (3%) |
| Overhead (ICP) | | 12% | 24% | 10% | 9000% | 2% | 13% |
| SC-ICP | 25% | 2.85 (1%) | 95.07 (6%) | 134.61 (6%) | 1079 (0%) | 330K (5%) | 351K (5%) |
| Overhead (SC-ICP) | | 4% | 0.7% | 0.7% | 75% | -1% | -1% |

| Exp 2 | Hit Ratio | Client Latency | User CPU | System CPU | UDP Msgs | TCP Msgs | Total Packets |
|---|---|---|---|---|---|---|---|
| no ICP | 45% | 2.21 (1%) | 80.83 (2%) | 111.10 (2%) | 540 (3%) | 272K (3%) | 290K (3%) |
| ICP | 45% | 2.39 (1%) | 97.36 (1%) | 118.59 (1%) | 39968 (0%) | 257K (2%) | 314K (1%) |
| Overhead (ICP) | | 8% | 20% | 7% | 7300% | -1% | 8% |
| SC-ICP | 45% | 2.25 (1%) | 82.03 (3%) | 111.87 (3%) | 799 (5%) | 269K (5%) | 287K (5%) |
| Overhead (SC-ICP) | | 2% | 1% | 1% | 48% | -1% | -1% |

The Internet Cache Protocol (ICP) [18] has been very successful at encouraging Web cache sharing around the world. It requires only loose coordination among the proxies and is built on top of UDP for efficiency. It was designed by the Harvest research group [26] and is supported today by both the public-domain Squid [19] proxy software and some commercial products. With Squid proxies deployed around the globe, ICP is widely used in many countries to reduce traffic over trans-Atlantic and trans-Pacific links.

Despite its success, ICP is not a scalable protocol. The problem is that ICP relies on queries to find remote cache hits: every time one proxy has a cache miss, every other proxy receives and processes a query message. With N proxies, each miss therefore costs N-1 queries and N-1 replies, so as the number of collaborating proxies increases, the overhead quickly becomes prohibitive.
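The query fan-out can be illustrated with a small simulation. This is a simplified model under stated assumptions, not Squid's implementation: each proxy's cache is a Python set of URLs, on every local miss the proxy sends one ICP query to every other proxy, and every peer replies with a hit or a miss. The functions `icp_lookup` and `cluster_icp_messages` are hypothetical names chosen for this sketch.

```python
from typing import List, Set, Tuple


def icp_lookup(caches: List[Set[str]], origin: int, url: str) -> Tuple[bool, int, int]:
    """Simplified ICP handling of one request arriving at proxy `origin`.

    Returns (remote_hit, queries_sent, replies_received).
    Assumption: queries go to all peers in parallel and every peer replies.
    """
    if url in caches[origin]:
        return False, 0, 0  # local hit: no ICP traffic at all
    queries = replies = 0
    remote_hit = False
    for peer, cache in enumerate(caches):
        if peer == origin:
            continue
        queries += 1  # one query datagram to this peer
        replies += 1  # the peer must look up the URL and answer HIT or MISS
        if url in cache:
            remote_hit = True
    return remote_hit, queries, replies


def cluster_icp_messages(num_proxies: int, misses_per_proxy: int) -> int:
    """Total queries plus replies when every proxy has the same number of misses."""
    return num_proxies * misses_per_proxy * 2 * (num_proxies - 1)


caches = [{"/a"}, {"/b"}, {"/c"}, {"/d"}]
print(icp_lookup(caches, 0, "/x"))  # (False, 3, 3): all six messages are wasted
print(icp_lookup(caches, 0, "/b"))  # (True, 3, 3): a remote hit costs the same
for n in (4, 8, 16, 32):
    print(n, cluster_icp_messages(n, 1000))  # 24000, 112000, 480000, 1984000
```

In this model, doubling the number of proxies roughly quadruples total ICP traffic when each proxy's miss count stays fixed, and the cost is paid whether or not any peer holds the document.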

To measure the overhead of ICP and its impact on proxy performance, we run experiments using a proxy benchmark that we designed [1]. (The benchmark has been submitted to SPEC as a candidate for the industry-standard benchmark and is currently in use at a number of proxy system vendors.) The benchmark consists of a collection of client processes that issue requests following patterns observed in real traces, including the request size distribution and temporal locality, and a collection of server processes that delay their replies to emulate Internet latencies.

The experiments are performed on 10 Sun Sparc-20 workstations connected by 100Mb/s Ethernet. Four workstations act as the four proxies, each running Squid 1.1.14 with 75MB of cache space. The cache size is artificially small so that cache replacement occurs during the short duration of the experiments. Another four workstations run 120 client processes, 30 on each workstation, and the client processes on each workstation connect to one of the proxies. Client processes issue requests with no think time in between, and the requested document sizes follow a Pareto distribution with $\alpha = 1.1$ and $k = 3.0$ [9]. Finally, two workstations act as servers, each hosting 15 server processes listening on different ports. Each Web server forks a process to handle an HTTP request, and the process waits 1 second before sending the reply to emulate network latency.
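Document sizes with this distribution can be generated by inverse-transform sampling. A minimal sketch, assuming the standard Pareto form $P(X > x) = (k/x)^\alpha$ for $x \ge k$; the text gives $\alpha = 1.1$ and $k = 3.0$ but not the units of $k$, and `pareto_size` and `pareto_mean` are hypothetical helpers rather than the benchmark's code.

```python
import math
import random

ALPHA = 1.1  # Pareto shape parameter, from the text
K = 3.0      # Pareto scale (smallest possible size); units not stated in the text


def pareto_size(rng: random.Random, alpha: float = ALPHA, k: float = K) -> float:
    """Draw one document size by inverse-transform sampling.

    If U is uniform on (0, 1], then k / U**(1/alpha) satisfies
    P(X > x) = (k / x)**alpha for every x >= k.
    """
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return k / u ** (1.0 / alpha)


def pareto_mean(alpha: float, k: float) -> float:
    """Mean alpha * k / (alpha - 1); infinite when alpha <= 1."""
    return math.inf if alpha <= 1 else alpha * k / (alpha - 1)


rng = random.Random(42)  # fixed seed, so reruns see the same sizes
sizes = [pareto_size(rng) for _ in range(24000)]
print(min(sizes) >= K)        # True: no document is smaller than k
print(pareto_mean(ALPHA, K))  # about 33, i.e. 11 times k
```

With $\alpha = 1.1$ the mean is finite but the variance is infinite (a finite variance requires $\alpha > 2$), so a sample of a few tens of thousands of requests can be dominated by a handful of very large documents. This is consistent with the later remark that heavy-tailed sizes and a small number of requests lead to high variance.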

We experiment with two cache hit ratios, 25% and 45%, because the overhead of ICP varies with the cache miss ratio at each proxy. In the benchmark, clients issue requests following the temporal locality patterns observed in [35,8], and the inherent cache hit ratio of the request stream can be adjusted. In each experiment, each of the 120 client processes issues 200 requests, for a total of 24000 requests.

Using the benchmark, we compare two configurations: no-ICP, where proxies do not collaborate, and ICP, where proxies collaborate via ICP. Since we are only interested in the overhead, the requests issued by the clients do not overlap, so there are no remote cache hits among the proxies. This is the worst-case scenario for ICP, and the results measure the overhead of the protocol. We use the same random-number-generator seeds for the no-ICP and ICP experiments to ensure comparable results; otherwise, the heavy-tailed document size distribution and the small number of requests would lead to high variance. The relative differences between no-ICP and ICP are the same across different seed settings, so we present results from one set of experiments here.
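The two workload properties used here, an adjustable inherent hit ratio and request streams that never overlap across clients, can be illustrated with a toy generator. This is a hypothetical sketch, not the benchmark's algorithm: re-references are drawn uniformly from the client's own history instead of following the trace-derived temporal locality of [35,8], and the `client_requests` function, URL scheme, and seeding scheme are all invented for illustration.

```python
import random
from typing import List


def client_requests(client_id: int, n: int, hit_ratio: float, seed: int) -> List[str]:
    """Hypothetical request stream for one client process.

    With probability `hit_ratio` the client re-requests a URL it issued
    earlier (picked uniformly, a simplification of real temporal locality);
    otherwise it requests a brand-new URL. Every URL embeds `client_id`,
    so streams from different clients never overlap.
    """
    rng = random.Random(seed * 100003 + client_id)  # same seed -> same stream
    history: List[str] = []
    stream: List[str] = []
    for _ in range(n):
        if history and rng.random() < hit_ratio:
            url = rng.choice(history)  # re-reference: a hit if still cached
        else:
            url = f"http://server.example/c{client_id}/doc{len(history)}"
            history.append(url)        # first reference: always a miss
        stream.append(url)
    return stream


streams = [client_requests(c, 200, 0.45, seed=1) for c in range(120)]
total = sum(len(s) for s in streams)
repeats = sum(len(s) - len(set(s)) for s in streams)
print(total)            # 24000 requests: 120 clients x 200 each
print(repeats / total)  # close to 0.45 (slightly below: the first request is always new)
```

Because every URL embeds the client id, no proxy can hold a document that was requested only through another proxy, which reproduces the no-remote-hit worst case; the fraction of repeated URLs approximates the inherent hit ratio for an unbounded cache.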

We measure the hit ratio in the caches, the average latency seen by the clients, the CPU time consumed by the Squid proxy (user CPU time and system CPU time), and network traffic. Using netstat, we collect the number of UDP datagrams sent and received, the number of TCP packets sent and received, and the total number of IP packets handled by the Ethernet network interface; the third number is roughly the sum of the first two. The UDP traffic is incurred by the ICP query and reply messages. The TCP traffic includes the HTTP traffic between the proxy and the servers and between the proxy and the clients. The results are shown in Table 2.

The results show that ICP incurs considerable overhead even when the number of cooperating proxies is as low as four. The number of UDP messages increases by a factor of roughly 74 to 89. Due to the increase in UDP messages, the total network traffic seen by the proxies increases by 8% to 13%. Protocol processing increases the user CPU time by 20% to 24%, and UDP message processing increases the system CPU time by 7% to 10%. As seen by the clients, the average latency of an HTTP request increases by 8% to 12%. These degradations occur even though the experiments are performed on a high-speed local area network.
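The UDP counts in Table 2 can be checked with back-of-the-envelope arithmetic. This sketch adds assumptions the text does not state: the UDP column is per proxy, the 24000 requests are split evenly across the four proxies, and every local miss produces one query to each of the three peers and one reply from each. Under these assumptions a proxy with m misses sends 3m queries and receives 3m replies, and it also receives 3m queries from its peers and answers each one, for 12m datagrams in total. The function name `icp_datagrams_per_proxy` is hypothetical.

```python
def icp_datagrams_per_proxy(total_requests: int, num_proxies: int, hit_ratio: float) -> float:
    """Estimated ICP datagrams (sent + received) at one proxy, with no remote hits."""
    peers = num_proxies - 1
    misses = total_requests / num_proxies * (1.0 - hit_ratio)  # own misses, m
    own_traffic = 2 * peers * misses   # queries sent + replies received
    peer_traffic = 2 * peers * misses  # queries received + replies sent
    return own_traffic + peer_traffic


# (experiment, hit ratio, UDP msgs with no ICP, UDP msgs with ICP), copied from Table 2
MEASURED = [("Exp 1", 0.25, 615, 54774), ("Exp 2", 0.45, 540, 39968)]

for name, hit_ratio, no_icp, icp in MEASURED:
    estimate = icp_datagrams_per_proxy(24000, 4, hit_ratio)
    print(f"{name}: estimated {estimate:.0f}, measured increase {icp - no_icp}")
# Exp 1: estimated 54000, measured increase 54159
# Exp 2: estimated 39600, measured increase 39428
```

Both estimates land within about 0.5% of the measured increase, which is consistent with the extra UDP traffic being almost entirely queries and replies triggered by misses, and with the overhead scaling with the miss ratio rather than with remote hits.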

The results highlight the dilemma faced by Web cache administrators: cache sharing has clear benefits, yet the overhead of ICP is high. Furthermore, most of the time the processing of a query message is wasted because the requested document is not cached at the receiving proxy. Essentially, the effort spent on ICP processing is proportional to the total number of cache misses experienced by other proxies, rather than to the number of actual remote cache hits.

To address the problem, we propose a new scalable cache sharing protocol: Summary Cache.