Next: Effect of Adding Disk Up: Measuring Proxy Performance with Previous: Measuring the Behavior of

Performance of Example Systems

We used WPB to measure the performance of four proxy systems:

Apache version 1.3b2 - it is a multiprocess proxy that forks, at startup, a predefined number of processes to handle incoming requests. This number is dynamically adjusted as a response of the current load in the proxy. Apache stores cached documents in a four-level directory tree. However, the number of entries in each subdirectory as well as the size of directories in the same level are not fixed. Apache also copies documents to a temporary file in the root cache directory before copying it to a final location. Currently, we have not been able to find any information about how Apache finds this final location.
Cern version 3.0A - it is a multiprocess proxy that forks a new process to handle each request. After handling a single request, the new process terminates. It uses the file system to cache data (web documents) as well as proxy meta-data, such as expiration times, content type, etc. Web pages are stored in separate files and the directory structure is derived from the structure of the URL: each URL component is translated into a directory name. The directory path for a specific URL is called URL directory. Metadata for each file, used to find out whether it is stored in the cache, is kept in the corresponding URL directory.
A Cern-derived comercial proxy - This is a multiprocess proxy that at startup forks a predefined number of processes that are responsible for handling all the incoming requests. After a predefined number of requests serviced, a process terminates and gets respawned by a master process. Therefore, the total number of processes remains constant. For our experiments, we set the number of processes to 32, the default value. It uses the file system to cache data and proxy meta-data. The proxy cache is separated into one or more three-level cache sections. Each cache section contains 64 subdirectories and can hold 100 to 250 MB of data. It uses an algorithm to determine the directory where a document should be stored that tries to ensure equal dispersion of documents in the sections. The proxy uses the RSA MD5 algorithm to reduce a URL to 8 characters, which it uses for the file name of the document and to determine the storage directory. This proxy server will be identified as Proxy N.
Squid version 1.1.14 - it uses non-blocking network I/O abstractions in order to avoid forking new processes. Only one process handles all incoming requests. It manages its own resources and keeps meta-data about the cache contents in main memory. Therefore, it is not necessary to access disk to determine if a file is stored in the cache. Squid uses main memory to cache in-transit objects, to cache the most recently used objects and to cache error responses. The cache storage in disk is structured as a three-level cache: there are 16 sections, each one containing 256 subdirectories where files are stored. Squid maps URLs to cache object names using ``fingerprinting''. It implements its own DNS cache and uses a predefine number of ``DNS servers'' to which it sends non-blocking DNS requests

We have run a set of experiments using WPB (version 1.0) to analyse how the above systems perform under different loads. We varied the number of client processes and collected statistics for caching and no caching configurations. Before presenting the mains results obtained, we describe the hardware platform where the experiments were run.

We ran our experiments in a COW - Cluster of Workstatsions, that consists of forty Sun Sparcstations 20, each one with 2 66 Mhz CPUs, 64 MB of main memory and two 1 GB disk. The COW nodes are interconnected through different network interfaces, including a 100 Base-T interface, which we used during the experiments. We ran experiments varying the number of clients from 8 to 96, using four client machines and two server machines. We kept the ratio 4:1 between number of client processes and number of server processes. The cache size was set to 75 MB. The server latency is set to 3 seconds in all the experiments. The inherent hit ratio in the client request stream is the default value 50%. Thus, the maximum hit ratio that can be observed in the experiments (which is an average of the two phases) is around 25%.

**Figure 2:** Client Latency for CERN 3.0A.
$\begin{figure} \centerline{ \psfig {figure=graphs/cern-latency.ps,height=3in} }\end{figure}$

**Figure 3:** Hit Ratio for CERN 3.0A.
$\begin{figure} \centerline{ \psfig {figure=graphs/cern-hit.ps,height=3in} }\end{figure}$

**Figure 4:** Client Errors for CERN 3.0A.
$\begin{figure} \centerline{ \psfig {figure=graphs/cern-errors.ps,height=3in} }\end{figure}$

Figure 2 shows, for CERN 3.0A, how the average latency varies as a function of the number of clients in both caching and no caching configurations. For caching configuration, it also shows how the latency in the first and second phases of the experiment varies. For no caching, latency increases very slowly; for caching, however, it does increase with the number of clients. These curves show that, for our experiments, the cost of a hit may offset the cost of accessing the remote server if a great number of clients are concurrently trying to connect to the proxy. In other words, network transmission time is shorter than the time spent retrieving the file from disk. It is interesting to note that this is true despite the fact that we try to model transmission overhead by imposing a delay of 3 seconds in the server. For Cern, this limit is 32 clients; after this point, the second phase latency is bigger than the no caching average latency. Figure 3 shows the hit and byte hit ratios as a function of the number of clients. Both curves are very similar and, roughly, they show a decrease in the hit ratio. As the number of clients increases, the total number of unique files retrieved from the proxy also increases (since each client uses a different seed for the random number generator). As a consequence, hit ratio decreases. Figure 4 shows how CERN is effective in handling the incoming requests. As can be seen, after 40 concurrent clients, CERN is unable to handle all the connection requests that it receives. The number of errors increases as the number of clients increases. However, during the second phase of the caching experiment, when hits occur, no errors are observed

**Figure 5:** Hit Ratio for Apache 1.3b2.
$\begin{figure} \centerline{ \psfig {figure=graphs/apache-latency.ps,height=3in} }\end{figure}$

**Figure 6:** Hit Ratio for Apache 1.3b2.
$\begin{figure} \centerline{ \psfig {figure=graphs/apache-hit.ps,height=3in} }\end{figure}$

Figure 5 shows how latency varies for Apache. Although the curves are quite unstable, it is clear that latency for the caching experiments increases as a function of the number of clients. However, latency remains roughly constant when caching is disabled. Latency for the second phase of the caching experiments remains shorter than the no caching latency up to 72 clients. After this point, it increases significantly. The degradation of Apache performance as the number of clients increases is also noteworthy. The latency increases up to almost 30 seconds for 88 clients when no hit is observed. This probably is a consequence of the two-phase write that may involve several memory operations. Figure 6 shows the hit ratio curves for Apache. It seems like hit ratio in Apache is not significantly affected by the number of clients. We are currently unable to explain Apache's behavior. No errors were observed during the experiments.

**Figure 7:** Hit Ratio for Proxy N.
$\begin{figure} \centerline{ \psfig {figure=graphs/netscape-latency.ps,height=3in} }\end{figure}$

**Figure 8:** Hit Ratio for Proxy N.
$\begin{figure} \centerline{ \psfig {figure=graphs/netscape-hit.ps,height=3in} }\end{figure}$

**Figure 9:** Client Error for Proxy N.
$\begin{figure} \centerline{ \psfig {figure=graphs/netscape-errors.ps,height=3in} }\end{figure}$

Figure 7 shows the latency curves for Proxy N. The average latency for no caching experiments increases linearly with the number of clients for this proxy. Also, the second phase latency for the caching experiments is always lower than the no caching latency. This may be due either to a better usage of disk resources or a more expensive implementation. Since latency increases linearly with the number of clients even when caching is disabled, we conjecture that overhead of proxy implementation is responsible for this behavior. One explanation is that because the number of proxy processes is always the same, as the number of client increases, more requests must be delayed waiting for available processes to handle them. This is true for both caching and no caching experiments. Figure 8 shows hit ratio curves for Proxy N. Hit ratio degrades very slowly. The only proxy that has a better hit ratio is squid, as will be shown next. This may be due to different algorithms for cleaning the cache. Figure 9 shows that the fixed number of process implementation results in a high number of errors during the experiments, especially when there is no hit in the cache. With 32 processes, the proxy is unable to handle all the requests if there are more than 48 clients. If the number of proxy processes is inceased to 64, the overall behavior is the same, but the saturation point is shifted to 88 clients. So, this parameter must be carefully tuned in order to minimize the number of errors. As a consequence, proxy N may have problems in handling bursty traffic since this parameter must be statically chosen.

**Figure 10:** Hit Ratio for Squid 1.1.14.
$\begin{figure} \centerline{ \psfig {figure=graphs/squid-latency.ps,height=3in} }\end{figure}$

**Figure 11:** Hit Ratio for Squid 1.1.14.
$\begin{figure} \centerline{ \psfig {figure=graphs/squid-hit.ps,height=3in} }\end{figure}$

Figure 10 shows the latency curves for Squid. These curves have a behavior similar to those for Apache and Cern. However, Cern has a slightly better average latency for caching experiments. These results are consistent to those presented in [9]. When caching is disabled, Squid performs better. Figure 11 shows the hit ratio curves. Byte hit ratio is very unstable but it can be seen that Squid can sustain a fairly constant hit ratio, independent of the number of clients. We are currently trying to investigate this behavior with more details.

Next: Effect of Adding Disk Up: Measuring Proxy Performance with Previous: Measuring the Behavior of

Pei Cao
4/13/1998