Cache Placement Methods Based on Client Demand Clustering

Paul Barford ¹, Jin-Yi Cai ² and Jim Gast ³
Computer Science Department
University of Wisconsin - Madison
1210 West Dayton Street
Madison, WI 53706

{barford,jyi,jgast}@cs.wisc.edu

One of the principal motivations for content delivery networks, such as those that have been deployed over the past three years, is to improve performance from the perspective of the client. Ideally, this is achieved by placing caches close to groups of clients and then routing client requests to the nearest cache. In the first part of this paper we present a new method for identifying regions of client demand. Our method uses best path information from Border Gateway Protocol (BGP) routing tables. It is based on clustering autonomous systems (AS's) such that the Hamming distance between the parent cluster and an AS under consideration for membership in that cluster is minimized. This method results in a simple tree of AS's where we define the root of the tree as the Internet backbone. This tree representation of AS connectivity is an idealization of the Internet's true structure and we show the impact in terms of AS hop distance from any AS in a cluster to the Internet backbone. One of the strengths of our AS clustering method is that it naturally lends itself to the cache placement problem. In the second part of this paper, we present two cache placement algorithms based on a tree graph of demand. The algorithms address the problems of placing single caches and multiple caches so as to minimize inter-AS traffic. We evaluate the effectiveness of our cache placement algorithms using Web server logs and show that they greatly improve performance over random cache placement.

Cache Placement Methods Based on Client Demand Clustering

Abstract: