Next: Definitions Up: Cache Placement Methods Based Previous: Related Work

Topologically-guided Clustering

A study of sources and destinations of traffic in the Internet quickly becomes a search for a productive way to summarize large bodies of traffic into meaningful categories. Categorizations based on geography are natural and easy to place on world maps, but they are an increasingly inaccurate representation of the topology of the Internet. Our algorithm discovers the topology of the Internet by reading the best path data from BGP routing tables [26].

To forward a packet, one might think a router only needs to know which of its links to use for the next hop. A subsequent router will make decisions to get the packet even closer to its destination. Fortunately for us, BGP tables [26] contain a great deal of information about connections beyond the next hop. In the early days of Internet routing the designers wanted each BGP advertisement to contain the entire path of Autonomous Systems used to deliver a packet. This gives BGP routers full disclosure of the path their packets will take so the packets of one company (perhaps containing trade secrets or sensitive E-mail) would not pass through arch-enemy autonomous systems.

To make the algorithm manageable, we simplify the graph of AS connectivity into a tree. The 21 autonomous systems identified as being the exemplars of backbone clusters form the first level of the tree. As of 2001, a dozen of them are almost completely interconnected. Since the tree representation loses information about cross-links between branches of the tree, it is important that our algorithm minimize the impact on distance calculations using the trees.

Our work leverages the IP clustering work done by Krishnamurthy and Wang [15] showing how BGP routing tables can be used to gain 99 percent accuracy in partitioning IP addresses into non-overlapping groups. All IP addresses in a group are topologically close and under common administrative control. Their web client clustering paper shows other more involved techniques for gaining even higher accuracy and validating the results.

The basic unit of clustering used by our algorithm is the combination of all of the IP ranges that share a common AS number. Although clustering by AS is less specific than previous IP clustering studies, the IP addresses in our clusters still share common routings.

Subsections

Next: Definitions Up: Cache Placement Methods Based Previous: Related Work

Jim Gast 2001-04-20