To forward a packet, one might think a router only needs to know which of its links to use for the next hop. A subsequent router will make decisions to get the packet even closer to its destination. Fortunately for us, BGP tables [26] contain a great deal of information about connections beyond the next hop. In the early days of Internet routing the designers wanted each BGP advertisement to contain the entire path of Autonomous Systems used to deliver a packet. This gives BGP routers full disclosure of the path their packets will take so the packets of one company (perhaps containing trade secrets or sensitive E-mail) would not pass through arch-enemy autonomous systems.
To make the algorithm manageable, we simplify the graph of AS connectivity into a tree. The 21 autonomous systems identified as being the exemplars of backbone clusters form the first level of the tree. As of 2001, a dozen of them are almost completely interconnected. Since the tree representation loses information about cross-links between branches of the tree, it is important that our algorithm minimize the impact on distance calculations using the trees.
Our work leverages the IP clustering work done by Krishnamurthy and Wang [15] showing how BGP routing tables can be used to gain 99 percent accuracy in partitioning IP addresses into non-overlapping groups. All IP addresses in a group are topologically close and under common administrative control. Their web client clustering paper shows other more involved techniques for gaining even higher accuracy and validating the results.
The basic unit of clustering used by our algorithm is the combination of all of the IP ranges that share a common AS number. Although clustering by AS is less specific than previous IP clustering studies, the IP addresses in our clusters still share common routings.