Clustering is performed by successive passes through the graph building up large clusters by visiting small clusters and merging them into an existing larger cluster.
For each clustering pass, each node, , without a parent (i.e.
) tries to find a
suitable parent. Conceptually, the candidate parents
are the nodes which dominate it,
. In practice, this is too strict
a requirement and we will define
more correctly below. Now, find the nearest
among the candidate parents,
.
The best parent is
An interesting design decision happens in situations where , neither neighbor list is a proper
superset of the other and neither dominates. We defined domination in this way so both nodes are free
to become siblings under some other parent, keeping the tree comparatively shallow. If
or
had been
arbitrarily chosen as parent, the other (and her subtree) would appear to be one hop farther from the backbone.
It would also have been meaningful to define the best parent as the farthest candidate parent. This would cause AS's to choose AS's with very high out-degree as their preferred parent. The result would have been a shallower tree that more closely matches the distance to the backbone, but it also would have lost the useful categorization of AS's into clusters with very similar sets of neighbors.
In practice,
many AS's connect to more than one major provider. These AS's are not strictly dominated by any one of the
nodes they have links to.
To relax the domination requirement, a tolerance
factor grows with each pass through the unassigned nodes. The tolerance, , allows a node to
become a child of any node with a higher out-degree if the overhang is less than the current tolerance.
So the actual computation for the set of candidate parents is: