Surveyor SQL Tables

This document provides an overview of the data available in the Surveyor tables and how it relates to each other. Another (unfinished) document provides info on how these tables are generated from the raw data. This is the schema as is used by the DB/2 version of the surveyor data. David DeWitt and Jim Gray have been hacking it since.

I'm using the following conventions in this document:

The tables in the current schema are:

There is also a section listing the SQL types of the more important (cross table) named types.


Nodes

The Nodes table provides information about the terminal nodes that are the source and destination of Surveyor data. In the real world these correspond to ASNs, the Autonomous System Numbers for routable entities. The actual host at that entity which holds the Surveyor data collection point is relatively unimportant, but does provide the actual endpoint address.

There are approximately 60 nodes in the system. Current count is 66.

Node ID
The unique ID of this node.
IP Addr
The varchar IP address of the host node.
Name
The arbitrary name of this site and network connectivity. The name gives hints about the network that the node belongs to. XXX find a list of those names and what they mean.
Lat
Latitude of this Installation -- presumably what the GPS clock on the host node is recording.
Long
Longitude of the Installation -- See Latitude for details.
TimeZone
Which time zone this installation is in. Note that this is a varchar(3), it should probably be a varchar(5) (or larger) or a integer to reflect the positive/negative minutes from Zulu time, due to the problems with time zone naming and interpretation of what that zone really means.
Contact
ContactPhone
Unused contact info fields.
Comment
Text description of the site.
Version
??? Version number of software the site is using?

Node Pairs

Traffic is sent between a pair of nodes, an originating node and a destination node. This table is used to assign a unique ID to a particular combination of source and destination nodes.

In a perfect world there would be 7200 entries in this table, 60*60*2. However, not all the nodes were alive and sending/collecting data at the same time, so the actual count is less, 4290 or so.

Id
Node Pair ID of this particular source/destination pair.
Src Node ID
Node ID of the source node.
Dest Node ID
Node ID of the destination node.

Routes

A route is a unique path between a pair of source and destination nodes. This table assigns an arbitrary ID to that unique route so that it can be identified.

Current cardinalityof Routes is 1559226.

A route itself is described via Route Hops, and Routers.

Route ID
Unique ID of this route.
Node Pair ID
The Node Pair ID representing the source and destination of this route.

XXX One thing I believe which is missing from this table is the number of hops in the route..


Route Intervals

This table indicates an interval, a period of time, for which a particular route is valid. A route may be valid for multiple intervals. Each unique interval is assigned an ID so it can be refered to.

Cardinality is 16425766.

Id
Unique ID of a particular route interval.
Route ID
The Route ID of the route that packets are using in this interval.
Start Time
The unix time_t of when this route is first used in this interval.
End Time
The unix time_t when this route is last used in this interval.
Count
The number of consecutive traceroutes that used this route in this interval.
P Id
The Packet Type ID used in this interval.
Propagation Delay
??? Currently unused. Most likely the average of the RTTs that the traceroutes recorded.
Version
??? Version number of software the site is using?

Route Hops

A route is a sequence of hops that a packet takes to reach its destination. Each hop has a given its starting point, either the orginating node or the last router it visited.

Cardinality 36400124.

Primary Key
Route ID, Hop No
Route ID
The Route ID this hop belongs to.
Hop No
An ordinal representing which step in the route this hop is. The lowest Hop No is the first hop in the route from the source node, etc.
Router ID
The Router ID of the router reached by this hop.

Routers

A router is an entity that a packes passes through on the way from the source node to the destination node. This table is used to assign a ID to an IP address on a router.

Cardinality 31657.

Router ID
Unique ID of this router.
IP Addr
Internet Address of this router interface.
DNS name
Result of addr->name (PTR) lookup on the IP Addr of the router interface. These lookups occured historically -- when the route was active, since DNS mappings change over time.

One problem with this table is that an actual physical router can have several interfaces, each with its own address. There is no true mapping of unique routers, just router IP addresses.

Also note that due to DNS wackiness that the addr->name mapping can appear to change. For example, some routers appear to bounce back and forth between two DNS names on a regular basis. This is usually a symptom of an inconsistent set of DNS entries for a address on the authoritative DNS servers. However, choice of DNS server, DNS client location, changes propagating through the system, DNS caches, time-to-live values, and other such factors can affect this.


Times

The Times data represents each individual probes sent between a source and destination node.

Primary Key
RID, Start
Start
When this probe was sent. This is a GPS stamp expressed as a unix time_t.
Ticks
The 1/10000 of the second this probe was sent. 100 microsecond units, NOT micro-seconds.
Node Pair ID
The Node Pair ID representing the source and destination of the probe.
Delay
The time the probe was in-flight in microseconds.
RID
The Route ID of the route which probe packet followed from source to destination. This may not be exact -- the traceroute data is not synchronized with the probe data.

Packet Types

The intent of this thing was probably to keep track of multiple packet versions. The relation has only one entry at this time, since only one packet type was ever used.

PID
The unique Packet ID of this packet type. value=1
Type Desc
??? 40 char text field. value="Small"
Format
??? 10 char text field. value="IPPM"
BitWidth
??? int. value=320
Null String
??? int. value=0

Glossary of Type Names versus SQL Types

Start,Ticks in Times could be represented as a SQL timestamp. Unfortunately that would take up more space, and the table is huge already, so it is left as binary data.

Node ID
smallint
Node Pair ID
int everywhere except Times, where it is a smallint to help with alignment. It could be a smallint everywhere, not enough nodepairs yet
Route ID
int
Router ID
int
Hop No
smallint
Route Interval ID
int
IP Addr
varchar 15
DNS Name
varchar 80
Start
int, to match the unix time_t
Ticks
smallint
Delay
int
unix time_t
As we all recall, unix time_t is seconds past midnight Dec 31, 1969, or 0 == Jan 1, 1970, 00:00:00.

Bolo Documentation
Bolo's Home Page
Last Modified: Wed Dec 3 15:07:58 CST 2003
Bolo (Josef Burger) <bolo@cs.wisc.edu>