Surveyor SQL Tables
This document provides an overview of the data available
in the Surveyor tables and how it relates to each other.
Another (unfinished) document provides info on
how these tables are generated
from the raw data.
This is the schema as is used by the DB/2 version of the surveyor data.
David DeWitt and Jim Gray have been hacking it since.
I'm using the following conventions in this document:
- I've strongly emphasized the name of the
Primary Key of each table.
- I've started emphasizing fields which are typed
to refer to keys of other tables.
- Field names are space-seperated to be easy to read.
Catenate them for the actual field name.
The tables in the current schema are:
There is also a section listing the SQL types
of the more important (cross table) named types.
The Nodes table provides information about
the terminal nodes that are the source and destination
of Surveyor data.
In the real world these correspond to ASNs, the
Autonomous System Numbers for routable
entities.
The actual host at that entity which holds the Surveyor
data collection point is relatively unimportant, but
does provide the actual endpoint address.
There are approximately 60 nodes in the system.
Current count is 66.
- Node ID
- The unique ID of this node.
- IP Addr
- The varchar IP address of the host node.
- Name
- The arbitrary name of this site and network connectivity.
The name gives hints about the network that the node belongs to.
XXX find a list of those names and what they mean.
- Lat
- Latitude of this Installation -- presumably what the
GPS clock on the host node is recording.
- Long
- Longitude of the Installation -- See Latitude for details.
- TimeZone
- Which time zone this installation is in.
Note that this is a varchar(3), it should probably be
a varchar(5) (or larger) or a integer to reflect the positive/negative
minutes from Zulu time, due to the problems with time zone naming
and interpretation of what that zone really means.
- Contact
- ContactPhone
- Unused contact info fields.
- Comment
- Text description of the site.
- Version
- ??? Version number of software the site is using?
Traffic is sent between a pair of nodes, an originating node
and a destination node.
This table is used to assign a unique ID to a particular combination
of source and destination nodes.
In a perfect world there would be 7200 entries in this table, 60*60*2.
However, not all the nodes were alive and sending/collecting data at
the same time, so the actual count is less, 4290 or so.
- Id
- Node Pair ID of this particular source/destination pair.
- Src Node ID
- Node ID of the source node.
- Dest Node ID
- Node ID of the destination node.
A route is a unique path between a pair of
source and destination nodes.
This table assigns an arbitrary ID to that unique route so
that it can be identified.
Current cardinalityof Routes is 1559226.
A route itself is described via Route Hops,
and Routers.
- Route ID
- Unique ID of this route.
- Node Pair ID
- The Node Pair ID representing the source and
destination of this route.
XXX One thing I believe which is missing from
this table is the number of hops in the route..
This table indicates an interval, a period of time, for which
a particular route is valid.
A route may be valid for multiple intervals.
Each unique interval is assigned an ID so it can be refered to.
Cardinality is 16425766.
- Id
- Unique ID of a particular route interval.
- Route ID
- The Route ID of the route that packets are using
in this interval.
- Start Time
- The unix time_t of when this route is first used in this interval.
- End Time
- The unix time_t when this route is last used in this interval.
- Count
- The number of consecutive traceroutes that used this
route in this interval.
- P Id
- The Packet Type ID used in this interval.
- Propagation Delay
- ??? Currently unused. Most likely the average of the RTTs
that the traceroutes recorded.
- Version
- ??? Version number of software the site is using?
A route is a sequence of hops that a packet takes
to reach its destination.
Each hop has a given its starting point, either the orginating
node or the last router it visited.
Cardinality 36400124.
- Primary Key
- Route ID, Hop No
- Route ID
- The Route ID this hop belongs to.
- Hop No
- An ordinal representing which step in the route this hop is.
The lowest Hop No is the first hop in the route from
the source node, etc.
- Router ID
- The Router ID of the router reached by this hop.
A router is an entity that a packes passes through
on the way from the source node to the destination node.
This table is used to assign a ID to an IP address on a router.
Cardinality 31657.
- Router ID
- Unique ID of this router.
- IP Addr
- Internet Address of this router interface.
- DNS name
- Result of addr->name (PTR) lookup on the IP Addr of the
router interface.
These lookups occured historically -- when the route was active,
since DNS mappings change over time.
One problem with this table is that an actual physical
router can have several interfaces, each with its own
address.
There is no true mapping of unique routers, just router IP addresses.
Also note that due to DNS wackiness that the addr->name mapping
can appear to change.
For example, some routers appear to bounce back and forth between
two DNS names on a regular basis.
This is usually a symptom of an inconsistent set of DNS entries
for a address on the authoritative DNS servers.
However, choice of DNS server, DNS client location, changes propagating
through the system, DNS caches, time-to-live
values, and other such factors can affect this.
The Times data represents each individual probes sent between
a source and destination node.
- Primary Key
- RID, Start
- Start
- When this probe was sent.
This is a GPS stamp expressed as a unix time_t.
- Ticks
- The 1/10000 of the second this probe was sent.
100 microsecond units, NOT micro-seconds.
- Node Pair ID
- The Node Pair ID representing the source and destination of
the probe.
- Delay
- The time the probe was in-flight in microseconds.
- RID
- The Route ID of the route which probe packet
followed from source to destination.
This may not be exact -- the traceroute data is not
synchronized with the probe data.
The intent of this thing was probably to keep track of multiple
packet versions.
The relation has only one entry at this time, since only one packet type
was ever used.
- PID
- The unique Packet ID of this packet type. value=1
- Type Desc
- ??? 40 char text field. value="Small"
- Format
- ??? 10 char text field. value="IPPM"
- BitWidth
- ??? int. value=320
- Null String
- ??? int. value=0
Start,Ticks in Times could be represented as a
SQL timestamp.
Unfortunately that would take up more space, and the table is
huge already, so it is left as binary data.
- Node ID
- smallint
- Node Pair ID
- int everywhere except Times, where it is a smallint to
help with alignment.
It could be a smallint everywhere, not enough nodepairs yet
- Route ID
- int
- Router ID
- int
- Hop No
- smallint
- Route Interval ID
- int
- IP Addr
- varchar 15
- DNS Name
- varchar 80
- Start
- int, to match the unix time_t
- Ticks
- smallint
- Delay
- int
- unix time_t
- As we all recall, unix time_t is seconds past midnight Dec 31, 1969,
or 0 == Jan 1, 1970, 00:00:00.
Bolo Documentation
Bolo's Home Page
Last Modified:
Wed Dec 3 15:07:58 CST 2003
Bolo (Josef Burger)
<bolo@cs.wisc.edu>