How SC_IN_DBMS Works from A High Level Point of View

Step 1: Load Links

Each tuple (row) in the DT_RDWY_LINK table of WisTransportal STN database is loaded into the memory as a Link object (hereafter called a link for short) in the program. A link has the same basic properties as shown in Figure 1. Also, in the second step, all crashes occurred on the same link will form a list and be associated to that link. Additionally, depending on which algorithm is chosen in step 3, the highway name and direction information of each link might also be included as well.

	Basic Properties: link ID, link length (in ft), valid period (the time span during which the link is valid in the database), from-reference-site, and to-reference-site.
Figure 1 A typical Link object.

Step 2: Load Crashes

Each tuple (row) in a crash table of WisTransportal is loaded into the memory as a Crash object (hereafter referred to as a crash). Only crashes within the user-specified analysis period are loaded. A crash has some basic properties as shown in Figure 2. Crashes are not only loaded in a global list, but also added to the crash list of its containing link. So each link will have a list of crashes that happened in it. This list can be empty if there was no crash happening on this link during the user-specified analysis period.

	Basic Properties: accident number, accident time (date and time), link ID of the link on which the crash happened, and offset (in ft) from the from-reference-site of the crash containing link to the location of the crash.
Figure 2 A typical Crash object.

Step 3: Calculate Network Distances

A naive and inefficient way of calculating the network distances is to traverse the network once per combination of crash pair. There are several reasons why such process is inefficient. First and most obvious, in terms of time, some combinations of crash pair must NOT be a primary-secondary crash pair, since the first crash happened after the second crash. Second, even for some of those combinations valid in time, the exact distance between the two crashes does not need to be calculated. This is because the two links on which the crashes happened are so distant from each other that crashes happened in them have no chance to be closer than the distance threshold. Third and most important, for those combinations whose exact network distances need to be calculated, some traversals may be repeated many times for different combinations. For example, link 1 and link 2 are both 1 mile long and of the same direction. Link 2 starts at 20 mile upstream of link 1's from-reference-site. Crash A happened on link 1 with 0.5 mile offset. Crash B and C happened on link 2 with 0.2 mile offset and 0.4 mile offset, repectively. Crash B and C both happened after crash A. When calculating the distance between A and B, a traversal from link 2 to link 1 is performed and the program will find out the distance between the two links' from-reference-sites is 20 miles. Thus, the distance between crash A and B is 20 - 0.2 + 0.5 = 20.3 miles. When calculating the distance between crash A and C, one should notice that the previous traversing result of 20 mile end point distance can be reused since, if another traversal is performed, same links are involved. So the same traversal can be avoided and the distance can be quickly calculated as 20 - 0.4 + 0.5 = 20.1 miles. The difference is only between the offsets!

A more advanced approach is proposed to fix the above shortcomings. First of all, rather than directly dealing with the crashes, some sharable knowledge of the network (links) is precalculated. For each link with at least one crash happening on it (as a base link), a list of candidate links is prepared. A link is considered a candidate link if 1) at least one of the STS routes (see definition below) between its end reference sites and the base link's end reference sites is no larger than the distance filter and 2) there was at least one crash happened on the link during the user specified analysis period. Any link is its own candidate link. If a crash did not happen on a candidate link, its distance with any of the crashes on the base link is impossible to be within the distance filter and should NOT be considered. Besides the candidate links, another set of useful shared knowledge is some of the STS routes between the base link and the candidate links. The lengths of these STS routes can be reused when calculating the actual distance betwee the base link crashes and the candidate link crashes. For clarity, when we talk about candidate links and the STS routes, we are in the context of a particular base link. However, the set of STS routes for the same base link might be different under different algorithms for distance calculation.

A STS route is a sequence of connected links connecting one reference site (RS) to another. By saying connected links, no enforcement is made on following the directions of links. Two links are treated as connected if their from-reference-sites are the same, their to-reference-sites are the same, or one's from-reference-site is the other's to-reference-site. Since no direction is enforced, a STS route is different from some traditional notion of route. The reason why direction is not concerned for now is that only the route length will be used in future calculation. Although no direction is used, link distinction is enforced. In other words, a link cannot appear more than once in a STS route. Also, a STS route can be 0 ft long if its corresponding start RS and end RS are identical. Figure 3 shows some examples of STS routes and their basic properties:

	Basic Properties: the list of connected links (or empty), total length (in ft), which is the sum of all connected links' lengths (or 0 if the link list is empty), and the valid period, which is the intersection of all connected links' valid periods (or the analysis period if the link list is empty).
Figure 3 Illustration of STS routes.

Upon the successful preparation of candidate links and STS routes, calculation of network distances between crashes can be very efficient. Although three different algorithms have envolved during the implementation of the program, the high level work flow is identical and demonstrated in the following pseudo-code:

for every base link b{
	for every crash prim in base link b{
		for every candidate link of b, c{
			for every crash scnd in link c{
				if prim happened before scnd{
					calculate the network distance
					between prim and scnd
					using one of the following algorithms:
						1. One-way linear coordinate method
						2. Route_Link table method
						3. Two-way linear coordinate method
				}
			}
		}
	}
}