Using Two-Way Linear Coordinate to Find Distances between Crashes

0 - Motivation

In a previous research stage, a one-way linear coordinate method (OWLC) was proposed and implemented in SC_IN_DBMS to enable fast calculation of network distance between any two crashes. During a test, a major problem of OWLC was found. The problem is, when the base link and the candidate link are connected in a loop (directinal or indirectional), OWLC may misplace the candidate link in the linear coordinate system originated from the base link. For example, in Figure 1, the base link is F_b->T_b and the candidate link is F_c->T_c. A primary crash p happened on the based link and was 0.7 miles from F_b; a potential secondary crash s happened on the candidate link with a 0.6 mile offset from F_c. The base link and the candidate link are 1 mile and 1.4 mile long, respectively. There is a 12.5-mile route between T_c and F_b and a 11.7-mile route between T_b and F_c. In this example, OWLC would first calculate the shortest indirectional distances between T_c/F_c and T_b/F_b. The shortest distances are going to be dist_{F_c-F_b}=12.7 miles, dist_{T_c-F_b}=12.5 miles, dist_{F_c-T_b}=11.7 miles, and dist_{T_c-T_b}=13.1 miles. From this point on, OWLC only sees the four distances as if the points (reference sites) are orderly lined up along the X axis. Thus, F_c will be considered downstream of the base link since dist_{F_c-F_b} > dist_{F_c-T_b} and dist_{F_c-F_b} > length of the base link. Similarly, T_c will be considered upstream of the base link. In the OWLC view, the candidate link is skewed without being noticed. The incorrect layout of the candidate link consequentially causes the misplacement of the potential secondary crash s, and hence a wrong distance between the two crashes.

Figure 1. An example in which OWLC makes an error.

The essence of the above skew problem is that the shortest distances between the reference sites are either backward measured or forward measured with respect to the base link, but not consistent when being compared to each other. For example, OWLC compares dist_{T_c-F_b} with dist_{T_c-T_b} without noticing the former is measured through the backward route and the latter is measured through the forward route. So, to fix this problem, a two-way linear coordinate method (TWLC) is proposed below with dual shortest distances measured backward and forward, respectively.

1 - Terminologies and Definitions

In the following discussion, we assume the readers already read the SC_IN_DBMS framework and are familiar with some basic terms, such as base link, candidate link, STS route, etc. Also, without particular specification, F_b is used to denote the from-reference-site of a base link, and T_b for the to-reference-site. Additionally, RS is used to denote any reference site in the STN network, and can be substituded with any other notation of a reference site. The following two parallel concepts are the most important and elementary terms in the successive discussion:

Forward STS route to F_b (FRT_{F_b-RS}): A FRT_{F_b-RS} is a STS route between F_b and RS that starts with the base link F_b->T_b and never goes through F_b or T_b again. If F_b is an inevitable reference site between T_b and RS, then no FRT_{F_b-RS} exists (as null in the program), since F_b must be passed again from link F_b->T_b to RS. A special case is when RS = F_b. In this case, an exception is made that FRT_{F_b-F_b} is F_b itself with length 0.
Backward STS routes to F_b (BRT_{F_b-RS}): A BRT_{F_b-RS} is a STS route between F_b and RS that never goes through T_b. If T_b is an inevitable reference site between F_b and RS, then no BRT_{F_b-RS} exists. For example, BRT_{F_b-T_b} does not exist (as null in the program).

2 - Algorithm Modules

The TWLC method consists of three algorithm modules, each fitting into a certain level of process in the framework:

for every base link b{
	Module 1: Traverse to Collect Shortest STS Routes and Identify Candidate Links
	for every crash prim in base link b{
		for every candidate link of b, c{
			for every crash scnd in link c{
				if prim happened before scnd{
					Module 2: Calculate the Network Distance
					if b and c are on opposite directions of the same highway{
						Module 3: Validation Using Highway Name and Direction Information
					}
				}
			}
		}
	}
}

When reading the following details about the three modules, one should always keep in mind the right contexts of the modules as they fit into the framework. For example, the explanation of Module 1 is in a per-base-link context. Every task in Module 1 should be redone for every new base link. Also, the results from Module 1 are only associated with the corresponding base link.

2.1 - Module 1: Traverse to Collect Shortest STS Routes and Identify Candidate Links

The traversing goes two ways: forward and backward. The purpose of the forward traversal is to reach every RS whose shortest FRT_{F_b-RS} is no longer than the distance filter + 2*the maximum link length and store all these RS's with their shortest FRT_{F_b-RS} in a hashtable. Similarly, a backward traversal is to reach every RS whose shortest BRT_{F_b-RS} is no longer than the distance filter + 2*the maximum link length and store all these RS's with their shortest BRT_{F_b-RS} in anther hashtable. During the traversals, candidate links can be identified on the fly. Below are some variables used in the following pseudo code of the traversing algorithm:

workingList: a FIFO queue each of whose elements is a data pair (RS, STS route).
FRTTable: a hashtable with key = RS and value = STS route.
BRTTable: a hashtable with key = RS and value = STS route.
candLinks: a list of all candidate links.
df: the distance filter.
mll: the maximum link length.

Forward traversal pseudo code:

allocate workingList;
allocate FRTTable;

// Initialize FRTTable
hashPair = [k: F_b | v: an empty STS route()];
add hashPair to FRTTable;
hashPair = [k: T_b | v: a STS route() containing only the base link];
add hashPair to FRTTable;

// Initialize workingList
dataPair = [v1: T_b | v2: a STS route() containing only the base link];
add dataPair to workingList;

// Traverse until the end of the growing workingList is reached
while workingList is not empty{

	dataPair = workingList gets the first element;
	curRS = dataPair.v1;
	curRoute = dataPair.v2;
	workingList removes the first element;
	
	nextLinks = a set of links connected to curRS;
	for each link nextLink in nextLinks{
	
		if nextLink is not the last link in curRoute{
			
			nextRS = the other RS of nextLink instead of curRS;
			
			if nextRS is neither F_b nor T_b{
				
				nextRoute = curRoute;
				nextRoute appends nextLink to the end;
				
				if nextRoute <= df+2*mll {
					
					if nextLink is not yet in candLinks, add it to candLinks;
					
					if FRTTable contains nextRS as a key{
						tmpRoute = FRTTable.valueOfKey(nextRS);
						if nextRoute < tmpRoute{
							FRTTable.valueOfKey(nextRS) = nextRoute;
							dataPair = [v1: nextRS | v2: nextRoute];
							workingList appends dataPair to the end;
						}
					}
					else{
						hashPair = [k: nextRS | v: nextRoute];
						dataPair = [v1: nextRS | v2: nextRoute];
						FRTTable puts hashPair in;
						workingList appends dataPair to the end;
					}
				}
			}
		}
	}	
}

Backward traversal pseudo code:

allocate workingList;
allocate BRTTable;

// Initialize BRTTable
hashPair = [k: F_b | v: an empty STS route()];
add hashPair to BRTTable;

// Initialize workingList
dataPair = [v1: F_b | v2: an empty STS route()];
add dataPair to workingList;

// Traverse until the end of the growing workingList is reached
while workingList is not empty{

	dataPair = workingList gets the first element;
	curRS = dataPair.v1;
	curRoute = dataPair.v2;
	workingList removes the first element;
	
	nextLinks = a set of links connected to curRS;
	for each link nextLink in nextLinks{
	
		if nextLink is not the last link in curRoute{
			
			nextRS = the other RS of nextLink instead of curRS;
			
			if nextRS is neither F_b nor T_b{
				
				nextRoute = curRoute;
				nextRoute appends nextLink to the end;
				
				if nextRoute <= df+2*mll {
					
					if nextLink is not yet in candLinks, add it to candLinks;
					
					if BRTTable contains nextRS as a key{
						tmpRoute = BRTTable.valueOfKey(nextRS);
						if nextRoute < tmpRoute{
							BRTTable.valueOfKey(nextRS) = nextRoute;
							dataPair = [v1: nextRS | v2: nextRoute];
							workingList appends dataPair to the end;
						}
					}
					else{
						hashPair = [k: nextRS | v: nextRoute];
						dataPair = [v1: nextRS | v2: nextRoute];
						BRTTable puts hashPair in;
						workingList appends dataPair to the end;
					}
				}
			}
		}
	}	
}

During the above two traversals, candidate links are identified on the fly. A link is considered to be a candidate link if: 1) it is reached as the nextLink and by appending it the nextRoute is no longer than the distance filter + 2*the maximum link length and 2) it contains at least one crash during the analysis period. Candidate links identified through this criterion are actually a superset of the real candidate links defined in the framework. But fortunately, the number of false candidate links is very small and sometimes can be 0. The convenience of the identification criterion outweights the extra calculation brought by the small number of false candidate links, and the correctness of the final result is reserved.

2.2 - Module 2: Calculate the Network Distance

Module 2 deals with the calculation of the network distance between two specific crashes, the prim (on the base link) and the scnd (on the candidate link). The first step is to locate all reference sites on the linear coordinate system (X axis). The X coordinates of the four reference sites are formulated below in Table 1. There is a forward coordinate and a backward coordinate for both F_c and T_c, definied by the shortest FRT's and BRT's. The shortest FRT's and BRT's can be looked up in constant time (including no result found) using FRTTable and BRTTable generated in Module 1. If a FRT (or BRT) is not available, the corresponding coordinate is null in the program.

Table 1. Tow-Way Coordinates of Reference Sites

Reference Site	Backward Coordinate	Forward Coordinate
Base Link
F_b	0
T_b	length_{base link}
Candidate Link
F_c	F_c^b = -the shortest BRT_{F_b-F_c} (or null if the BRT doesn't exist)	F_c^f = the shortest FRT_{F_b-F_c} (or null if the FRT doesn't exist)
T_c	T_c^b = -the shortest BRT_{F_b-T_c} (or null if the BRT doesn't exist)	T_c^f = the shortest FRT_{F_b-T_c} (or null if the FRT doesn't exist)

Next, the coordinates of crashes are to be determined. Depending on whether some coordinates of the reference sites are available, there can be a couple alternative coordinates for the secondary crash scnd. The formulas are given in Table 2.

Table 2. Two-Way Coordinates of Crashes

Crash	Backward Coordinate	Forward Coordinate
Base Link
prim	p = offset_prim
Candidate Link
scnd	if \|T_c^b - F_c^b\| == length of the candidate link s^b = sign(T_c^b - F_c^b)*offset_scnd + F_c^b	if \|T_c^f - F_c^f\| == length of the candidate link s^f = sign(T_c^f - F_c^f)*offset_scnd + F_c^f
scnd	else, s^b = - min(\|F_c^b\| + offset_scnd, \|T_c^b\| + length_{base link} - offset_scnd)	else, s^f = min(\|F_c^f\| + offset_scnd, \|T_c^f\| + length_{base link} - offset_scnd)

At last, the distance between the two crashes is determined through the following equation:

Dist_prim-scnd = min(|s^b - p|, |s^f - p|)

.......... Equation 1

If Dist_prim-scnd = |s^dir - p| (where, dir = b or f), whether scnd is upstream or downstream of prim is its own direction can be determined through the following sign equation. If Sign_prim-scnd > 0, scnd is downstream of prim in the direction of the candidate link; if Sign_prim-scnd < 0, scnd is downstream of prim in the direction of the candidate link; otherwise, scnd and prim overlap. If either of T_c^dir and F_c^dir is null, Sign_prim-scnd is made positive.

Sign_prim-scnd = sign(s^dir - p)*sign(T_c^dir - F_c^dir)

.......... Equation 2

Figure 2 illustrates how TWLC solves the problem mentioned at the beggining of the document. In this case, the final distance is min(dist^forward, dist^backward) = 12.6 miles, which is the correct result.

Figure 2. An example in which TWCL fixes the problem.

2.3 - Module 3: Validation Using Highway Name and Direction Information

Although TWLC solves the problem when highway loops are involved, there is another problem remaining if the highway name and the highway direction of each link are not known. Figure 3 illustrates this problem. Highway STH 90 goes EW, and there is a segment AB of 13.3 mile long. A->B and B->A are divided by rail median barriers and are parallel to each other with almost the same shape. The base link F_b->T_b is part of of A->B (EB) while the candidate link F_c->T_c is part of B->A (WB). There is no shared reference site by both directions between A and B. Based on such setup, the physical measure of the distance between crash s and crash p should be the difference between the distance from s to A (or B) and the distance from p to A (or B), which is 0.4 miles. However, without extra information, TWLC will treat such parallel chains connected at two ends as a one-way highway loop. Thus, TWLC will come up with a linear coordinate layout identical to that in Figure 2, resulting in an unrealistic distance of 12.6 miles. As a result, a validation step is needed to deal with the parallel chains problem (or alternating reference site problem as previously called during the program development).

Figure 3. An example showing the problem of not knowing the highway name and direction information.

The proposed validation process requires highway name and direction information of each link. In Step 1 of the program framework, links are loaded from the DT_RDWY_LINK table with some basic properties. Two additional properties need to be loaded from a table named DT_RDWY_RTE_LINK. The two properties are highway name and highway direction. Highway name is the name of the highway which the link belongs to. Hightway direction is the traffic direction of the link on that highway. For example, in Figure 3, the highway name of the base link and the candidate link are both "STH 90", but the highway direction of the base link is "E" and the highway direction of the candidate link is "W". The validation (Module 3) is activated if the base link and the candidate link have the same highway name but different highway directions (i.e., on opposite directions of the same highway).

The basic idea of the validation process is to find the first shared reference site of both highway directions upstream and downstream of the base link (or downstream and upstream of the candidate link), respectively. The difference between the distances from crash prim and crash scnd to the first shared reference site upstream of the base link (or downstream of the candidate link) is noted diff_up. Similarly, the difference between the distances from crash prim and crash scnd to the first shared reference site downstream of the base link (or upstream of the candidate link) is noted diff_down. The validated distance between the two crashes is found to be min(diff_up, diff_down). If the validated distance is shorter than the distance obtained through Module 2, the final result is the validated distance.