New Technology for Year 2000 Problem Tools:

The Use of Program Profiling for Year 2000 Renovation and Testing

Thomas Reps
Computer Sciences Department
University of Wisconsin
1210 West Dayton Street
Madison, WI 53706
reps at cs.wisc.edu
http://www.cs.wisc.edu/~reps/

"The world faces cataclysmic breakdown at the turn of the millennium!"

While this alarm may be old news to anyone who was present at the turn of the last millennium, there are significant reasons for residents of the (first) world to be concerned this time around: Because many computer programs use only two digits to record year values in date-valued data, they may process a year value of 00 as 1900 in cases where 2000 was intended. If the intended value is 2000 --- such as when 00 represents the value of the current year in a computation performed after the calendar rolls over on January 1, 2000 --- then a faulty computation may be carried out. For example, if the (approximate) age of someone born in 1956 were calculated for January 1, 2000, he would appear to be 00 - 56 = -56 years old! Because computations can involve dates in the future, the phenomenon can occur well before the calendar rolls over on January 1, 2000.

Last summer, I was asked by the Defense Advanced Research Projects Agency (DARPA) to help them plan a project aimed at reducing the impact of the Year 2000 (Y2K) problem on the Department of Defense. DARPA was particularly interested in whether there were "any techniques in the research community that could be applied to the Y2K problem and have impact beyond present commercial Y2K products and services". The most exciting of the ideas that turned up concerns a method for using path profiling as a heuristic to locate some of the sites in a program where there are problematic date manipulations. It works as follows:

In path profiling, a program is instrumented so that the number of times each different loop-free path executes is accumulated during an execution run. With such an instrumented program, each run (or set of runs) of the program generates a path spectrum for the execution --- a distribution of the paths that were executed. Path spectra can be used to identify paths in a program that are good candidates for being date-dependent computations by finding differences between path spectra from execution runs on pre-year-2000 data and post-year-2000 data. By choosing input datasets to hold all factors constant except the way dates are used in the program, any differences in the spectra obtained from different execution runs can be attributed to date-dependent computations in the program. Differences in the spectra reveal paths along which the program performed a new sort of computation during the post-year-2000 run, as well as paths --- and hence computations --- that were no longer executed during the post-year-2000 run.
With some further analysis of the spectra, for each such path that shows up in the spectral difference, it is possible to identify the shortest prefix that distinguishes it from all of the paths in the other path set (see below).

For the Y2K problem, the path-spectrum comparison technique may provide help with two aspect of the problem:

Of course, the path-spectrum comparison technique is not guaranteed to uncover all sites of date manipulations. No technique can do this; all one can hope for are good heuristics. However, because path-spectrum comparison involves a different principle from the principles that lie behind the heuristics used in commercial Y2K tools, it should be a good complement to current techniques.

Furthermore, the path-spectrum comparison technique is actually applicable to a much wider range of software-maintenance problems than just the Y2K problem; it offers new perspectives on program testing, on the task of creating test data, and on what tools can be created to support program testing.

A prototype tool for gathering and comparing path spectra, called DynaDiff, has been built at the University of Wisconsin. (DynaDiff works on programs that run under Solaris on Sun SPARCstations.)

Although most UNIX programs do not have a Y2K problem, they do have a ``Year 2038'' problem. (The UNIX time function, which reports the number of seconds since January 1, 1970, rolls over from 2**31-1 (i.e., 01111111111111111111111111111111) to 2**31 (i.e., 10000000000000000000000000000000) --- and thus turns negative --- on Tuesday, January 19, 2038 at 03:14:08 UTC.) Thus, we can demonstrate the principle of using the path-spectrum comparison technique for diagnosing possible Y2K problems by applying DynaDiff to compare spectra generated from normal runs against spectra generated from runs in which the result of time has been ``time-warped'' into the future. For example, the UNIX cal utility (when called with no arguments) prints the calendar for the current month, where the ``current month'' is obtained by calling time. Below is an example of the spectral differences that DynaDiff displays (for a version of cal modified to optionally permit ``time-warping'') when it compares a spectrum from a run from February 1997 against a run made with time set to emulate a run in February 2039 (i.e., post-rollover):

We see that the path-spectrum comparison technique does indeed detect that the two runs of the program had different behaviors.

Using DynaDiff, the user is able to examine the paths in question by clicking on any of the bars that appear in the two spectra. This brings up a window that displays the elements of the corresponding path in contrasting colors. For example, one of the paths executed in the year-2039 run --- but not in the year-1997 run --- would be displayed as follows:

Green indicates the longest prefix that the path shares with some path in the year-1997 path set; yellow indicates the endpoints of the first edge that distinguishes the path from all of the paths in the year-1997 path set; red indicates the remainder of the path.

The colors shown above indicate that the year-2039 run executed the loop

    while (rem >= SECSPERDAY) {
            rem -= SECSPERDAY;
            ++days;
    }
at least once, and the year-1997 run did not execute this loop at all. Examination of the code allows us to deduce that the value of *timep must be negative. The program renovator could then use this information (by employing program slicing, for example) to trace back to the source of the problem.

This work on path-spectrum comparison is described in the following paper: Reps, T., Ball, T., Das, M., and Larus, J., " The use of program profiling for software maintenance with applications to the Year 2000 Problem," To appear in Proceedings of ESEC/FSE '97: Sixth European Software Engineering Conference and Fifth ACM SIGSOFT Symposium on the Foundations of Software Engineering, (Zurich, Switzerland, Sept. 22-25, 1997), Lecture Notes in Computer Science, Springer-Verlag, New York, NY, 1997.