CS 368-3 (2012 Summer) — Day 14 Homework

Due Monday, August 6, at the start of class.

Weather Forecast Analysis, Part III

The purpose of this three-part project is to compare weather forecasts against actual weather observations. In this third and final part, the goal is to do the comparisons and print a report of the findings.

Tasks

By now, you should have daily forecast data and hourly observation data from the past two assignments. If you need data to work with, you may download my files here: forecasts, observations.

Now, we want to write a new script that prints a report like this:

WEATHER REPORT

DATE               HIGHS                   LOWS (early next AM)
================   =====================   =====================
2011-07-29 (Fri)   87-89 => 86.5 (below)   61-63 => 71.2 (above)
2011-07-30 (Sat)   87-89 => 88.1 (good)    67-69 => 72.1 (above)
2011-07-31 (Sun)   87-89 => 89.1 (good)    67-69 => 74.7 (above)
2011-08-01 (Mon)   89-91 => 86.9 (below)   71-73 => 76.0 (above)
2011-08-02 (Tue)   91-93 => 90.5 (below)   67-69 => 72.6 (above)
2011-08-03 (Wed)   -------- only 7 weather observations --------
2011-08-04 (Thu)   ---------- no weather observations ----------

Of the 10 forecasts, the actual temperatures were below the
forecast 3 times, above the forecast 5 times, and accurate 2
times. Overall accuracy was 20%.
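The table in the sample report uses fixed-width columns: 16 characters for the date, 21 for each temperature cell, and three spaces between columns. Rows with missing data replace both cells with one centered, dash-filled message. The following Python sketch (the starter script itself is Perl) shows one way to produce that layout from a single row-formatting helper. The widths are read off the sample report, and the helper names (format_row, format_message_row, header_lines) are hypothetical.

```python
# Column widths read off the sample report (an assumption: the starter
# script may define its layout differently).
DATE_WIDTH = 16
CELL_WIDTH = 21
GAP = "   "  # three spaces between columns


def format_row(date_label, high_cell, low_cell):
    """One table row; both temperature cells share the same left-justified width.
    The last column is not padded, so rows carry no trailing spaces."""
    return f"{date_label:<{DATE_WIDTH}}{GAP}{high_cell:<{CELL_WIDTH}}{GAP}{low_cell}"


def format_message_row(date_label, message):
    """A row whose two cells are replaced by one centered, dash-filled message."""
    span = CELL_WIDTH + len(GAP) + CELL_WIDTH
    padded = f" {message} "
    return f"{date_label:<{DATE_WIDTH}}{GAP}{padded:-^{span}}"


def header_lines():
    """The column titles and the '=' underline, built with the same helper."""
    return [
        format_row("DATE", "HIGHS", "LOWS (early next AM)"),
        format_row("=" * DATE_WIDTH, "=" * CELL_WIDTH, "=" * CELL_WIDTH),
    ]
```

Keeping the widths in named constants means the header, the underline, and every data row stay aligned if a width changes.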

Once again, you may start with a partial script, available here.

Background

Although we are starting with data that is much simpler than the original source data, there are still some complications to deal with:

In the forecast data, each forecast is a string like “AROUND 70”, “LOWER 70S”, “MID 80S”, and “UPPER 80S”. But to compare our actual highs and lows to these forecasts, we must convert each text description into a numeric range — that is, a minimum and maximum forecast temperature. You may use any reasonable and consistent definition of the range for each word (i.e., AROUND, LOWER, MID, UPPER). Your numeric ranges may overlap or not, as you wish, but make sure that there are no gaps between ranges.
We have actual observations of temperatures (minimum and maximum) for each hour. But which observations should we examine for a given day’s forecast? For example, suppose we are analyzing the daily forecast issued at 6:47 a.m. on July 29, 2011, with a high “in the upper 80s” and a low “in the lower 60s”. When would that high occur? As it happened, the high was 86.5°F during the 6 p.m. hour. And the low? It was 71.2°F during the 5 a.m. hour on the next day, July 30. Having looked over a great deal of weather data, I made the design decision to evaluate the actual observations for a day from the noon hour on that day through the 11 a.m. hour on the following day. While this rule would not hold up in general, it is adequate for this exercise. If you use my starter script, you do not really have to worry about it much, because I provide the subroutine that finds the actual high and low corresponding to each forecast date.
In the tabular part of the report, each row contains one cell for highs and one for lows. The format for each cell is identical. Hmmm… repeated output formats… what should we do about that?
The second part of the report is a short paragraph summarizing the results. Let’s imagine that we are going to email this text to someone, and so each line of text must be no longer than a given number of characters. How do we enforce that limit for every line of the report?
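The starter script already supplies the subroutine that picks each day's actual high and low. For anyone writing their own, here is a hypothetical Python sketch of the noon-to-11 a.m. rule described above. It assumes each hourly observation is a (start-of-hour timestamp, minimum, maximum) tuple. It also returns how many hours fell in the window, which a report could compare against the 24 hours that a complete window contains. The function names and the tuple layout are assumptions, not the starter script's interface.

```python
from datetime import date, datetime, time, timedelta

# A complete window runs from the 12 p.m. hour on day D through the
# 11 a.m. hour on day D+1, i.e. 24 hourly observations (assumed layout).
EXPECTED_HOURS = 24


def observation_window(day):
    """Return the first and last hour stamps (inclusive) for a forecast dated `day`."""
    start = datetime.combine(day, time(12))
    end = datetime.combine(day + timedelta(days=1), time(11))
    return start, end


def actual_high_low(day, observations):
    """observations: iterable of (hour_start, min_temp, max_temp) tuples.

    Returns (high, low, count): the highest hourly maximum and lowest hourly
    minimum inside the window, plus how many hours were found. High and low
    are None when no observation falls inside the window."""
    start, end = observation_window(day)
    in_window = [(lo, hi) for stamp, lo, hi in observations if start <= stamp <= end]
    if not in_window:
        return None, None, 0
    high = max(hi for _, hi in in_window)
    low = min(lo for lo, _ in in_window)
    return high, low, len(in_window)
```

A caller could print a "no weather observations" row when the count is 0, and an "only N weather observations" row when it is below EXPECTED_HOURS.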

Subroutines to Write

If you start with my partial script, here are the subroutines that you must write for credit on this assignment. Obviously, if you write your own script, you may organize it as you wish, and so this subsection is less relevant to you.

format_cell(): This subroutine takes a forecast string (e.g., “UPPER 80S”) and the corresponding actual observation (as a floating-point number), compares the two, and returns a formatted string for the cell in the tabular report. See the sample report above and the comment block preceding the subroutine for more examples. Use the compute_forecast_range() subroutine, defined below. ALSO, be sure to increment the global variables $below_count, $above_count, and $good_count as you compare forecasts and actuals, so that the analyze_counts() subroutine (below) has data to work with.
compute_forecast_range(): This subroutine calculates the numeric range of temperatures that correspond to the given forecast string. The comment block preceding that subroutine gives examples of the numeric ranges to use for each type of prediction. For example, we want to convert “UPPER 80S” to the pair of numbers (87, 89).
analyze_counts(): This subroutine examines the global variables $below_count, $above_count, and $good_count, and reports on the overall accuracy of the forecasts. It must produce a multiline string for the results, where the maximum line length for each line in the string is given as an argument to the subroutine. See the sample output above and the comment block preceding the subroutine for more examples of the format.
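The three subroutines can be sketched as follows, in Python rather than the starter script's Perl. The range offsets are inferred from the sample report (UPPER 80S gives 87-89, AROUND 90 gives 89-91, LOWER 90S gives 91-93) and are assumptions; the comment blocks in the starter script define the ranges actually expected. One detail in the sample is worth noticing: 89.1 is rated good against 87-89, while 90.5 is rated below 91-93. The sketch therefore treats a reading as good when low <= actual < high + 1. A dictionary stands in for the three global counters. Python's textwrap module does the line wrapping here; in Perl, the core Text::Wrap module offers comparable wrapping.

```python
import re
import textwrap

# Stand-in for the assignment's globals $below_count, $above_count, $good_count.
counts = {"below": 0, "above": 0, "good": 0}

# Offsets from the decade (or, for AROUND, from the named value), inferred
# from the sample report; an assumption, not the starter script's table.
OFFSETS = {"AROUND": (-1, 1), "LOWER": (1, 3), "MID": (4, 6), "UPPER": (7, 9)}


def compute_forecast_range(forecast):
    """Convert e.g. 'UPPER 80S' to (87, 89) or 'AROUND 70' to (69, 71).
    The pattern is deliberately permissive about the trailing 'S'."""
    match = re.fullmatch(r"(AROUND|LOWER|MID|UPPER) (\d+)S?", forecast.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized forecast: {forecast!r}")
    word, base = match.group(1), int(match.group(2))
    low_offset, high_offset = OFFSETS[word]
    return base + low_offset, base + high_offset


def format_cell(forecast, actual):
    """Return a cell such as '87-89 => 86.5 (below)' and update the counters.
    A reading is good when low <= actual < high + 1, which reproduces the
    sample report's ratings (89.1 is good for 87-89; 90.5 is below 91-93)."""
    low, high = compute_forecast_range(forecast)
    if actual < low:
        verdict = "below"
    elif actual >= high + 1:
        verdict = "above"
    else:
        verdict = "good"
    counts[verdict] += 1
    return f"{low}-{high} => {actual:.1f} ({verdict})"


def analyze_counts(max_width):
    """Summarize the counters as text whose lines are at most max_width long."""
    total = sum(counts.values())
    if total == 0:
        summary = "No forecasts were compared."
    else:
        accuracy = round(100 * counts["good"] / total)
        summary = (
            f"Of the {total} forecasts, the actual temperatures were below the "
            f"forecast {counts['below']} times, above the forecast "
            f"{counts['above']} times, and accurate {counts['good']} times. "
            f"Overall accuracy was {accuracy}%."
        )
    # Greedy wrapping: no line exceeds max_width unless a single word does,
    # in which case textwrap breaks the word.
    return textwrap.fill(summary, width=max_width)
```

Fed the ten cells from the sample report and called with a width of 60, this sketch reproduces the three-line summary shown above.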

Optional Work

Write unit tests for the code. I tried to design most of the subroutines to be easy to test. And you have now seen the Test::More pattern a few times, so you should be able to write some tests.
If you want to make the assignment a bit more challenging, simply delete the contents of any other subroutines in the script, and try writing them yourself.
If you like, you are welcome to write the entire script on your own.

Reminders

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

A printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important, or you may not receive appropriate credit.