CS 368-3 (2012 Summer) — Day 14 Homework
Due Monday, August 6, at the start of class.
Weather Forecast Analysis, Part III
The purpose of this three-part project is to compare weather forecasts against actual weather observations. In this
third and final part, the goal is to do the comparisons and print a report of the findings.
Tasks
By now, you should have daily forecast data and hourly observation data from the past two assignments. If you need
data to work with, you may download my files here: forecasts, observations.
Now, we want to write a new script that prints a report like this:
WEATHER REPORT

DATE               HIGHS                   LOWS (early next AM)
================   =====================   =====================
2011-07-29 (Fri)   87-89 => 86.5 (below)   61-63 => 71.2 (above)
2011-07-30 (Sat)   87-89 => 88.1 (good)    67-69 => 72.1 (above)
2011-07-31 (Sun)   87-89 => 89.1 (good)    67-69 => 74.7 (above)
2011-08-01 (Mon)   89-91 => 86.9 (below)   71-73 => 76.0 (above)
2011-08-02 (Tue)   91-93 => 90.5 (below)   67-69 => 72.6 (above)
2011-08-03 (Wed)   -------- only 7 weather observations --------
2011-08-04 (Thu)   ---------- no weather observations ----------

Of the 10 forecasts, the actual temperatures were below the
forecast 3 times, above the forecast 5 times, and accurate 2
times. Overall accuracy was 20%.
Once again, you may start with a partial script, available here.
Background
Although we are starting with data that is much simpler than the original source data, there are still some
complications to deal with:
- In the forecast data, each forecast is a string such as “AROUND 70”, “LOWER 70S”, “MID 80S”, or “UPPER 80S”.
To compare our actual highs and lows to these forecasts, we must convert each text description into a numeric
range, that is, a minimum and maximum forecast temperature. You may use any reasonable and consistent definition
of the range for each word (i.e., AROUND, LOWER, MID, UPPER). Your numeric ranges may overlap or not, as you wish,
but make sure that there are no gaps between ranges.
- We have actual observations of temperatures (minimum and maximum) for each hour. But which observations should we
examine for a given day’s forecast? For example, suppose we are analyzing the daily forecast issued at
6:47 a.m. on July 29, 2011, with a high “in the upper 80s” and a low “in the lower 60s”. When would that high
occur? As it happened, the high was 86.5°F during the 6 p.m. hour. And the low? 71.2°F during the 5 a.m. hour on
the next day, July 30. Having looked over a great deal of weather data, I made the design decision to evaluate
actual observations for a day from noon on that day through the 11 a.m. hour on the following day. While this
choice is not defensible in the long run, it is adequate for this exercise. If you use my starter script, you do
not really have to worry about it too much, because I provide the subroutine that finds the actual high and low
corresponding to each forecast date.
- In the tabular part of the report, each row contains one cell for highs and one for lows. The format for each
cell is identical. Hmmm… repeated output formats… what should we do about that?
- The second part of the report is a short paragraph summarizing the results. Let’s imagine that we are going
to email this text to someone, and so each line of text must be no longer than a given number of characters. How
do we do this, for all lines of the report?
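One possible approach to the line-length question in the last item is a greedy word wrap: keep adding words to the current line until the next word would push it past the limit, then start a new line. The sketch below is only an illustration; wrap_text() is a hypothetical helper name, not a subroutine from the starter script, and it assumes that words are separated by whitespace and that a single word longer than the limit is simply left on a line of its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the starter script): greedily pack
# whitespace-separated words into lines of at most $max_length characters.
# A single word longer than $max_length is left on its own (overlong) line.
sub wrap_text {
    my ($text, $max_length) = @_;
    my @lines;
    my $current = '';
    for my $word (split ' ', $text) {
        if ($current eq '') {
            $current = $word;                # first word on a new line
        }
        elsif (length($current) + 1 + length($word) <= $max_length) {
            $current .= " $word";            # the word still fits
        }
        else {
            push @lines, $current;           # line is full; start another
            $current = $word;
        }
    }
    push @lines, $current if $current ne '';
    return join("\n", @lines) . "\n";
}

my $summary = 'Of the 10 forecasts, the actual temperatures were below '
            . 'the forecast 3 times, above the forecast 5 times, and '
            . 'accurate 2 times. Overall accuracy was 20%.';
print wrap_text($summary, 60);
```

Because the wrapping lives in one subroutine that takes the limit as an argument, the same code can serve any paragraph of the report at any width.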
Subroutines to Write
If you start with my partial script, here are the subroutines that you must write for credit on
this assignment. Obviously, if you write your own script, you may organize it as you wish, and so this subsection
is less relevant to you.
- format_cell(): This subroutine takes a forecast string (e.g., “UPPER 80S”) and
the corresponding actual observation (as a floating-point number), compares the two, and returns a formatted
string for the cell in the tabular report. See the sample report above and the comment block preceding the
subroutine for more examples. Use the compute_forecast_range() subroutine, defined below. Also,
be sure to increment the global variables $below_count, $above_count, and $good_count as you compare forecasts and
actuals, so that the analyze_counts() subroutine (below) has data to work with.
- compute_forecast_range(): This subroutine calculates the numeric range of temperatures that
corresponds to the given forecast string. The comment block preceding that subroutine gives examples of the
numeric ranges to use for each type of prediction. For example, we want to convert “UPPER 80S”
to the pair of numbers (87, 89).
- analyze_counts(): This subroutine examines the global variables $below_count, $above_count, and
$good_count, and reports on the overall accuracy of the forecasts. It must produce a multiline string for the
results, where the maximum line length for each line in the string is given as an argument to the subroutine. See
the sample output above and the comment block preceding the subroutine for more examples of the format.
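To make the range conversion and comparison concrete, here is a minimal sketch under stated assumptions. The names sketch_forecast_range() and classify_actual() are hypothetical stand-ins, not the starter script's subroutines. The UPPER (87-89), LOWER (61-63), and AROUND (89-91) offsets are inferred from the sample report; the MID offsets are a guess. The sketch also assumes that a range covers its whole top degree, so that 89.1 counts as “good” against 87-89 as in the sample report, which also leaves no gaps between adjacent ranges.

```perl
use strict;
use warnings;

# Offsets from the stated decade for each qualifier. UPPER, LOWER, and
# AROUND are inferred from the sample report; MID is an assumption.
my %OFFSETS = (
    AROUND => [-1, 1],
    LOWER  => [ 1, 3],
    MID    => [ 4, 6],
    UPPER  => [ 7, 9],
);

# Hypothetical stand-in for compute_forecast_range(): returns (min, max),
# or an empty list if the forecast string is not recognized.
sub sketch_forecast_range {
    my ($forecast) = @_;
    my ($word, $decade) =
        $forecast =~ /^\s*(AROUND|LOWER|MID|UPPER)\s+(\d+)S?\s*$/
        or return;
    my ($low, $high) = @{ $OFFSETS{$word} };
    return ($decade + $low, $decade + $high);
}

# Hypothetical comparison: 'below', 'above', or 'good'. Assumes the range
# covers min <= actual < max + 1.
sub classify_actual {
    my ($actual, $min, $max) = @_;
    return 'below' if $actual < $min;
    return 'above' if $actual >= $max + 1;
    return 'good';
}

my ($min, $max) = sketch_forecast_range('UPPER 80S');
printf "%d-%d => %.1f (%s)\n",
    $min, $max, 86.5, classify_actual(86.5, $min, $max);
# prints: 87-89 => 86.5 (below)
```

A real format_cell() would also need to update the three global counters, which this sketch deliberately leaves out.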
Optional Work
- Write unit tests for the code. I tried to design most of the subroutines to be good for testing. And you have
now seen the Test::More pattern a few times, so you should be able to write some tests.
- If you want to make the assignment a bit more challenging, simply delete the contents of any other subroutines in
the script, and try writing them yourself.
- If you like, you are welcome to write the entire script on your own.
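For the unit-testing option, a test file in the Test::More style might look roughly like the sketch below. The subroutine under test, percent_accurate(), is a hypothetical example defined inline so that the file runs on its own; in practice you would test the subroutines of your own script instead.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical subroutine under test: the percentage of accurate
# forecasts, rounded to the nearest whole number; 0 if there were none.
sub percent_accurate {
    my ($good, $total) = @_;
    return 0 unless $total;
    return int(100 * $good / $total + 0.5);
}

is(percent_accurate(2, 10), 20, '2 of 10 accurate is 20%');
is(percent_accurate(0, 10),  0, 'no accurate forecasts is 0%');
is(percent_accurate(0,  0),  0, 'no forecasts at all does not divide by zero');
```

Small subroutines that take arguments and return values, rather than printing directly, are the easiest ones to test this way.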
Reminders
Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete
solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help
is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and
information technology apply to this course and assignment.
Hand In
A printout of your code, ideally on a single sheet of paper. Be sure to put your
own name in the initial comment block. Identifying your work is important, or you may not receive appropriate
credit.