Due Thursday, August 2, at the start of class.
The purpose of this three-part project is to compare weather forecasts against actual weather observations. In this second part, the goal is to create a new, separate script to download, read, reduce, and save actual weather data as observed by the Rooftop Instrument Group (RIG) on top of the AO&SS building, 1225 West Dayton Street.
It is time to collect our actual weather observations. They are available from the following website:
The actual data are contained in subdirectories for each month. The actual data file URL looks like this:
Here, YYYY is the four-digit year, MM is the two-digit month (e.g., 07 for July), and DD is the two-digit day of the month.
These files are large and contain far more data than we need. We need a script that downloads a data file, reduces the data, and saves the results. That is, for each day we start with 17,279 lines of data like this:
And we want to save a data file with 24 hourly records like this:
2011-07-28 19 77.328 77.673
The input data are a mess. The first part of each record contains the date and time:
Dealing with the serial date is a pain, so we cheat: We assume that all of the records in a given file are in fact from the date contained in the filename. Thus, our script accepts a YYYY-MM-DD date as its argument, and uses that date throughout as the date.
The hours and minutes are very odd. The “0” in the example above means 0:00 or midnight. A “1” means 00:01, or one minute past midnight. “845” means 8:45 in the morning. “2359”, the last entry for each day, is 23:59, or one minute before midnight. You will have to write code to extract a meaningful hour and minute from this field.
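The time field described above can be split with plain integer arithmetic. The following is a minimal sketch, not the required solution; the subroutine name split_hhmm is hypothetical, and it assumes the field has already been isolated from the rest of the record.

```perl
use strict;
use warnings;

# Split a RIG-style time field such as "845" into (hour, minute).
# The field is HHMM without leading zeros, so treat it as a number.
# NOTE: split_hhmm is a hypothetical name, not part of the starting script.
sub split_hhmm {
    my ($field) = @_;
    my $hour   = int($field / 100);   # 845 -> 8
    my $minute = $field % 100;        # 845 -> 45
    return ($hour, $minute);
}

# Prints 00:00, 00:01, 08:45, and 23:59, one per line.
printf "%02d:%02d\n", split_hhmm($_) for qw(0 1 845 2359);
```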
Note: The date and time in the file are in UTC! Thus, even though the file starts at “midnight”, once the date/time is converted to Central (Daylight) Time, it really starts at 7 p.m. the day before. That is OK and expected, but make sure you handle the date and time correctly. Do not reinvent the wheel here!
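One way to avoid reinventing the wheel is to let the standard Time::Local and POSIX modules do the work. The sketch below assumes the date comes from the script's YYYY-MM-DD argument and that the hour, minute, and second have already been extracted; the subroutine names utc_epoch and local_stamp are hypothetical, and setting TZ to America/Chicago is an assumption that only takes effect on a system with the time zone database installed.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime tzset);

# Build a Unix timestamp from a YYYY-MM-DD date plus a UTC time of day.
# NOTE: utc_epoch and local_stamp are hypothetical names.
sub utc_epoch {
    my ($date, $hour, $min, $sec) = @_;
    my ($year, $month, $day) = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/
        or die "Expected a YYYY-MM-DD date, got '$date'\n";
    # timegm() wants a zero-based month and interprets its input as UTC.
    return timegm($sec, $min, $hour, $day, $month - 1, $year);
}

# Format a timestamp in whatever the process's local time zone is.
sub local_stamp {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', localtime($epoch));
}

# Assumption: force Central Time so the result does not depend on
# where the script happens to run.
$ENV{TZ} = 'America/Chicago';
tzset();

my $epoch = utc_epoch('2011-07-29', 1, 23, 31);
print "$epoch\n";                  # 1311902611
print local_stamp($epoch), "\n";   # local (Central) date and time
```

Because localtime() handles the UTC offset and daylight saving rules, the date rolls back to the previous day on its own for the early-evening records.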
The temperature observation is near the end of the record in the 24th field. It is in degrees Celsius, and so for our purposes, must be converted to degrees Fahrenheit.
In the script, the analyze_rig_data() subroutine returns a reference to a Perl data structure. Look at the picture while reading through the description.
The reference refers to a hash, which contains exactly 24 keys, one for each hour of data in the original RIG file. Each key is a Unix timestamp, and each associated value is a reference to another, tiny hash.
The tiny hash contains just two keys, min and max. The values for those keys are the minimum and maximum temperatures for that hour of data, in °F.
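To make the shape of this structure concrete, here is a minimal sketch of how one observation might be folded into it. The subroutine name record_observation is hypothetical, and keying each hour by the timestamp of the start of that hour is an assumption; the sample temperatures other than 77.1512 are made up for illustration.

```perl
use strict;
use warnings;

# Fold one (timestamp, temperature in F) observation into the hourly summary.
# NOTE: record_observation is a hypothetical helper, and using the start of
# the hour as the hash key is an assumption about the intended structure.
sub record_observation {
    my ($hourly, $timestamp, $temp_f) = @_;
    my $hour_start = $timestamp - ($timestamp % 3600);

    # Create the tiny hash on first sight of this hour, then widen min/max.
    my $slot = $hourly->{$hour_start} //= { min => $temp_f, max => $temp_f };
    $slot->{min} = $temp_f if $temp_f < $slot->{min};
    $slot->{max} = $temp_f if $temp_f > $slot->{max};
    return $hourly;
}

my %hourly;
record_observation(\%hourly, 1311902611, 77.1512);
record_observation(\%hourly, 1311902700, 76.9);    # same hour (made-up value)
record_observation(\%hourly, 1311906000, 75.5);    # next hour (made-up value)

for my $hour (sort { $a <=> $b } keys %hourly) {
    printf "%d  min=%.3f  max=%.3f\n", $hour, @{ $hourly{$hour} }{qw(min max)};
}
```

A full day of input processed this way ends up with 24 keys, each pointing at its own min/max pair.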
The output file contains 24 lines, one for each hour of the day. Due to the conversion from UTC to CDT, it starts at 7 p.m. (hour 19). There are four data fields, separated by the tab character:
2011-07-28 19 77.328 77.673
2011-07-28 20 76.672 77.509
2011-07-28 21 76.219 77.070
2011-07-28 22 75.207 77.342
2011-07-28 23 74.642 75.816
2011-07-29 00 73.634 75.241
2011-07-29 01 71.802 73.654
2011-07-29 02 70.851 71.897
2011-07-29 03 70.322 70.893
2011-07-29 04 70.302 70.921
2011-07-29 05 69.663 70.563
2011-07-29 06 69.721 70.313
2011-07-29 07 69.667 71.328
2011-07-29 08 70.927 72.795
2011-07-29 09 72.529 75.904
2011-07-29 10 75.132 78.602
2011-07-29 11 78.235 80.708
2011-07-29 12 79.743 82.024
2011-07-29 13 81.106 83.309
2011-07-29 14 82.072 84.358
2011-07-29 15 82.897 85.091
2011-07-29 16 83.165 85.206
2011-07-29 17 84.378 86.090
2011-07-29 18 84.020 86.500
If you like, you need only implement three subroutines in the larger script. The starting script is available here. The three subroutines are:
This subroutine analyzes a single line of AO&SS RIG data, extracts the time and temperature, converts both values to standardized units, and returns them. Remember that the time in the RIG data is for UTC, not local time. Also, the temperatures in the file are in Celsius, but the temperature part of the return value from this function should be in Fahrenheit.
For example, if the current line of the RIG input file is (middle section omitted):
Then this subroutine should return the list (1311902611, 77.1512). That timestamp is “2011-07-29 01:23:31” in UTC and “2011-07-28 20:23:31” in CDT.
The extract_weather_data() subroutine is called once for each line in the RIG input data file. The calling subroutine, which is named analyze_rig_data(), is located immediately above it in the script.
This subroutine formats the reduced data records as a single string of 24 lines (one for each record), then calls the write_file() subroutine to actually save it to disk. The format of the output data file is described above.
Note: If the output data file already exists, this subroutine should simply return without doing anything.
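The formatting and the early return can be sketched as follows. This is only an illustration under stated assumptions: format_records and save_records are hypothetical names, the four fields are taken to be date, hour, minimum, and maximum as in the sample output, and because the real write_file() signature is not shown here, the sketch takes a code reference as a stand-in for it.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Turn the hourly summary into one tab-separated line per hour:
# date, hour, minimum, maximum. Date and hour are in local time.
# NOTE: format_records and save_records are hypothetical names.
sub format_records {
    my ($hourly) = @_;
    my $text = '';
    for my $hour (sort { $a <=> $b } keys %$hourly) {
        my $stamp = strftime("%Y-%m-%d\t%H", localtime($hour));
        $text .= sprintf("%s\t%.3f\t%.3f\n",
                         $stamp, $hourly->{$hour}{min}, $hourly->{$hour}{max});
    }
    return $text;
}

# $writer is a stand-in (an assumption) for the script's write_file().
# Returns 0 without writing if the file exists, 1 after writing.
sub save_records {
    my ($path, $hourly, $writer) = @_;
    return 0 if -e $path;
    $writer->($path, format_records($hourly));
    return 1;
}

# 1311897600 is 2011-07-29 00:00:00 UTC, the first hour of the sample file.
my %sample = (1311897600 => { min => 77.328, max => 77.673 });
print format_records(\%sample);
```

Sorting the keys numerically keeps the lines in chronological order, since a Perl hash has no inherent ordering.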
This subroutine simply converts a temperature in degrees Celsius to one in degrees Fahrenheit. I gave you an easy one.
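For reference, the conversion is F = C × 9/5 + 32. A minimal sketch follows; the subroutine name celsius_to_fahrenheit is hypothetical and may not match the name used in the starting script.

```perl
use strict;
use warnings;

# Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32.
# NOTE: the subroutine name is hypothetical.
sub celsius_to_fahrenheit {
    my ($celsius) = @_;
    return $celsius * 9 / 5 + 32;
}

printf "%.4f\n", celsius_to_fahrenheit(25.084);   # 77.1512
```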
Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.
Hand in a printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important; without it, you may not receive appropriate credit.