
CS 368-1 (2011 Summer) — Day 13 Homework

Due Tuesday, August 2, at the start of class.

Weather Forecast Analysis, Part II

The purpose of this four-part project is to compare weather forecasts against actual weather observations. In this second part, the goal is to create a new, separate script to download, read, reduce, and save actual weather data as observed by the Rooftop Instrument Group (RIG) on top of the AO&SS building, 1225 West Dayton Street.

Details

It is time to collect our actual weather observations. They are available from the following website:

http://metobs.ssec.wisc.edu/pub/cache/aoss/tower/

The actual data are contained in subdirectories for each month. The actual data file URL looks like this:

http://metobs.ssec.wisc.edu/pub/cache/aoss/tower/ascii/YYYY/MM/rig_tower.YYYY-MM-DD.ascii

Here, YYYY is the four-digit year, MM is the two-digit month (e.g., 07 for July), and DD is the two-digit day of the month.
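As a rough illustration of the URL pattern above, here is a minimal sketch that fills in the placeholders from a YYYY-MM-DD string. The name rig_url() is made up for this example and is not part of the starting script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the starting script): build the RIG data
# file URL for a date given as a YYYY-MM-DD string.
sub rig_url {
    my ($date) = @_;
    my ($year, $month) = $date =~ /^(\d{4})-(\d{2})-\d{2}$/
        or die "Expected a YYYY-MM-DD date, got '$date'\n";
    return sprintf(
        'http://metobs.ssec.wisc.edu/pub/cache/aoss/tower/ascii/%s/%s/rig_tower.%s.ascii',
        $year, $month, $date
    );
}

print rig_url('2011-07-29'), "\n";
```

Validating the date up front means a typo on the command line fails immediately instead of producing a URL that does not exist.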

These files are large and contain too much data for our needs. We need a script to download a data file, reduce the data, and save the results. That is, for each day we start with 17,279 lines of data like this:

1,2011,210,0,1.9,973.29,5.8559,30.075,27.609,973.94,58732,51.201,26.33,25.625,25.429,80.313,3.5021,197.11,142.4,89.91,25.535,23.595,142.4,25.279,36.461,0,.28,29.911

And we want to save a data file with 24 hourly records like this:

2011-07-28	19	77.328	77.673

Input File Notes

The input data are a mess. The first part of each record contains the date and time:

1,2011,210,0,1.9,...

Dealing with the serial date is a pain, so we cheat: We assume that all of the records in a given file are in fact from the date contained in the filename. Thus, our script accepts a YYYY-MM-DD date as its argument, and uses that date throughout as the date.

The hours and minutes are very odd: they are packed into a single number with no leading zeros. The “0” in the example above means 00:00, or midnight. A “1” means 00:01, or one minute past midnight. “845” means 08:45 in the morning. “2359”, the last entry for each day, is 23:59, or one minute before midnight. You will have to write code to extract a meaningful hour and minute from this field.
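One possible way to read this field is to treat it as a plain number and use integer division and remainder by 100. This is only a sketch of that idea; split_hhmm() is a hypothetical name, and other approaches (such as zero-padding and taking substrings) work as well.

```perl
use strict;
use warnings;

# Hypothetical helper: split the packed time field (e.g. 845) into
# an hour and a minute.
sub split_hhmm {
    my ($hhmm) = @_;
    my $hour   = int($hhmm / 100);   # 845 -> 8
    my $minute = $hhmm % 100;        # 845 -> 45
    return ($hour, $minute);
}

# The four values discussed in the text above.
for my $field (0, 1, 845, 2359) {
    printf "%4d -> %02d:%02d\n", $field, split_hhmm($field);
}
```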

Note: The date and time in the file are in UTC! Thus, even though the file starts at “midnight”, once the date/time is converted to Central (Daylight) Time, it really starts at 7 p.m. the day before. But that is OK.
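To see the shift concretely, the sketch below converts midnight UTC on 2011-07-29 to Central Daylight Time. It assumes a fixed offset of 5 hours behind UTC, which holds only for summer dates; a real script would more likely rely on localtime() and the machine's time zone. The name cdt_date_hour() is invented for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Assumption for this sketch: CDT is always exactly 5 hours behind UTC.
use constant CDT_OFFSET => -5 * 3600;

# Hypothetical helper: format a Unix timestamp as "YYYY-MM-DD HH" in CDT.
# Shifting the timestamp and formatting it with gmtime() keeps the result
# independent of the time zone of the machine running the code.
sub cdt_date_hour {
    my ($timestamp) = @_;
    return strftime('%Y-%m-%d %H', gmtime($timestamp + CDT_OFFSET));
}

# Midnight UTC on 2011-07-29; timegm() takes a 0-based month.
my $midnight_utc = timegm(0, 0, 0, 29, 7 - 1, 2011);
print cdt_date_hour($midnight_utc), "\n";   # prints "2011-07-28 19"
```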

The temperature observation is near the end of the record in the 24th field. It is in degrees Celsius, and so for our purposes, must be converted to degrees Fahrenheit.
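The conversion is F = C × 9/5 + 32. As a quick check of both the field position and the formula, the sketch below pulls the 24th field out of the sample record shown earlier and converts it. Both helper names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: convert degrees Celsius to degrees Fahrenheit.
sub celsius_to_fahrenheit {
    my ($celsius) = @_;
    return $celsius * 9 / 5 + 32;
}

# Hypothetical helper: return the 24th comma-separated field of a record.
# Perl arrays are 0-based, so the 24th field has index 23.
sub temperature_field {
    my ($line) = @_;
    my @fields = split /,/, $line;
    return $fields[23];
}

# The sample record from earlier in this handout.
my $line = '1,2011,210,0,1.9,973.29,5.8559,30.075,27.609,973.94,58732,51.201,'
         . '26.33,25.625,25.429,80.313,3.5021,197.11,142.4,89.91,25.535,23.595,'
         . '142.4,25.279,36.461,0,.28,29.911';

my $celsius = temperature_field($line);
printf "%.3f C = %.3f F\n", $celsius, celsius_to_fahrenheit($celsius);
```

For the sample record this gives 25.279 °C, or about 77.502 °F, which falls inside the 77.328 to 77.673 range reported for the first hour of the sample output.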

Data Structure Notes

In the script, the analyze_rig_data() subroutine returns a reference to a Perl data structure. Look at the picture while reading through the description.

The reference refers to a hash, which contains exactly 24 keys, one for each hour of data in the original RIG file. Each key is a Unix timestamp, and each associated value is a reference to another, tiny hash.

The tiny hash contains just two keys, min and max. The values for those keys are the minimum and maximum temperatures for that hour of data, in °F.
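The description above can be sketched in code. The example below folds individual readings into such a hash of hashes. It assumes that each key is the timestamp of the start of its hour, which the description does not actually specify, and record_reading() is a hypothetical name, not something found in the starting script.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one (timestamp, temperature) reading into the
# hourly structure. Assumption: the key for an hour is the Unix timestamp
# of the start of that hour.
sub record_reading {
    my ($hourly, $timestamp, $temp_f) = @_;
    my $hour_key = $timestamp - $timestamp % 3600;

    # Create the tiny hash on first sight of this hour, then widen its range.
    my $entry = $hourly->{$hour_key} //= { min => $temp_f, max => $temp_f };
    $entry->{min} = $temp_f if $temp_f < $entry->{min};
    $entry->{max} = $temp_f if $temp_f > $entry->{max};
    return $hourly;
}

# Three made-up readings from the same hour.
my %hourly;
record_reading(\%hourly, 1311902611, 77.1512);
record_reading(\%hourly, 1311902616, 76.9);
record_reading(\%hourly, 1311902621, 77.5);

for my $key (sort { $a <=> $b } keys %hourly) {
    printf "%d => { min => %.3f, max => %.3f }\n",
        $key, $hourly{$key}{min}, $hourly{$key}{max};
}
```

A full day of data processed this way ends up with 24 keys, each pointing at its own two-key hash.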

Output File Notes

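The output file contains 24 lines, one for each hour of the day. Due to the conversion from UTC to CDT, it starts at 7 p.m. (hour 19) on the previous day. There are four data fields, separated by the tab character: the local date, the local hour, and the minimum and maximum temperatures for that hour in °F.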

2011-07-28	19	77.328	77.673
2011-07-28	20	76.672	77.509
2011-07-28	21	76.219	77.070
2011-07-28	22	75.207	77.342
2011-07-28	23	74.642	75.816
2011-07-29	00	73.634	75.241
2011-07-29	01	71.802	73.654
2011-07-29	02	70.851	71.897
2011-07-29	03	70.322	70.893
2011-07-29	04	70.302	70.921
2011-07-29	05	69.663	70.563
2011-07-29	06	69.721	70.313
2011-07-29	07	69.667	71.328
2011-07-29	08	70.927	72.795
2011-07-29	09	72.529	75.904
2011-07-29	10	75.132	78.602
2011-07-29	11	78.235	80.708
2011-07-29	12	79.743	82.024
2011-07-29	13	81.106	83.309
2011-07-29	14	82.072	84.358
2011-07-29	15	82.897	85.091
2011-07-29	16	83.165	85.206
2011-07-29	17	84.378	86.090
2011-07-29	18	84.020	86.500
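Lines in this format can be produced with sprintf() and strftime(). The sketch below is one way to do it, under two assumptions: the hash keys are timestamps for the start of each hour, and a fixed UTC-5 offset stands in for CDT. The name format_hourly() is invented for this example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Assumption for this sketch: CDT is always exactly 5 hours behind UTC.
use constant CDT_OFFSET => -5 * 3600;

# Hypothetical helper: turn the hourly hash into tab-separated output lines,
# one per hour, in chronological order.
sub format_hourly {
    my ($hourly) = @_;
    my $output = '';
    for my $timestamp (sort { $a <=> $b } keys %$hourly) {
        my @cdt = gmtime($timestamp + CDT_OFFSET);
        $output .= sprintf("%s\t%s\t%.3f\t%.3f\n",
            strftime('%Y-%m-%d', @cdt),      # local date
            strftime('%H', @cdt),            # two-digit local hour
            $hourly->{$timestamp}{min},
            $hourly->{$timestamp}{max});
    }
    return $output;
}

# The first two records of the sample output, keyed by assumed timestamps.
print format_hourly({
    1311897600 => { min => 77.328, max => 77.673 },
    1311901200 => { min => 76.672, max => 77.509 },
});
```

Sorting the keys numerically matters here, because Perl does not keep hash keys in any particular order.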

Required Work

If you like, you can work from the provided starting script, in which case you need only implement three subroutines. The starting script is available here. The three subroutines are:

extract_weather_data()

This subroutine analyzes a single line of AO&SS RIG data, extracts the time and temperature, converts both values to standardized units, and returns them. Remember that the time in the RIG data is in UTC, not local time. Also, the temperatures in the file are in Celsius, but the temperature part of the return value from this function should be in Fahrenheit.

For example, if the current line of the RIG input file is (middle section omitted):

1,2011,210,123,31.9,974.22,...,142.4,25.084,2.4397,0,0,29.938

Then this subroutine should return the list (1311902611, 77.1512). That timestamp is “2011-07-29 01:23:31” in UTC and “2011-07-28 20:23:31” in CDT.

The extract_weather_data() subroutine is called once for each line in the RIG input data file. The calling subroutine, which is named analyze_rig_data(), is located immediately above it in the script.
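The numbers in the example above can be checked with timegm() from the Time::Local module. This sketch is not the reference solution; utc_timestamp() is a hypothetical helper, and it assumes that the fifth field of the record holds the seconds and that fractional seconds are simply dropped, as the example suggests.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: combine the YYYY-MM-DD date from the filename with
# the record's packed time field and seconds field, all taken as UTC.
sub utc_timestamp {
    my ($date, $hhmm, $seconds) = @_;
    my ($year, $month, $day) = split /-/, $date;
    my $hour   = int($hhmm / 100);
    my $minute = $hhmm % 100;
    # timegm() expects a 0-based month; int() drops the fractional seconds.
    return timegm(int($seconds), $minute, $hour, $day, $month - 1, $year);
}

# Values from the example record: time field 123, seconds 31.9, 25.084 C.
my $timestamp = utc_timestamp('2011-07-29', 123, 31.9);
my $temp_f    = 25.084 * 9 / 5 + 32;
printf "(%d, %.4f)\n", $timestamp, $temp_f;   # prints "(1311902611, 77.1512)"
```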

save_data()

This subroutine formats the reduced data records as a single string of 24 lines (one for each record), then calls the write_file() subroutine to actually save it to disk. The format of the output data file is described above.

Note: If the output data file already exists, this subroutine should simply return without doing anything.
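The "do nothing if the file exists" rule maps onto Perl's -e file test. The sketch below shows the idea with its own small writer, because the interface of write_file() in the starting script is not described here; save_unless_exists() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: write $content to $path unless a file is already
# there. Returns 1 if the file was written and 0 if it was left alone.
sub save_unless_exists {
    my ($path, $content) = @_;
    return 0 if -e $path;    # existing output is never overwritten

    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!\n";
    return 1;
}
```

Returning early keeps the rest of the subroutine free of nesting, and rerunning the script for a date that has already been processed becomes harmless.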

c2f()

This subroutine simply converts a temperature in degrees Celsius to one in degrees Fahrenheit. I gave you an easy one.

Optional Work

Reminders

Do the work yourself, consulting reasonable reference materials as needed; any reference material that gives you a complete or nearly complete solution to this problem or a similar one is not OK to use. Asking the instructors for help is OK; asking other students for help is not.

Hand In

A printout of your complete script, ideally on a single sheet of paper printed on both sides; if you need to have multiple pages, that is OK. Be sure to put your own name in the initial comment block of the code. Identifying your work is important, or you may not receive appropriate credit.