Due Monday, August 1, at the start of class.
The purpose of this three-part project is to compare weather forecasts against actual weather observations. In this first part, the goal is to create a script that downloads today’s weather forecast, extracts key data, and saves them in a simple text format.
A text weather forecast for Madison is posted every day, sometimes several times a day, at the following URL:
http://www.ssec.wisc.edu/cgi-bin/lc?mad_for
The forecast generally covers the next several days, but we are interested only in the forecast for today and tonight. Here is a sample of the actual HTML that includes the forecast for one day:
<TD> <H1>Madison Forecast</H1> Local Madison Forecast 300 AM CDT THU JUL 29 2010 <br><br><font size=+1><B>TODAY...</B></font>SUNNY. HIGHS IN THE LOWER 80S. NORTHWEST WINDS UP TO 5 MPH. <br><br><font size=+1><B>TONIGHT...</B></font>PARTLY CLOUDY. LOWS AROUND 60. NORTHWEST WINDS UP TO 5 MPH THROUGH AROUND MIDNIGHT BECOMING CALM.
Note: The sample above does not include most of the actual HTML page downloaded from the URL above. There are many more lines in the real file!
The goal for today is to parse enough of the downloaded forecast to get a Unix time for the forecast time, and the textual forecast predictions for the high and low temperatures. Then, we save the data for later analysis.
If you like, you need only implement two subroutines in the larger script. That is, I have written the rest of the script (available here), and you can just fill in two critical parts related to today’s lecture.
Subroutine #1: convert_timestamp. This function converts a timestamp, in the text format used in the forecasts, to a standard Unix (integer) timestamp. As described in the comment before the function itself, the function should return the Unix timestamp; if something goes wrong, it should return undef. For example:
convert_timestamp('1034 AM CDT THU JUL 29 2010') => 1280417640
For testing and validation, you can use the Epoch Converter website to convert a regular date and time to a Unix timestamp. Beware: The main conversion form on that website expects times in GMT (UTC), not U.S. Central Daylight Time. So be sure to enter your times correctly adjusted for UTC, or use the later form that accepts an RFC 2822 formatted date and specify “CDT” as the timezone!
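If you would rather compute expected values offline, the same cross-check can be sketched in a few lines of Python. This is only a way to produce expected numbers for your unit tests, not a model for convert_timestamp itself. The function name expected_unix_time is made up for this illustration, and the sketch assumes that CDT is always a fixed offset of five hours behind UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT is a fixed offset of UTC-5 (U.S. Central Daylight Time).
CDT = timezone(timedelta(hours=-5), "CDT")


def expected_unix_time(year, month, day, hour, minute):
    """Return the Unix timestamp for a wall-clock time given in CDT.

    Hypothetical helper for generating expected test values only.
    """
    local = datetime(year, month, day, hour, minute, tzinfo=CDT)
    return int(local.timestamp())


if __name__ == "__main__":
    # 10:34 AM CDT on Thursday, July 29, 2010 (the example above)
    print(expected_unix_time(2010, 7, 29, 10, 34))  # prints 1280417640
```

Note that the hour is given on a 24-hour clock here, so a PM time from a forecast needs 12 added before you pass it in.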
Subroutine #2: write_file. This subroutine writes the given string contents to the given file safely. Returns true upon success or false otherwise. So basically, this is an extension of the pattern shown in class. However, to get full credit, you must integrate the tempfile() function for creating and opening your temporary output file; rename the temporary file to its final name as shown. Also, do a good job of checking for errors throughout the subroutine.
Note: The script includes the --test option and a few test cases. Be sure your convert_timestamp subroutine works with the given tests. Consider adding more (see below).
The first and perhaps most obvious thing to do is add more unit tests to the script. Try lots of different cases. And anytime that you find something in the real, live downloaded data that breaks your script, reproduce the failure in a unit test first (before fixing the bug).
If you want to make the assignment a bit more challenging, simply delete the contents of any other subroutines in the script, and try writing them yourself. Use the unit tests to make sure you get back to success.
If you like, you are welcome to write the entire script on your own. It should download and parse the weather forecast, and write a text file with the extracted data. The file should be named:
wx-YYYY-MM-DD.txt
where YYYY is the year, MM the month, and DD the day of the month of the forecast timestamp in the downloaded data.
The file consists of a single line of text with four data fields, separated by tab characters. For example:
2011-07-29\t06:47\tUPPER 80S\tLOWER 60S\n
The date and time fields are from the forecast timestamp, and the high and low predictions are from the “TODAY” and “TONIGHT” sections of the forecast text.
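To confirm that a file your script produced matches this layout, a small independent checker can help. The Python sketch below is one possible check under stated assumptions: the name check_output and the regular expression are made up for this illustration and encode only what is described above (a date, a time, two free-text fields, tab separators, one trailing newline, and a file name built from the same date). It does not verify that the values themselves are correct.

```python
import re

# Hypothetical pattern: date, time, high text, low text, separated by tabs,
# followed by exactly one newline.
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})\t(\d{2}:\d{2})\t([^\t\n]+)\t([^\t\n]+)\n"
)


def check_output(filename, line):
    """Return True if the line has the four-field layout and the
    file name carries the same date as the first field."""
    match = LINE_RE.fullmatch(line)
    if match is None:
        return False
    return filename == "wx-" + match.group(1) + ".txt"


if __name__ == "__main__":
    sample = "2011-07-29\t06:47\tUPPER 80S\tLOWER 60S\n"
    print(check_output("wx-2011-07-29.txt", sample))  # prints True
    print(check_output("wx-2011-07-30.txt", sample))  # prints False
```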
Do the work yourself, consulting reasonable reference materials as needed; any reference material that gives you a complete or nearly complete solution to this problem or a similar one is not OK to use. Asking the instructors for help is OK; asking other students for help is not.
Hand in a printout of your complete script, ideally on a single sheet of paper printed on both sides; if you need multiple pages, that is OK. Be sure to put your own name in the initial comment block of the code. Identifying your work is important, or you may not receive appropriate credit.