CS 368-3 (2012 Summer) — Day 8 Homework

Due Monday, July 16, at the start of class.

Goal

Read two input data files containing data about the countries of the world and store the data into useful data structures. Then, ask for user input and print a simple report based on the stored data.

Tasks

Your script should be divided into two main parts:

Reading and storing the country data
Reporting on that data

Part I: Reading and Storing Data

There are two separate data files containing country data. The first input file (213 lines, 13 KB) contains basic information about each country; here are the first few lines of the file:

ABW : Aruba : Aruba : Latin America & Caribbean
ADO : Andorra : Principality of Andorra : Europe & Central Asia
AFG : Afghanistan : Islamic State of Afghanistan : South Asia
AGO : Angola : People's Republic of Angola : Sub-Saharan Africa
ALB : Albania : Republic of Albania : Europe & Central Asia

That is, there are 4 data fields, separated by the string “ : ” (space - colon - space):

Country code
Short country name
Long country name
Geographic region
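
As a minimal sketch of pulling those four fields apart, the snippet below splits one of the sample lines shown above. The variable names are illustrative only, and a real script would read each line from the input file rather than hard-coding it.

```perl
use strict;
use warnings;

# One sample line copied from the listing above; a real script would read
# lines from the input file instead.
my $input_line = "AFG : Afghanistan : Islamic State of Afghanistan : South Asia\n";

chomp $input_line;    # drop the trailing newline before splitting
my ($code, $short_name, $long_name, $region) = split(' : ', $input_line);

print "$code is in $region\n";
```

Without the chomp, the newline would stay attached to the last field (here, the region).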

The second input file (9960 lines, 206 KB) contains country population data, one line per country per year; here are the first few lines of the file:

ABW : 1960 : 49205
ABW : 1961 : 50244
ABW : 1962 : 51258
ABW : 1963 : 52224
ABW : 1964 : 53117

There are 3 data fields, separated by the string “ : ” (space - colon - space):

Country code
Year
Population

Read both data files and store the data in a single, complex, well-considered data structure. Your script must read and store all of the data, even if you do not need it all for Part II. Some hints:

To parse each input line, use the split function:
```
my @line_elements = split(' : ', $input_line);
```
You have probably done so already, but you can read about split() in Perldoc.
This is real data, so it is a bit messy. There are “extra” countries in the population file. Only store data for the countries that occur in both files (so yes, technically, you will not store all of the data, but this is the only exception).
Read the descriptions of the reports below first. Think about how you will need to access the data to create your report. It may influence how you store the data.
The population data is from 1960–2002, generally. However, the world changes and data are sometimes unavailable. Thus, the population statistic is not available for every country for every year. Be careful about your assumptions when building the data structures.
If you get really stuck on designing or filling in a good data structure, email me! The objective is not to dwell on the design forever, but to work on the code.
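
To make the hints above concrete, here is one possible shape for the data structure: a hash keyed by country code, where each value is a hash reference holding the names, the region, and a nested hash of populations keyed by year. This is only a sketch under assumptions, not the required design. The key names (short, long, region, population) are invented for illustration, the lines are hard-coded instead of read from files, and XXX is a made-up code standing in for an "extra" country.

```perl
use strict;
use warnings;

# Sample lines copied from the assignment text; XXX is a made-up "extra"
# country code that appears only in the population data.
my @country_lines = (
    'ABW : Aruba : Aruba : Latin America & Caribbean',
    'AFG : Afghanistan : Islamic State of Afghanistan : South Asia',
);
my @population_lines = (
    'ABW : 1960 : 49205',
    'ABW : 1961 : 50244',
    'XXX : 1960 : 12345',
);

# country code => { short, long, region, population => { year => count } }
my %countries;

for my $line (@country_lines) {
    my ($code, $short, $long, $region) = split(' : ', $line);
    $countries{$code} = {
        short      => $short,
        long       => $long,
        region     => $region,
        population => {},
    };
}

for my $line (@population_lines) {
    my ($code, $year, $count) = split(' : ', $line);
    next unless exists $countries{$code};    # skip "extra" countries
    $countries{$code}{population}{$year} = $count;
}

print "Aruba 1961: $countries{ABW}{population}{1961}\n";
```

Keying the populations by year, rather than storing them in an array, means a missing year is simply an absent key, which fits the warning about incomplete data.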

Part II: Reports

Now that you have data, we can analyze it a bit and produce reports based on user input.

Note: Pick just one of the following reports! They are listed below in roughly increasing order of difficulty.

Report 1: World Population for a Year

Ask the user for a year, then calculate the total world population for that year. Sample output:

Calculate world population for what year? 1970
World population in 1970 was 3665297114 (202 countries).

Calculate world population for what year?
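
A rough sketch of the core calculation for this report follows. It assumes a simplified, made-up data set (a hash of country code to a hash of year to population) and a hard-coded year in place of user input; your own data structure may look different.

```perl
use strict;
use warnings;

# Made-up miniature data set: country code => { year => population }.
my %population = (
    AAA => { 1970 => 100, 1971 => 110 },
    BBB => { 1970 => 250 },
    CCC => { 1971 => 40 },    # no 1970 figure
);

my $year = 1970;    # in the real script this would come from the user

my ($total, $count) = (0, 0);
for my $code (keys %population) {
    # Skip countries that have no figure for the requested year.
    next unless exists $population{$code}{$year};
    $total += $population{$code}{$year};
    $count++;
}

print "World population in $year was $total ($count countries).\n";
```

Counting only the countries that actually have data for the year is what produces a figure like the "(202 countries)" in the sample output.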

Report 2: Population Growth for a Country

Ask the user for a country code, start year, and end year. Calculate and display the population growth in that country from the start year to end year, both as an absolute amount and percentage growth; display the short or long name for the country, not the code. Sample output:

Calculate population growth for what country code? JPN
Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, Japan grew by 22525000 (21.6%).

Calculate population growth for what country code?
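
The arithmetic for this report can be sketched as below, using the Aruba figures from the sample population lines in Part I. The flat hash of year to population and the hard-coded years are simplifying assumptions; the real script would look these up in your data structure based on user input.

```perl
use strict;
use warnings;

# Aruba figures copied from the sample population lines in Part I.
my %population = ( 1960 => 49205, 1964 => 53117 );
my ($start, $end) = (1960, 1964);    # would come from user input

my ($growth, $percent);
if (exists $population{$start} && exists $population{$end}) {
    $growth  = $population{$end} - $population{$start};
    $percent = 100 * $growth / $population{$start};
    # %% prints a literal percent sign; %.1f keeps one decimal place.
    printf "From %d-%d, Aruba grew by %d (%.1f%%).\n",
        $start, $end, $growth, $percent;
}
else {
    print "No population data for one of those years.\n";
}
```

Checking both years with exists before subtracting avoids warnings (and wrong answers) when a year is missing for the chosen country.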

Report 3: Top Population Growth

Ask the user for a start year and end year. Calculate and display the top N (5–10 is best) countries based on growth (either absolute growth or percentage growth, your pick). Display short or long names for countries, not codes. Sample output:

Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, the following countries grew the most:
  468354000  India
  444330000  China
   88359301  Indonesia
   78183086  Brazil
   77473000  Pakistan
   77120000  United States
   71588515  Bangladesh
   68374892  Nigeria
   47370000  Mexico
   41122025  Philippines

Starting in what year?
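
One way to pick the top N, sketched here with made-up names and growth figures, is to sort the country names by their growth value in descending order and take a slice of the result. Computing the growth figures themselves is left out.

```perl
use strict;
use warnings;

# Made-up growth figures: country name => absolute growth over the period.
my %growth = (
    Alpha   => 500,
    Bravo   => 1200,
    Charlie => -30,
    Delta   => 800,
);
my $top_n = 3;

# Sort names by growth, largest first.
my @ranked = sort { $growth{$b} <=> $growth{$a} } keys %growth;
$top_n = @ranked if $top_n > @ranked;    # do not run off the end of the list

my @top = @ranked[0 .. $top_n - 1];

# Right-align the numbers in a fixed-width column, as in the sample output.
printf "%11d  %s\n", $growth{$_}, $_ for @top;
```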

Debugging Tips

There are a couple of good tools for debugging Perl code that creates and especially accesses complex data structures.

ref()

There is a simple built-in function that tells you what kind of value a reference refers to. Look in Perldoc for the ref() function. But here is a simple example:

```
my $number = 42;
my $num_ref = \$number;
print ref($num_ref) . "\n";
```

This prints SCALAR. Read the Perldoc for more output values, including what happens if the argument is NOT a reference.

Anyway, the ref() function is a great way to debug expressions that access complex data structures. Remember: If you are stuck, challenge your assumptions! This function can help with that.
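
For instance, when an expression deep inside a nested structure misbehaves, printing ref() of each piece shows what is really stored there. The small hash below is made up for illustration.

```perl
use strict;
use warnings;

# A made-up nested structure to poke at.
my %country = ( name => 'Aruba', population => { 1960 => 49205 } );

my @kinds = (
    ref(\%country),               # HASH
    ref($country{population}),    # HASH
    ref([1, 2, 3]),               # ARRAY
    ref(sub { 1 }),               # CODE
    ref($country{name}),          # empty string: a plain scalar, not a reference
);

print join(', ', map { $_ eq '' ? '(not a reference)' : $_ } @kinds), "\n";
```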

Data::Dumper

The second tool is actually an entire built-in module for printing out a formatted representation of a complete data structure. It is the Data::Dumper module, and you can find it in Perldoc, too. Again, a simple example:

```
use Data::Dumper;
my @structure = ([1, 2, 3], {'a' => 42, 'b' => 'foo', 'c' => [4, 3, 2, 1]});
print Dumper(\@structure);
```

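Notice that I passed a reference to my data structure to Dumper(). If you pass the array itself, Dumper() prints each element as a separate variable ($VAR1, $VAR2, ...) instead of one complete structure. The code above prints: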

$VAR1 = [
         [
           1,
           2,
           3
         ],
         {
           'c' => [
                    4,
                    3,
                    2,
                    1
                  ],
           'a' => 42,
           'b' => 'foo'
         }
       ];

This output shows that my argument was a reference to an array, the first element of the array was a reference to another array (1, 2, 3), and the second element was a reference to a hash, etc. Note that the hash keys are printed in no particular order. Again, this is useful for debugging and challenging your assumptions about what has been built up.
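
When dumping a larger structure, two of the module's configuration variables can make the output easier to scan: $Data::Dumper::Sortkeys prints hash keys in sorted order, and $Data::Dumper::Indent controls the indentation style. The sketch below applies both to a small made-up structure; the key names are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # simpler, more compact indentation

# A made-up structure in the spirit of the assignment.
my %countries = (
    ABW => {
        short      => 'Aruba',
        population => { 1960 => 49205, 1961 => 50244 },
    },
);

my $dump = Dumper(\%countries);
print $dump;
```

Sorted keys make it practical to compare two dumps by eye, or with a diff tool, after changing the code that builds the structure.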

Testing

Now that you know how to debug a script and write automated tests, how will that affect your approach? Will you use the debugger to hunt for problems? Will you write testable subroutines and a few unit tests?

Here are some general ideas for testing:

How do you know if your data structure(s) and data are correct? Think of some ways to verify it.
Is your report accurate? How do you know? Can you think of another way to calculate a result?
What does your script do with bad input? Missing data? Irregularities in the data?
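
As a small illustration of the "testable subroutines and a few unit tests" idea, the sketch below uses the core Test::More module to check a line-parsing helper against good and bad input. The subroutine name parse_country_line and its hash keys are hypothetical, not part of the assignment.

```perl
use strict;
use warnings;
use Test::More;

# parse_country_line is a hypothetical helper name, not part of the assignment.
# It returns a hash reference for a well-formed line, or undef otherwise.
sub parse_country_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split(' : ', $line);
    return undef unless @fields == 4;    # reject malformed lines
    return {
        code   => $fields[0],
        short  => $fields[1],
        long   => $fields[2],
        region => $fields[3],
    };
}

my $good = parse_country_line(
    "ALB : Albania : Republic of Albania : Europe & Central Asia\n");
is($good->{code},   'ALB',                   'country code parsed');
is($good->{region}, 'Europe & Central Asia', 'region parsed without newline');

my $bad = parse_country_line("ALB : Albania");
is($bad, undef, 'line with too few fields is rejected');

done_testing();
```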

Extra Challenges

[Medium] Support more than one report. Present a menu of report options for the user, each time through the main loop. How do you organize your code effectively?
[Hard] Search the Internet for a technique that displays large integers with commas: e.g., 123456789 → 123,456,789. Integrate the technique into your report(s). Be sure to give credit to the original author!
[Very Hard] Notice the “region” field of the first data file? Rewrite any report above to calculate and display statistics for regions instead. For Report #1, report on just one region. For Report #1 or #2, there are no codes for regions (in our data file), so present the user with a list of regions to select from. Does this change in the reporting also change how you arrange your data structures?
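
For the [Hard] challenge, one widely circulated technique is the commify routine from the Perl FAQ (perlfaq5, "How can I output my numbers with commas added?"). The version below is lightly adapted from it and is shown only as an example of the kind of technique, and the kind of credit line, the challenge asks for.

```perl
use strict;
use warnings;

# Adapted from the commify routine in perlfaq5
# ("How can I output my numbers with commas added?").
sub commify {
    my ($number) = @_;
    # Repeatedly insert a comma before the last group of three digits
    # in the leading run of digits, until no such group remains.
    1 while $number =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $number;
}

print commify(123456789), "\n";
```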

Reminders

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

A printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important, or you may not receive appropriate credit.