CS 368-3 (2012 Summer) — Day 8 Homework

Due Monday, July 16, at the start of class.

Goal

Read two input data files containing data about the countries of the world and store the data into useful data structures. Then, ask for user input and print a simple report based on the stored data.

Tasks

Your script should be divided into two main parts:

  1. Reading and storing the country data
  2. Reporting on that data

Part I: Reading and Storing Data

There are two separate data files containing country data. The first input file (213 lines, 13 KB) contains basic information about each country; here are the first few lines of the file:

ABW : Aruba : Aruba : Latin America & Caribbean
ADO : Andorra : Principality of Andorra : Europe & Central Asia
AFG : Afghanistan : Islamic State of Afghanistan : South Asia
AGO : Angola : People's Republic of Angola : Sub-Saharan Africa
ALB : Albania : Republic of Albania : Europe & Central Asia

That is, there are 4 data fields, separated by the string “ : ” (space - colon - space): country code, short name, long name, and region.
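
As a minimal sketch of how one such line could be taken apart, the snippet below splits a sample line on the three-character separator. The variable names are illustrative choices, not something the assignment prescribes.

```perl
use strict;
use warnings;

# One line copied from the sample of the first input file.
my $line = "AGO : Angola : People's Republic of Angola : Sub-Saharan Africa\n";
chomp $line;

# Split on the literal separator " : " (space, colon, space).
# The variable names are illustrative, not part of the assignment.
my ($code, $short_name, $long_name, $region) = split / : /, $line;

print "$code | $short_name | $long_name | $region\n";
```

Splitting on the full separator, rather than on the colon alone, keeps stray spaces out of the fields.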

The second input file (9960 lines, 206 KB) contains country population data, one line per country per year; here are the first few lines of the file:

ABW : 1960 : 49205
ABW : 1961 : 50244
ABW : 1962 : 51258
ABW : 1963 : 52224
ABW : 1964 : 53117

There are 3 data fields, separated by the string “ : ” (space - colon - space): country code, year, and population.

Read both data files and store the data in a single, complex, well-considered data structure. Your script must read and store all of the data, even if you do not need it all for Part II. Some hints:
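
One possible shape for such a structure, offered only as a sketch: a hash keyed by country code, where each value is a hash reference holding the name fields plus a nested hash of populations by year. The key names (short_name, long_name, region, population) are assumptions made for this example; other layouts can work equally well.

```perl
use strict;
use warnings;

# Hypothetical layout: one hash keyed by country code; each value is a hash
# reference holding the name fields plus a nested hash of year => population.
my %countries;

# The values below are copied from the sample lines of the two input files.
$countries{ABW} = {
    short_name => 'Aruba',
    long_name  => 'Aruba',
    region     => 'Latin America & Caribbean',
    population => {},
};
$countries{ABW}{population}{1960} = 49205;
$countries{ABW}{population}{1961} = 50244;

print "$countries{ABW}{short_name} in 1961: $countries{ABW}{population}{1961}\n";
```

Keying on the country code is convenient here because both input files share that field.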

Part II: Reports

Now that you have data, we can analyze it a bit and produce reports based on user input.

Note: Pick just one of the following reports! They are listed below in roughly increasing order of difficulty.

Report 1: World Population for a Year

Ask the user for a year, then calculate the total world population for that year. Sample output:

Calculate world population for what year? 1970
World population in 1970 was 3665297114 (202 countries).

Calculate world population for what year?
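
The summing step might look roughly like the sketch below. The data is made up for illustration (it is not from the input files), and world_population is a hypothetical helper name. The sketch assumes that a country with no entry for the requested year should simply be skipped, which would also explain why the sample output counts fewer countries than the first file has lines.

```perl
use strict;
use warnings;

# Made-up numbers for illustration only; they are not from the data files.
my %population = (
    AAA => { 1970 => 100, 1971 => 110 },
    BBB => { 1970 => 250 },
    CCC => { 1971 => 40 },              # no 1970 entry
);

# Hypothetical helper: returns (total, number of countries counted).
sub world_population {
    my ($data, $year) = @_;
    my ($total, $count) = (0, 0);
    for my $code (keys %$data) {
        next unless exists $data->{$code}{$year};   # skip missing years
        $total += $data->{$code}{$year};
        $count++;
    }
    return ($total, $count);
}

my ($total, $count) = world_population(\%population, 1970);
print "World population in 1970 was $total ($count countries).\n";
```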

Report 2: Population Growth for a Country

Ask the user for a country code, start year, and end year. Calculate and display the population growth in that country from the start year to end year, both as an absolute amount and percentage growth; display the short or long name for the country, not the code. Sample output:

Calculate population growth for what country code? JPN
Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, Japan grew by 22525000 (21.6%).

Calculate population growth for what country code?
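
The arithmetic can be sketched with the Aruba figures from the sample lines of the second input file. This assumes that percentage growth is measured relative to the start-year population.

```perl
use strict;
use warnings;

# Populations copied from the sample lines of the second input file.
my %aruba = (1960 => 49205, 1964 => 53117);

my ($start, $end) = (1960, 1964);
my $growth  = $aruba{$end} - $aruba{$start};
my $percent = 100 * $growth / $aruba{$start};   # relative to the start year

# "%%" prints a literal percent sign.
printf "From %d-%d, Aruba grew by %d (%.1f%%).\n", $start, $end, $growth, $percent;
```

This prints: From 1960-1964, Aruba grew by 3912 (8.0%).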

Report 3: Top Population Growth

Ask the user for a start year and end year. Calculate and display the top N (5–10 is best) countries based on growth (either absolute growth or percentage growth, your pick). Display short or long names for countries, not codes. Sample output:

Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, the following countries grew the most:
  468354000  India
  444330000  China
   88359301  Indonesia
   78183086  Brazil
   77473000  Pakistan
   77120000  United States
   71588515  Bangladesh
   68374892  Nigeria
   47370000  Mexico
   41122025  Philippines

Starting in what year?
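
Once growth has been computed per country, picking the top N is mostly a sorting problem. Here is a small sketch that uses a few of the growth figures from the sample output above. The %growth hash and the value of $n are illustrative assumptions.

```perl
use strict;
use warnings;

# Absolute growth figures copied from the sample output (1970-2000).
my %growth = (
    'Indonesia' => 88359301,
    'India'     => 468354000,
    'Brazil'    => 78183086,
    'China'     => 444330000,
);

my $n = 3;   # how many countries to show; an arbitrary choice here

# Sort the names by growth, largest first, then keep the first $n.
my @top = (sort { $growth{$b} <=> $growth{$a} } keys %growth)[0 .. $n - 1];

# Right-align the numbers so the names line up in a column.
printf "%11d  %s\n", $growth{$_}, $_ for @top;
```

The sketch assumes the hash holds at least $n entries; real code would want to guard against a shorter list.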

Debugging Tips

There are a couple of good tools for debugging Perl code that creates and especially accesses complex data structures.

ref()

There is a simple built-in function that tells you what kind of value a reference refers to. Look in Perldoc for the ref() function. But here is a simple example:

my $number = 42;
my $num_ref = \$number;
print ref($num_ref) . "\n";

This prints SCALAR. Read the Perldoc for more output values, including what happens if the argument is NOT a reference.
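
To give a feel for a few of those other values, this sketch calls ref() on several kinds of values, including a plain string that is not a reference at all:

```perl
use strict;
use warnings;

my @checks = (
    [ 'array reference', [1, 2, 3]      ],
    [ 'hash reference',  { a => 1 }     ],
    [ 'code reference',  sub { return } ],
    [ 'plain string',    'hello'        ],
);

for my $check (@checks) {
    my ($label, $value) = @$check;
    # ref() returns an empty string (which is false) for a non-reference.
    my $kind = ref($value) || '(not a reference)';
    print "$label: $kind\n";
}
```

The first three lines of output report ARRAY, HASH, and CODE; the last reports (not a reference).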

Anyway, the ref() function is a great way to debug expressions that access complex data structures. Remember: If you are stuck, challenge your assumptions! This function can help with that.

Data::Dumper

The second tool is actually an entire built-in module for printing out a formatted representation of a complete data structure. It is the Data::Dumper module, and you can find it in Perldoc, too. Again, a simple example:

use Data::Dumper;
my @structure = ([1, 2, 3], {'a' => 42, 'b' => 'foo', 'c' => [4, 3, 2, 1]});
print Dumper(\@structure);

Notice that I passed a reference to my data structure to Dumper(); it works better this way, because the whole structure is printed as a single $VAR1 instead of one variable per element. The code above prints:

$VAR1 = [
         [
           1,
           2,
           3
         ],
         {
           'c' => [
                    4,
                    3,
                    2,
                    1
                  ],
           'a' => 42,
           'b' => 'foo'
         }
       ];

This output shows that my argument was a reference to an array, the first element of that array was a reference to another array (1, 2, 3), and the second element was a reference to a hash, etc. Again, this is useful for debugging and for challenging your assumptions about what has been built up.
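
Notice that the hash keys in the output above came out as c, a, b. Perl does not keep hash keys in any particular order, so the order can differ from run to run. If that gets in the way while debugging, Data::Dumper has configuration variables that may help; the sketch below sets two of them, $Data::Dumper::Sortkeys and $Data::Dumper::Indent.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 1;   # simpler, shallower indentation

my @structure = ([1, 2, 3], {'a' => 42, 'b' => 'foo', 'c' => [4, 3, 2, 1]});
my $dump = Dumper(\@structure);
print $dump;
```

With Sortkeys on, the keys appear as a, b, c every time, which makes it easier to compare the output of two runs.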

Testing

Now that you know how to debug a script and write automated tests, how will that affect your approach? Will you use the debugger to hunt for problems? Will you write testable subroutines and a few unit tests?

Here are some general ideas for testing:

Extra Challenges

Reminders

Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.

Hand In

A printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important, or you may not receive appropriate credit.