Due Monday, July 16, at the start of class.
Read two input data files containing data about the countries of the world and store the data into useful data structures. Then, ask for user input and print a simple report based on the stored data.
Your script should be divided into two main parts:
There are two separate data files containing country data. The first input file (213 lines, 13 KB) contains basic information about each country; here are the first few lines of the file:
ABW : Aruba : Aruba : Latin America & Caribbean
ADO : Andorra : Principality of Andorra : Europe & Central Asia
AFG : Afghanistan : Islamic State of Afghanistan : South Asia
AGO : Angola : People's Republic of Angola : Sub-Saharan Africa
ALB : Albania : Republic of Albania : Europe & Central Asia
That is, there are 4 data fields, separated by the string “ : ” (space - colon - space): country code, short name, long name, and region.
The second input file (9960 lines, 206 KB) contains country population data, one line per country per year; here are the first few lines of the file:
ABW : 1960 : 49205
ABW : 1961 : 50244
ABW : 1962 : 51258
ABW : 1963 : 52224
ABW : 1964 : 53117
There are 3 data fields, separated by the string “ : ” (space - colon - space): country code, year, and population.
Read both data files and store the data in a single, complex, well-considered data structure. Your script must read and store all of the data, even if you do not need it all for Part II. Some hints:
Break each input line into its fields with the split function:
my @line_elements = split(' : ', $input_line);
You have probably done so already, but you can read about split() in Perldoc.
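To make the hints above concrete, here is a minimal sketch of one possible layout: a hash keyed by country code, where each value holds the descriptive fields plus a nested hash of populations by year. The key names (short_name, long_name, region, population) and the variable names are my own assumptions, not requirements, and the sample lines stand in for lines read from the real files.

```perl
use strict;
use warnings;

# Hypothetical layout: country code => { descriptive fields, population => { year => count } }.
my %countries;

# Sample lines copied from the two input files; a real script would read them from file handles.
my @info_lines = (
    'ABW : Aruba : Aruba : Latin America & Caribbean',
    'AFG : Afghanistan : Islamic State of Afghanistan : South Asia',
);
my @population_lines = (
    'ABW : 1960 : 49205',
    'ABW : 1961 : 50244',
);

# First file: one record per country.
for my $line (@info_lines) {
    my ($code, $short, $long, $region) = split(' : ', $line);
    $countries{$code} = {
        short_name => $short,
        long_name  => $long,
        region     => $region,
        population => {},
    };
}

# Second file: one record per country per year.
for my $line (@population_lines) {
    my ($code, $year, $count) = split(' : ', $line);
    $countries{$code}{population}{$year} = $count;
}

print "$countries{ABW}{short_name} in 1961: $countries{ABW}{population}{1961}\n";
```

When reading from a real file, remember to chomp each line before splitting, or the last field will keep its trailing newline.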
Now that you have data, we can analyze it a bit and produce reports based on user input.
Note: Pick just one of the following reports! They are listed below in roughly increasing order of difficulty.
Ask the user for a year, then calculate the total world population for that year. Sample output:
Calculate world population for what year? 1970
World population in 1970 was 3665297114 (202 countries).
Calculate world population for what year?
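The core of this report can be sketched as a subroutine that sums one year across all countries. This is only an illustration: the structure and the name world_population are assumptions, the ABW figures come from the sample file, and the XXX entry is made up so that the sum has more than one term.

```perl
use strict;
use warnings;

# Hypothetical structure: country code => { population => { year => count } }.
my %countries = (
    ABW => { population => { 1960 => 49205, 1961 => 50244 } },
    XXX => { population => { 1961 => 1000 } },    # invented entry, not real data
);

# Sum the population of every country that has data for the given year.
# Returns the total and the number of countries that contributed to it.
sub world_population {
    my ($data, $year) = @_;
    my ($total, $count) = (0, 0);
    for my $code (keys %$data) {
        my $pop = $data->{$code}{population}{$year};
        next unless defined $pop;    # skip countries with no figure for this year
        $total += $pop;
        $count++;
    }
    return ($total, $count);
}

my ($total, $count) = world_population(\%countries, 1961);
print "World population in 1961 was $total ($count countries).\n";
```

Counting only the countries that have a figure for the requested year is one way to arrive at a country count like the one in the sample output.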
Ask the user for a country code, start year, and end year. Calculate and display the population growth in that country from the start year to end year, both as an absolute amount and percentage growth; display the short or long name for the country, not the code. Sample output:
Calculate population growth for what country code? JPN
Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, Japan grew by 22525000 (21.6%).
Calculate population growth for what country code?
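The arithmetic behind this report is small enough to sketch on its own. The subroutine name growth is hypothetical, and the sketch assumes the starting population is not zero; the two figures are Aruba's 1960 and 1964 values from the sample population file.

```perl
use strict;
use warnings;

# Return absolute growth and percentage growth between two population counts.
# Assumes $start_pop is non-zero.
sub growth {
    my ($start_pop, $end_pop) = @_;
    my $absolute = $end_pop - $start_pop;
    my $percent  = 100 * $absolute / $start_pop;
    return ($absolute, $percent);
}

my ($absolute, $percent) = growth(49205, 53117);

# %.1f rounds to one decimal place; %% prints a literal percent sign.
printf "From %d-%d, %s grew by %d (%.1f%%).\n", 1960, 1964, 'Aruba', $absolute, $percent;
```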
Ask the user for a start year and end year. Calculate and display the top N (5–10 is best) countries based on growth (either absolute growth or percentage growth, your pick). Display short or long names for countries, not codes. Sample output:
Starting in what year? 1970
Ending in what year? 2000
From 1970-2000, the following countries grew the most:
468354000 India
444330000 China
88359301 Indonesia
78183086 Brazil
77473000 Pakistan
77120000 United States
71588515 Bangladesh
68374892 Nigeria
47370000 Mexico
41122025 Philippines
Starting in what year?
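The ranking step of this report can be sketched with sort and an array slice. The country names and growth figures below are invented purely for illustration, and %growth_for and $top_n are hypothetical names; in a real script the hash would be filled from the stored data.

```perl
use strict;
use warnings;

# Invented growth figures keyed by country name; not taken from the data files.
my %growth_for = (
    Alphaland => 500,
    Betaland  => 1200,
    Gammaland => 300,
    Deltaland => 900,
);

my $top_n = 3;

# Sort names by descending growth, then keep the first $top_n of them.
my @ranked = sort { $growth_for{$b} <=> $growth_for{$a} } keys %growth_for;
my @top    = @ranked[0 .. $top_n - 1];

printf "%d %s\n", $growth_for{$_}, $_ for @top;
```

This sketch assumes there are at least $top_n countries; with fewer, the slice would contain undefined entries, so a real script should cap $top_n at the size of the list.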
There are a couple of good tools for debugging Perl code that creates and especially accesses complex data structures.
There is a simple built-in function that tells you what kind of value a reference refers to. Look in Perldoc for the ref() function. But here is a simple example:
my $number = 42;
my $num_ref = \$number;
print ref($num_ref) . "\n";
This prints SCALAR. Read the Perldoc for more output values, including what happens if the argument is NOT a reference. Anyway, the ref() function is a great way to debug expressions that access complex data structures.
Remember: If you are stuck, challenge your assumptions! This function can help with that.
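As a further sketch, here is ref() applied at several levels of a nested structure, which is the situation where it tends to help most. The %countries layout is a hypothetical one, using values from the sample data files.

```perl
use strict;
use warnings;

# Hypothetical nested structure, using values from the sample data files.
my %countries = (
    ABW => { short_name => 'Aruba', population => { 1960 => 49205 } },
);
my @years = (1960, 1961);

print ref(\@years) . "\n";                        # ARRAY
print ref($countries{ABW}) . "\n";                # HASH
print ref($countries{ABW}{population}) . "\n";    # HASH

# A plain value is not a reference, so ref() returns an empty string.
my $leaf = $countries{ABW}{population}{1960};
print "leaf is a plain value\n" if ref($leaf) eq '';
```

If one of these prints something other than what you expected, the expression is probably stopping one level too high or too low in the structure.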
The second tool is actually an entire built-in module for printing out a formatted representation of a complete data structure. It is the Data::Dumper module, and you can find it in Perldoc, too. Again, a simple example:
use Data::Dumper;
my @structure = ([1, 2, 3], {'a' => 42, 'b' => 'foo', 'c' => [4, 3, 2, 1]});
print Dumper(\@structure);
Notice that I passed a reference to my data structure to Dumper(); it works better this way. The code above prints:
$VAR1 = [
          [
            1,
            2,
            3
          ],
          {
            'c' => [
                     4,
                     3,
                     2,
                     1
                   ],
            'a' => 42,
            'b' => 'foo'
          }
        ];
This output shows that my argument was a reference to an array, the first element of the array was a reference to another array (1, 2, 3), and the second element was a reference to a hash, etc. Note that hash keys can come out in any order. Again, useful for debugging and challenging your assumptions about what has been built up.
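Two Data::Dumper settings can make the output easier to compare between runs; here is a short sketch using them on a hypothetical country structure (the layout is an assumption, the values are from the sample data files). $Data::Dumper::Sortkeys and $Data::Dumper::Indent are documented configuration variables of the module.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so the dump looks the same on every run.
$Data::Dumper::Sortkeys = 1;
# Use a fixed, more compact indentation style.
$Data::Dumper::Indent = 1;

# Hypothetical nested structure, using values from the sample data files.
my %countries = (
    ABW => {
        short_name => 'Aruba',
        population => { 1960 => 49205, 1961 => 50244 },
    },
);

# Dumper() returns the formatted text, so it can be stored as well as printed.
my $dump = Dumper(\%countries);
print $dump;
```

Dumping just one country, for example Dumper($countries{ABW}), is usually more readable than dumping the entire structure once all of the data has been loaded.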
Now that you know how to debug a script and write automated tests, how will that affect your approach? Will you use the debugger to hunt for problems? Will you write testable subroutines and a few unit tests?
Here are some general ideas for testing:
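As one illustration of a testable subroutine with a few unit tests, here is a minimal sketch using the core Test::More module. The subroutine parse_country_line and its hash keys are hypothetical names chosen for this example; the input line is taken from the sample data.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: split one line of the country file into named fields.
sub parse_country_line {
    my ($line) = @_;
    chomp $line;
    my ($code, $short, $long, $region) = split(' : ', $line);
    return {
        code       => $code,
        short_name => $short,
        long_name  => $long,
        region     => $region,
    };
}

my $record = parse_country_line("ALB : Albania : Republic of Albania : Europe & Central Asia\n");

is($record->{code},       'ALB',                   'country code is parsed');
is($record->{short_name}, 'Albania',               'short name is parsed');
is($record->{region},     'Europe & Central Asia', 'region keeps its ampersand and loses the newline');

done_testing();
```

Keeping the parsing in its own subroutine means it can be tested on a single string, without opening either data file.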
Do the work yourself, consulting reasonable reference materials as needed. Any resource that provides a complete solution or offers significant material assistance toward a solution is not OK to use. Asking the instructor for help is OK; asking other students for help is not. All standard UW policies concerning student conduct (esp. UWS 14) and information technology apply to this course and assignment.
A printout of your code, ideally on a single sheet of paper. Be sure to put your own name in the initial comment block. Identifying your work is important, or you may not receive appropriate credit.