CS 368-4 (2012 Fall) — Day 5 Homework

Due Thursday, November 8, at the start of class.

Goal

Write a Python script that reads two data files about the world’s countries, stores the data in useful data structures, and then, based on user input, calculates basic world population statistics.

This is a slightly longer, more complex assignment than previous ones. If you are new to programming, you may copy the code that implements Part I (described below), to help you get started. Study it to see how the data structures work.

But try to do both parts yourself. Be brave! The more you do on your own, the more you will learn!

For conceptual ease, I have broken down the overall script into two parts: reading the data files into a single, rich data structure, and then handling user input and the corresponding calculations. In both cases, you should write functions to organize the code and avoid repetition.

Part I: Reading and Storing Data

There are two separate data files containing country data. The first file is called input-05-country.txt, and looks like this:

```ABW : Aruba : Aruba : Latin America & Caribbean
ADO : Andorra : Principality of Andorra : Europe & Central Asia
AFG : Afghanistan : Islamic State of Afghanistan : South Asia
AGO : Angola : People's Republic of Angola : Sub-Saharan Africa
ALB : Albania : Republic of Albania : Europe & Central Asia```

That is, there are 4 data fields, separated by the string “ : ” (space - colon - space):

• Country code
• Short country name
• Long country name
• Geographic region

The second data file is called input-05-population.txt, and looks like this:

```ABW : 1960 : 49205
ABW : 1961 : 50244
ABW : 1962 : 51258
ABW : 1963 : 52224
ABW : 1964 : 53117```

There are 3 data fields, separated by the string “ : ” (space - colon - space):

• Country code
• Year
• Population

Read both data files and store all of the data into a single data structure. If you look at the slides from today, you will see a code sample and diagram that (mostly) correspond to the homework.
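
One possible shape for that single data structure, as a minimal sketch: a dictionary keyed by country code, where each value is a record dictionary holding the name fields plus a nested year-to-population dictionary. The key names (`'short'`, `'long'`, `'region'`, `'population'`) are illustrative assumptions and may not match the layout shown in the slides.

```python
# Hypothetical layout: country code -> record dictionary.
countries = {
    'ABW': {
        'short': 'Aruba',
        'long': 'Aruba',
        'region': 'Latin America & Caribbean',
        # year -> population; years with no data are simply absent
        'population': {1960: 49205, 1961: 50244, 1962: 51258},
    },
}

# Two lookups: first the country record, then the year.
aruba_1961 = countries['ABW']['population'][1961]
print(aruba_1961)
```

Leaving missing years out of the inner dictionary (rather than storing a placeholder) makes "no data" easy to detect with `in` or `dict.get`.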

Some suggestions:

• You can “split” a line of data into its component elements using the `split` function:
`data_line.strip().split(' : ')`

It is the opposite of the `join` function: it takes a single string and splits it at every occurrence of the argument (the separator), which is removed. It returns a list of the pieces found between separators. The `strip()` call first removes the newline at the end of the line. As usual, you can look up this function (str.split) via interactive Python or online.

• This is real data, so it is a bit messy. There are “extra” countries in the population file. Only store the data for the countries that occur in both files.
• The population data generally cover 1960–2002. However, the world changes and data are sometimes unavailable, so the population statistic is not available for every country for every year. Be careful about your assumptions when building the data structures.
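
The suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions: the function names `parse_countries` and `add_populations`, the record keys, and the made-up code `XYZ` are all hypothetical, and the lines are given as in-memory lists so the example runs without the data files (an open file object could be passed instead, since iterating over a file also yields lines).

```python
def parse_countries(lines):
    """Build {code: record} from lines of the country file."""
    countries = {}
    for line in lines:
        fields = line.strip().split(' : ')
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        code, short, long_name, region = fields
        countries[code] = {'short': short, 'long': long_name,
                           'region': region, 'population': {}}
    return countries


def add_populations(countries, lines):
    """Attach year -> population entries, ignoring unknown country codes."""
    for line in lines:
        fields = line.strip().split(' : ')
        if len(fields) != 3:
            continue
        code, year, count = fields
        if code in countries:  # "extra" countries are dropped here
            countries[code]['population'][int(year)] = int(count)


country_lines = ['ABW : Aruba : Aruba : Latin America & Caribbean\n']
population_lines = ['ABW : 1960 : 49205\n', 'ABW : 1961 : 50244\n',
                    'XYZ : 1960 : 12345\n']  # XYZ is a made-up extra code

countries = parse_countries(country_lines)
add_populations(countries, population_lines)
```

Note that a country with no population lines at all would still be stored here, with an empty `'population'` dictionary; whether to filter such records out afterwards is a design decision left to you.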

Part II: Report

Now that you have data, it is possible to produce a simple report given some user input. To help in this part, you must write three functions (see below).

The objective is to ask the user for a country code and a year, then show the population of that country in that year, along with its percentage of the global population in the same year. Here is a sample dialog between the user and the script (user input is the text after each `Code:` and `Year:` prompt):

```Code: USA
Year: 1970
The population of United States in 1970 was 205052000, 5.594% of world population.

Code: IND
Year: 1982
The population of India in 1982 was 718425590, 15.696% of world population.

Code: CHN
Year: 2000
The population of China in 2000 was 1262645000, 20.826% of world population.

Code: ```

When the user just presses Enter for the country code (i.e., enters the empty string), quit running the script.
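
The report line in the sample dialog could be produced with string formatting along these lines. This is a sketch, not the required implementation: `report_line` is a hypothetical helper, and the country name and numbers are toy values chosen only to make the arithmetic easy to check.

```python
def report_line(name, year, count, world_count):
    """Format the one-line report; world_count is assumed to be nonzero."""
    # 100.0 (a float) avoids integer division on older Python versions.
    percent = 100.0 * count / world_count
    return ('The population of %s in %d was %d, %.3f%% of world population.'
            % (name, year, count, percent))


# Toy numbers: 50 out of 200 is exactly 25 percent.
line = report_line('Examplestan', 1970, 50, 200)
print(line)
```

In the format string, `%.3f` prints three digits after the decimal point, and `%%` prints a literal percent sign.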

Functions

For Part II, you must write three helper functions. Each one should be fairly short and easy, and together they make writing the main part of the code much easier. The functions are:

`short_name(code)`
Takes a country code as an argument and returns the corresponding short name of the country. If the country code does not exist in the data, return `None`.
`population(code, year)`
Takes a country code and year as arguments, and returns the population of the given country for the given year. If the country code does not exist in the data, or the country does not contain population data for the given year, return `0` (the integer zero).
`world_population(year)`
Takes a year as an argument, and returns the population of the whole world for the given year (i.e., the sum of populations for all countries with population data for the year). If there is no population data for the given year, return `0` (the integer zero). Hint: This function can use the `population()` function you just wrote!
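
To show how the three specifications fit together, here is a minimal sketch on toy data. It assumes a module-level dictionary named `countries` with `'short'` and `'population'` keys, which is a hypothetical layout; your own data structure may differ, and the function bodies would change accordingly.

```python
# Toy data in a hypothetical layout; the real script builds this from the files.
countries = {
    'AAA': {'short': 'Country A', 'population': {1960: 100, 1961: 110}},
    'BBB': {'short': 'Country B', 'population': {1960: 300}},
}


def short_name(code):
    """Return the short name for code, or None if the code is unknown."""
    record = countries.get(code)
    if record is None:
        return None
    return record['short']


def population(code, year):
    """Return the population for code in year, or 0 if either is missing."""
    record = countries.get(code)
    if record is None:
        return 0
    return record['population'].get(year, 0)


def world_population(year):
    """Sum population() over every stored country; 0 if no data for year."""
    return sum(population(code, year) for code in countries)
```

Because `population()` already returns `0` for missing data, `world_population()` needs no special cases of its own.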

Test your functions to make sure they work! Now, you can write the “main” code that gets input, calculates the answers, and prints the (one line) report.

Testing

Here are some specific tests to consider:

• Do you store all of the necessary data correctly? How do you know?
• Do your calculations produce the correct statistic(s)? How do you know?
• Does your script handle the case in which a file is missing or unreadable?
• Does your script handle invalid user input? What if a country code is bad? What if a year is not an integer or not in the data?
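
For the last two questions, one common approach is to wrap the risky operation in `try`/`except` and return a sentinel value that the main code can check. The helpers below are a sketch; the names `read_lines` and `parse_year` are hypothetical, and returning `None` is just one of several reasonable conventions.

```python
def read_lines(path):
    """Return the lines of path, or None if the file cannot be opened or read."""
    try:
        with open(path) as handle:
            return handle.readlines()
    except IOError:  # on Python 3, IOError is an alias of OSError
        return None


def parse_year(text):
    """Return text as an int, or None if it is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None
```

The main code can then print a helpful message and either exit (missing file) or ask again (bad year) when it sees `None`.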

Extra Challenges

• [Medium?] Design and implement more reports of your own. Each time through the main loop, offer a (text) menu of report options for the user. How do you organize your code effectively?
• [Hard] Implement the data structure and related functions as a class. What goes into the class, and what is part of the main code? What are the data attributes of the class? What are the methods?
• [Very Hard] Notice the “region” field of the first data file? Rewrite the report to ask for a region (instead of a country code), then calculate and display statistics for that region. There are no codes for regions (in our data file), so present the user with a list of regions to select from. If you choose this option, you may want to consider a different organization of the data structure.
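
As a starting point for the region challenge, the sketch below groups country codes by region and builds a numbered menu. Everything here is an assumption for illustration: the record layout, the toy region names, and the helper names `codes_by_region` and `region_population`.

```python
from collections import defaultdict

# Toy records in a hypothetical layout (code -> record).
countries = {
    'AAA': {'region': 'Region One', 'population': {1960: 100}},
    'BBB': {'region': 'Region One', 'population': {1960: 300}},
    'CCC': {'region': 'Region Two', 'population': {1960: 50}},
}


def codes_by_region(countries):
    """Return {region: sorted list of country codes}."""
    regions = defaultdict(list)
    for code, record in countries.items():
        regions[record['region']].append(code)
    return dict((region, sorted(codes)) for region, codes in regions.items())


def region_population(countries, region, year):
    """Sum the population of every country in region for year (0 if none)."""
    return sum(record['population'].get(year, 0)
               for record in countries.values()
               if record['region'] == region)


regions = codes_by_region(countries)
# Number the regions so the user can pick one by typing its number.
menu = ['%d. %s' % (i, name) for i, name in enumerate(sorted(regions), 1)]
```

Building the region index once, right after reading the files, avoids rescanning every country each time the user asks for a report.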

Reminders

Start your script the right way! Here is a suggestion:

```#!/usr/bin/env python

"""Homework for CS 368-4 (2012 Fall)
Assigned on Day 05, 2012-11-05
"""```

All homework assignments must be turned in by email! Attach your Python script to the email as a text `.py` file. See the email page for more details about formatting, etc.