
CS 368-4 (2011 Fall) — Day 3 Homework

Due Thursday, November 3, at the start of class.

Goal

Write a Python script that gathers input data and calculates statistics.

Tasks

Use the Python that you have learned to write a simple data analysis tool.

The script will ask the user for some data observations, each of which consists of one text label and its associated numeric value. For example, I entered U.S. states (labels) and their 2010 populations (values). After some input, the script might have the following data (populations are in thousands):

Label      Value
Wisconsin   5687
Illinois   12831
Michigan    9884
Iowa        3046

The script should cycle through the following steps:

  1. Display a list of the current labels and values
  2. Display some summary statistics about the data
  3. Ask the user for another item and then the corresponding value
  4. Store the label and value
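
One possible shape for this cycle is sketched below. It is only an illustration under several assumptions: Python 3 (under Python 2, raw_input would take the place of input), a dictionary for storage, labels shown in sorted order, and an empty label as the signal to stop. The function name run and its read and write parameters are hypothetical; they exist only so the loop can be driven by something other than the keyboard.

```python
def run(read=input, write=print):
    """Cycle through the four steps until the user enters an empty label."""
    data = {}  # maps each label to its numeric value
    while True:
        # Step 1: display the current labels and values.
        write("OBSERVATIONS")
        if not data:
            write("<none>")
        for label in sorted(data):
            write("%s: %.2f" % (label, data[label]))

        # Step 2: display summary statistics (only once there is data).
        if data:
            values = list(data.values())
            total = sum(values)
            write("STATISTICS")
            write("Count: %d" % len(values))
            write("Sum: %.1f" % total)
            write("Mean: %.1f" % (total / len(values)))

        # Step 3: ask for another label and its value.
        label = read("Label? ")
        if label == "":  # assumed quit signal; other choices are possible
            break
        value = float(read("Value? "))

        # Step 4: store the label and value.
        data[label] = value
    return data
```

Calling run() with no arguments would read from the keyboard and print to the screen.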

A typical interaction might look like the following. You do not need to match the formatting exactly. In this sample interaction, the user typed the text that follows each "Label?" and "Value?" prompt.

OBSERVATIONS
<none>

Label? Wisconsin
Value? 5687

----------------------------------------

OBSERVATIONS
Wisconsin: 5687.00

STATISTICS
Count:       1
Sum:      5687.0
Mean:     5687.0

Label? Illinois
Value? 12831

----------------------------------------

OBSERVATIONS
Illinois: 12831.00
Wisconsin: 5687.00

STATISTICS
Count:       2
Sum:     18518.0
Mean:     9259.0

Label?
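
If you want output that lines up like the sample, printf-style string formatting is one way to get it. The sketch below is not required formatting; the function names are hypothetical and the column widths were chosen only to imitate the sample interaction.

```python
def format_observation(label, value):
    """Format one observation with two decimal places, like 'Wisconsin: 5687.00'."""
    return "%s: %.2f" % (label, value)


def format_statistics(values):
    """Return the three STATISTICS lines for a non-empty list of numbers."""
    total = sum(values)
    mean = total / float(len(values))
    return [
        # %-6s left-justifies the name in 6 columns; %8d and %10.1f
        # right-justify the number in a fixed-width field.
        "%-6s%8d" % ("Count:", len(values)),
        "%-6s%10.1f" % ("Sum:", total),
        "%-6s%10.1f" % ("Mean:", mean),
    ]
```

Printing each returned line in turn, for example with a for loop, would produce a STATISTICS block.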

How will the user quit the script? Maybe if the label input is the empty string? Maybe with a special label or value that tells the script to quit? Pick something and implement it.

The program should not attempt to save its state. That is, when you quit the program and run it again later, it will start with no observations.

Tip: If the user enters the same item a second time, the new value should replace the original one. That is, you do not need to check whether an item already exists; just store the data.

Tip: If the user tries to enter the same item more than once, but spells it differently, it will end up being a separate item. That is, your script does not need to be clever or fancy about item names — just accept what the user types.
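
Both tips fall out naturally if the observations are kept in a dictionary, which is an assumption of this sketch rather than a requirement. The variable name observations and the second Wisconsin value are made up for illustration.

```python
observations = {}

# Storing a label that is already present simply replaces its value.
observations["Wisconsin"] = 5687.0
observations["Wisconsin"] = 5690.0  # hypothetical corrected value

# A different spelling is a different key, so it becomes a separate item.
observations["wisconsin"] = 5687.0

item_count = len(observations)  # two items: "Wisconsin" and "wisconsin"
```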

Extra Challenges

If the requirements above were easy, try one or more of the following challenges. No extra credit, just extra learning!

  1. Calculate and display additional statistics, such as the minimum value, the maximum value, and the standard deviation.
  2. Add the ability to delete items from the observations. How does the user indicate that an item should be deleted? Maybe just a blank value? What if that item is not in the data already?
  3. [Hard:] Display a simple text histogram for your data. It is easiest to use a horizontal bar chart, perhaps something like this:
    RANGE    FREQUENCY
    -------  ---------+---------+---------+
     0M- 5M  XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     5M-10M  XXXXXXXXXXXXXXX
    10M-15M  XXX
    15M-20M  XX
    20M-25M  
    25M-30M  X
    30M-35M  
    35M-40M  X
  4. [Hard:] How could you display the observations in the order that the user entered them (assuming that it does not do so already)?
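
The following sketches show one way to start on each challenge; they are hints under stated assumptions, not finished solutions. All function names are hypothetical. The standard deviation here is the population form (dividing by the count), the histogram assumes non-negative values with bins that start at zero and have a width chosen by the caller, and the last part relies on collections.OrderedDict to remember entry order.

```python
import math
from collections import OrderedDict


def extra_stats(values):
    """Return (minimum, maximum, population standard deviation) of a non-empty list."""
    values = list(values)
    mean = sum(values) / float(len(values))
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return min(values), max(values), math.sqrt(variance)


def delete_item(data, label):
    """Delete label from data if present; report whether anything was removed."""
    if label in data:
        del data[label]
        return True
    return False  # label was not in the data, so there is nothing to delete


def histogram(values, width):
    """Return one text row per bin, from zero up to the bin holding the largest value."""
    values = list(values)
    counts = [0] * (int(max(values) // width) + 1)
    for v in values:
        counts[int(v // width)] += 1  # assumes v >= 0
    rows = []
    for i, count in enumerate(counts):
        rows.append("%6d-%-6d  %s" % (i * width, (i + 1) * width, "X" * count))
    return rows


# An OrderedDict yields its keys in the order they were first stored.
ordered = OrderedDict()
ordered["Wisconsin"] = 5687.0
ordered["Illinois"] = 12831.0
entry_order = list(ordered)
```

For example, histogram([5687, 12831, 9884, 3046], 5000) would produce three rows, covering 0 to 5000, 5000 to 10000, and 10000 to 15000.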

Reminders

Start your script the right way! Here is a suggestion:

#!/usr/bin/env python

"""Homework for CS 368-4 (2011 Fall)
Assigned on Day 03, 2011-11-01
Written by <Your Name>
"""

Do the work yourself, consulting reasonable reference materials as needed. Any reference material that gives you a complete or nearly complete solution to this problem or a similar one is not OK to use. Asking the instructors for help is OK; asking other students for help is not.

Hand In

Hand in a printout of your code, on a single sheet of paper if at all possible. Be sure to put your own name in the initial comment block of the code. Identifying your work is important; without your name, you may not receive appropriate credit.