Exercise 5: Helping A Friend

This week, you'll explore some basics of lists while implementing a few functions to help out a friend:

  • stats.py - implement some functions to complete a statistics program. This part may be completed in pairs if you wish.

Program Background

Your friend is a statistics major and needs your help with some data analysis. They have two sets of numbers and need to perform a Student's t-test on them.

They've already written some code, which they've placed in a file called t_test.py. Here's the great thing about functions: you don't need to know how this program works, only which functions you need to write.

If that's a bit more code than you want to wade through all at once, you can use LiClipse features to make it a bit more manageable. If you click the little button with the minus sign next to a function, it'll collapse the function into just its header.

If you want to see what's in that function at a later time, you can click the plus sign to expand the function again, but if you collapse both of your friend's functions, the code will look much simpler. Hooray for modularization!

Program Requirements

Create a project in LiClipse and copy your friend's code (above) into a file called t_test.py. In that same project, create a separate file called stats.py - this is where your code will go.

In high level terms, your stats.py file will:

  1. Implement four functions:
    1. mean(L) — given a list L, return the average (mean) of the numbers in L. If L is empty, return None.
    2. std(L) — given a list L, return the standard deviation of the numbers in L:

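      s = sqrt( ((x1 - m)^2 + (x2 - m)^2 + ... + (xn - m)^2) / (n - 1) ), where m = mean(L) and n is the number of elements in L

      If L is empty or contains only one element, return None (to avoid dividing by zero).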
    3. remove_outliers(L, sd) — given a list L, remove any numbers which are more than sd standard deviations away from the mean of the list and return the updated list. For this function, you may assume that sd is an int and is greater than 0.
    4. t_test(L1, L2) — given two lists L1 and L2, return the t-test statistic for the two lists:

Some more specific requirements:

  1. All code in your stats.py file should be contained in functions - there should be no code outside of a function definition in your file. (Comments are okay, but no code.)
  2. When you run the program, you should run either t_test.py from above, or the tester.py file below. Both of these will use your functions.
  3. You may not use external Python modules like NumPy to complete these functions (we'll get there, I promise! Just not yet). You may use the math module.
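
As a rough sketch of how these requirements might translate into a file, the skeleton below defines the four required functions with placeholder bodies and keeps everything except comments inside function definitions. The placeholder return values are only stand-ins until the real logic is written; they are not part of the specification.

```python
# stats.py - placeholder versions of the four required functions.
# Every body below is a stand-in to be replaced with real logic.

def mean(L):
    # Placeholder: should eventually return the average of L.
    return None


def std(L):
    # Placeholder: should eventually return the standard deviation of L.
    return None


def remove_outliers(L, sd):
    # Placeholder: hands the list back unchanged for now.
    return L


def t_test(L1, L2):
    # Placeholder: should eventually return the t-test statistic.
    return None
```

With stubs like these in place, another file can import the four names without errors, even though the results are not meaningful yet.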

How do we approach this?

You’re free to implement this program as you wish, as long as your functions meet the specification. If you need some suggestions, take a look at the steps below for some ideas on how to approach this. After each step, test your code and make sure it is working as you intend.

  1. Set up your project with t_test.py and your stats.py files. Add function definitions to the stats.py file to get rid of the unresolved import errors in t_test.py. Just have all functions return None except remove_outliers, which can return L (the list) for now.
  2. Start by implementing mean(L) and std(L) - try using the built-in function sum() if you need to sum the values in a list. (For testing purposes, a list containing the numbers [2,4,4,4,5,5,7,9] has mean 5 and standard deviation 2.1380899353.)
    • Why is t_test.py printing the mean and SD as ints? If you look at print_stats(), you'll notice that the display is using the % operator to do string formatting. If a string containing "%d" is followed by the % operator and a number, Python will insert that number into the string as an integer, dropping any fractional part!

      >>> print "blah blah %d" % 47.2
      blah blah 47
  3. Next, implement the t_test(L1,L2) function. This is still just a mathematical function, even if the math is a little more complicated. Once you've finished this step, your friend's t_test.py program should run without any errors!
  4. Finally, you'll need to implement remove_outliers(L,sd). You'll need to know the mean and standard deviation of the list, and calculate the minimum and maximum allowable values - plus or minus sd standard deviations from the mean. Anything in the list that is within that range should be appended to the new list you'll be returning.
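
Steps 2 and 4 above could be sketched roughly as follows. This is one possible approach, not the only acceptable one. It assumes the standard deviation divides by n - 1, which agrees with the test value quoted in step 2, and it assumes that a list too short to have a standard deviation is returned unchanged by remove_outliers (the specification does not say what should happen in that case). The names low, high, and kept are purely illustrative.

```python
import math


def mean(L):
    """Return the average of the numbers in L, or None if L is empty."""
    if len(L) == 0:
        return None
    # float() keeps the division from truncating under Python 2.
    return sum(L) / float(len(L))


def std(L):
    """Return the standard deviation of L, or None if L has fewer than 2 items."""
    if len(L) < 2:
        return None
    m = mean(L)
    squared_diffs = [(x - m) ** 2 for x in L]
    # Assumption: divide by n - 1, which matches the test value in step 2.
    return math.sqrt(sum(squared_diffs) / (len(L) - 1))


def remove_outliers(L, sd):
    """Return a new list of the values in L within sd standard deviations of the mean."""
    m = mean(L)
    s = std(L)
    if m is None or s is None:
        # Assumption: too few values to measure spread, so nothing is removed.
        return list(L)
    low = m - sd * s   # minimum allowable value
    high = m + sd * s  # maximum allowable value
    kept = []
    for x in L:
        if low <= x <= high:
            kept.append(x)
    return kept


data = [2, 4, 4, 4, 5, 5, 7, 9]
print("%d" % std(data))    # integer formatting drops the fraction: 2
print("%.4f" % std(data))  # float formatting keeps four decimals: 2.1381
```

The two print lines at the bottom are only there for trying the functions out; they illustrate the "%d" behavior described in step 2 and would not belong in the stats.py file you hand in, since that file should contain no code outside of functions.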

Testing your code

This time, we'll actually show you some of the testing scripts that will be used to grade your program - when you're writing real code, you're frequently working with the testers to get feedback on how your program works, so this is a little more realistic.

Here's the testing program - like with t_test.py, just add it to the same project as stats.py and run it. Don't worry about understanding everything that's going on, just pay attention to the output.

IMPORTANT: Test your code yourself as you go. The testing program may help on an almost-finished program, but it won't help as you're developing!

Commenting your code

Once again, some of your program points will come from commenting your code.

  • Add a 1-2 sentence description of the program and its function to the top of your code.
  • Include comments within your code describing what you're doing. You don't need to comment on every line of code, but be clear in your explanations.

Submitting your files

As usual, you'll be handing in your lab work via the course Learn@UW dropboxes. Navigate to our 301 course page, and click the Dropbox link in the top navigation bar. You should see a dropbox for Program 5 - this is where you should hand in your stats.py file. You will ONLY submit stats.py.

If you worked in a pair, only one person will need to hand in the code.

Note that the dropbox will close at noon on 3 March, so be sure to submit your files before then.