Lab 4: Dictionaries and Try/Except

Dictionaries

4.0 Creating Dictionaries

A dictionary in Python behaves like two linked lists. The first is a list of keys; the second is a list of values. The lists are linked so that I can ask the dictionary for the value associated with a given key. Each "key-value pair" makes up an entry in the dictionary. Much like a physical dictionary, a Python dictionary is optimized to make the keys easy to find, and to make each value trivial to find once you have the right key.

To create a dictionary, we use curly braces {}. (This is in contrast to lists, which use square brackets []. There is another type called a "tuple" which uses parentheses (), but we will focus on lists and dictionaries in this class.) Inside the braces, we need to define our key-value pairs.

{ 'furColor' : 'blue' ,
'favoriteFood' : 'cookies' ,
'name' : 'Cookie Monster' ,
'home' : 'Sesame Street' }

I don't need to place the key-value pairs in any particular order. In the version of Python we use, dictionaries are inherently unordered: they do not remember the order in which I enter elements. A colon separates each key (left) from its corresponding value (right), and a comma separates the pair from the next pair. Note that I have added line breaks for readability. These line breaks are optional, and Python ignores the indentation of any lines after the first. If I print out an existing dictionary, this (minus the line breaks) is the format that I will see.

{ 2:5 , 2:7 }

If I assign multiple values to the same key, as in the dictionary above, that key keeps only the last value I assigned to it (here, 7).
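A quick way to check this behavior is sketched below. The variable name d is only for illustration, and print is written with parentheses so that the same lines also run under Python 3.

```python
# The key 2 appears twice in the literal; only the last value survives.
d = {2: 5, 2: 7}

print(d)        # the dictionary holds a single entry for key 2
print(len(d))   # one key, so the length is 1
```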

4.1 Getting and Setting Values in Dictionaries

For this section, we will use the following dictionary of Harry Potter characters which maps their first names to their last names:

hp_chars = {
'Harry':'Potter' ,
'Hermione':'Granger' ,
'Fred':'Weasley' ,
'George':'Weasley' ,
'Ginny':'Weasley' ,
'Albus':'Dumbledore' }

To get an existing value, given its key, we call DICT[KEY]. So for example we might write hp_chars['Hermione'], which evaluates to 'Granger'. If we try to get the value associated with a key that hasn't been assigned yet, then we get a KeyError. For example, we might write hp_chars['Padma'], which gives such an error.

To add a new key-value pair to the dictionary, we call DICT[KEY] = VALUE. So we can write hp_chars['Padma'] = 'Patil'. If we then attempted to read out the value at key 'Padma' again, we would no longer get a KeyError and would instead get the newly added value.

To remove a key-value pair from the dictionary, we use the del statement: del DICT[KEY]. If I had mistakenly added 'Mrs':'Norris' to the hp_chars dictionary, I could use the following statement to remove the entry:

del hp_chars['Mrs']

Notice that the del statement takes the dictionary indexed at the key of the key-value pair to be removed. It also changes the dictionary that you call it on, rather than returning a modified version of the dictionary. Just as with lists, if we want to avoid making changes to the current copy, we can create a new copy with:

new_dict = dict(old_dict)
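The following is a minimal sketch that puts getting, setting, deleting, and copying together. It uses a shortened hp_chars, and the name backup is just an illustrative choice.

```python
hp_chars = {'Harry': 'Potter', 'Hermione': 'Granger'}

# Reading an existing key returns its value.
print(hp_chars['Hermione'])

# Assigning to a new key adds a key-value pair.
hp_chars['Padma'] = 'Patil'

# dict() builds a separate copy, so deleting from the copy
# leaves the original dictionary unchanged.
backup = dict(hp_chars)
del backup['Padma']

print('Padma' in hp_chars)  # the original still has the entry
print('Padma' in backup)    # only the copy lost it
```

Keep in mind that dict() makes a shallow copy: if a value is itself a list or a dictionary, both copies share that same inner object.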

Tasks:

  • If you do not yet have your own copy of hp_chars, create one now. Add several characters to the dictionary. (If you are not familiar with Harry Potter, either make a dictionary of characters from a source you are familiar with, add random character names, or consult the person sitting next to you for ideas.) Remove at least one character of your choice from the dictionary.
  • Write a short function that, given a dictionary and two additional inputs old and new, alters the dictionary so that the value stored under key old is stored under key new instead. It is ok if your function throws an error when given an old that is not a key of the dictionary.

4.2 ML Reader Version 2

Recall your ML Reader Version 1 from last week. I asked you to write a program that can read in data from files structured like titanic_fatalities.data. To help you test your program, I have created a second file in the same format: dice_game.data. In version 1, you were to store the data in lists. Now, store the data using dictionaries. You have some design choices to make here, so think about how you might want to use the data. Below, I've supplied one possible way you could use your data in the future:

def filterPassengersOnly():
  passengerData = {}
  for example in data:
    if data[example]['type'] == 'passenger':
      passengerData[example] = dict(data[example])
  return passengerData

When you finish writing and testing your reader, please show it to me.

4.3 Useful Dictionary Functions

In the example above, I looped over the keys in a dictionary using for KEY in DICT:. That is not the only way to iterate over a dictionary. Here are three methods that return lists you can loop over, along with simple examples of their use.

  • DICT.keys() returns a list of the keys in the dictionary.
  • DICT.values() returns a list of the values in the dictionary. If multiple keys store the same value, then the values list will include multiple copies of that value.
  • DICT.items() returns a list of the key-value pairs in the dictionary. The pairs that it returns are of type "tuple", but the elements in a pair can be accessed with the same syntax as you would use for a list. You can also convert a tuple to a list using list() (and you will want to if you plan to make any changes to the pair).

print 'Fred' in hp_chars.keys() #Should print True, unless you removed him earlier.

weasleyCounter = 0
for i in hp_chars.values():
  if i == 'Weasley':
    weasleyCounter += 1

#The following program has some unintended consequences.
#See if you can spot a potential problem. Is there a way
#to resolve the issue you found?
alphabetized = {}
for p in hp_chars.items():
  if p[0] > p[1]:
    alphabetized[p[1]] = p[0]
  else:
    alphabetized[p[0]] = p[1]
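Since items() hands back tuples, it can help to see the two ways of working with them side by side. This is only a sketch on a shortened hp_chars; the names full_names and pair are illustrative.

```python
hp_chars = {'Harry': 'Potter', 'Fred': 'Weasley', 'Ginny': 'Weasley'}

# Each item is a (key, value) tuple; it can be unpacked into two names.
full_names = []
for first, last in hp_chars.items():
    full_names.append(first + ' ' + last)

# Sort so the output does not depend on the dictionary's internal order.
full_names.sort()
print(full_names)

# A tuple cannot be changed in place; convert it to a list first.
pair = list(('Fred', 'Weasley'))
pair[0] = 'George'
print(pair)
```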

Tasks:

  • Write a script to find the smallest set of feature values that is not represented in the titanic_fatalities data. (For example, the pairing "luxury" + "child" is represented by entry trainEx112.)
  • If you are interested in a challenge, take a look at this more complex data file: heart_train.arff You will notice that the format is different from the .data file we worked with. You might want to create a separate reader function to handle this type of file. Now adapt your script from the previous task to find the smallest unrepresented set of features in this dataset.

Try / Except

4.4 Reading an Error Message

We have all seen our code throw exceptions and crash. An error is not generally something we want to see, but the error messages are actually very good for us. Consider the following error message:

line 22, in <module>
    print pd['trainEx2']

KeyError: 'trainEx2'

Without seeing any of the code that generated this error, I can conclude that there is almost certainly an issue at line 22 in the default module "<module>" (usually an indication that the error is in the root file of the program rather than imported). Further, I can tell you that the error was a KeyError, suggesting that I likely tried to access a dictionary entry that does not exist. The report even tells me that 'trainEx2' was the missing key, and that my dictionary is called 'pd'.
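If you want to study such a report without crashing your program, the standard traceback module can capture it as a string. The sketch below is only an illustration: the contents of pd are made up, since the error message tells us nothing about them beyond the missing key.

```python
import traceback

# Hypothetical contents; only the name pd comes from the error message.
pd = {'trainEx1': 'some example'}

try:
    value = pd['trainEx2']   # this key was never added
except KeyError:
    # format_exc() returns the same text Python would print when crashing.
    report = traceback.format_exc()

print(report)
```

The last line of the captured report names the exception type and the missing key, just like the readout above.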

You do not need to memorize any part of an error readout. They are designed to read naturally and easily. With this in mind, when you write your own error reports, you should make sure that they also read naturally and easily.

4.5 A Philosophy of Error Handling

The following are some best practices for writing error handling code.

  • Errors should be visible. Nothing is harder to debug than code that fails silently, somewhere. It may be tempting to catch and suppress all errors your code makes... Resist this temptation.
  • It is fine to raise your own exceptions, even if you do not handle them. Sometimes you want your program to crash.
  • Python keeps a hierarchy of exception types. (You can find it in the Python documentation on built-in exceptions.) When you catch and handle errors, try to catch only the narrowest type that encompasses what you handle.
  • Often, you can write working code using either exception handling or an if-else statement. Unless the if-else statement is significantly more complicated, you should prefer the if-else statement.
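To make the last two points concrete, here is a small sketch of the same lookup written both ways. The dictionary ages and both function names are hypothetical, chosen only for this illustration.

```python
ages = {'Harry': 11, 'Hermione': 11}   # hypothetical data

def age_with_if(name):
    # Preferred when the check is this simple: test before reading.
    if name in ages:
        return ages[name]
    else:
        return None

def age_with_try(name):
    # Equivalent, but catches only KeyError, the narrowest type that fits.
    # Any other kind of error would still crash visibly.
    try:
        return ages[name]
    except KeyError:
        return None

print(age_with_if('Harry'))
print(age_with_try('Padma'))
```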

4.6 Try and Except Syntax

In Python, our basic error handling tools are the try and except statements. A try statement precedes a block of code that you consider "risky", meaning it might throw an exception.

try:
  uint = int(raw_input("Please enter an integer: "))

Since we cannot guarantee that the user will actually give us an integer, or even a number, we place this statement inside a try block. The try block does not do any error handling on its own. It just defines the scope of the following error handling mechanism, the except statement. The except statement declares that, in the preceding try block, a particular kind of exception is expected. If that exception occurs, the program immediately jumps to the except statement's body instead of completing the try block or crashing.

try:
  uint = int(raw_input("Please enter an integer: "))
  print 24 * uint
except ValueError:
  print "Boo. That wasn't an integer."
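To watch that jump happen without typing any input, here is a minimal sketch that swaps raw_input for a fixed string. The list steps exists only to record which lines ran.

```python
steps = []
try:
    steps.append('before')
    number = int('twelve')     # raises ValueError
    steps.append('after')      # skipped: the try block is abandoned
except ValueError:
    steps.append('handler')

print(steps)
```

The 'after' entry never appears, because nothing in the try block runs once the exception has been raised.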

Tasks:

  • Write a short program that gets two integers from the user and prints the result when the first is divided by the second. Your program should be stable, and should not contain any if statements. (The latter requirement is for practice. In general, I would prefer to use an if statement here.)
  • Write a short program that gets a filename from the user and prints the first line of the given file. Your program should be stable. (Hint: What do you want to do if the file does not exist? What about if you don't have read permissions for the file?)

4.7 Error Handling Details

Each exception type has a set of properties. For example, an IOError has the following properties:

  • errno: There are many different ways I/O can fail. This number indicates which one occurred. errno 13, for example, indicates we tried to access a file when we did not have permission.
  • filename: The name of the file on which I/O failed.
  • strerror: A string describing the problem. For errno 13, this string is 'Permission denied'.

We can access these properties by giving the exception a variable name when we catch it. (The syntax here may remind you of the with statement we saw briefly last week.)

li = ['apple','orange','grape','mango']
try:
  for i in range(5):
    print li[i]
except IndexError as e:
  print e.args #An IndexError has one property, args, which contains the error message.
  print "Offending index:",i #We can still access variables within the try block here.

Task:

  • Select an error type (e.g. IOError). Figure out its useful properties. Write a short program that causes the chosen error to occur, and prints out an explanation of the error. When you have done that, share with the person seated next to you. (You will notice that you do not have enough information to write the complete traceback. When you write your own error messages, you should not mimic the traceback model. You can and should write more descriptive and specific error messages.)

4.8 Causing Exceptions

The raise statement allows us to cause an error to occur. There are two general cases for using raise: First, we may want to handle an error partially, but still crash. Second, we may detect an impending error before Python does, and we may not want to wait to halt the program. To raise a SyntaxError, for example, we write the following:

raise SyntaxError('arg1 is a message to display','other args are allowed',3,'etc')

Each error has a slightly different format, depending on the information it needs to be descriptive. Of course, if you raise an exception and then handle it yourself, you can include exactly the arguments that you need to write your error handler. Each error type is a class, so once we've learned about classes you will be able to make your own error types for this purpose. For now, just use an existing error type (but be aware that you might not be the only one raising that exception in your code!). The following code shows an example of how we might "handle" an exception, but still want to re-raise that exception for future handling:

try:
  uint = int("not an integer")
except Exception as e:
  print "We threw this exception:",type(e)
  raise #A bare raise re-raises the exception we just caught.
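The second use of raise, halting before Python itself notices a problem, might look like the sketch below. The function average() is a made-up example, not part of the lab's code.

```python
def average(values):
    # Detect the problem ourselves instead of waiting for Python's
    # less descriptive ZeroDivisionError further down.
    if len(values) == 0:
        raise ValueError('average() needs at least one value')
    return sum(values) / float(len(values))

try:
    average([])
except ValueError as e:
    message = str(e)

print(message)
print(average([1, 2, 3]))
```

Raising the error ourselves lets the message say what the caller did wrong, rather than what arithmetic happened to fail.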

Task:

  • The list method index() has a single parameter, a value. It finds the first index in its list which contains the given value, and reports that index. If the value is not present, then the method raises a ValueError. Write a version of index() for dictionaries. Your version should take the dictionary and the value as parameters, and either report a key whose value is the given value or raise a ValueError.

4.9 Following Try/Except

You can attach an else statement to the end of a try/except sequence. The block of code following the else statement is executed if and only if the try block threw no exceptions. This is an excellent way to handle code you only want to run if the setup for that code didn't fail. For example, the following script repeatedly asks for file names, and prints the contents of each file if that file exists:

while True:
  fname = raw_input("Filename: ")
  try:
    f = open(fname)
  except IOError:
    print "Sorry, that file does not exist."
  else:
    for line in f:
      print line
    f.close()

You can use a finally statement for code that you want to run regardless of whether an exception occurred or not. This matters most for exceptions that you don't handle: the finally block still runs, and only afterwards does the exception continue on its way to crash the program or reach another handler. Essentially, Python guarantees that the code in the finally block is the last code that the try-except-finally sequence will run.

try:
  RISKY CODE
except ERROR1:
  HANDLING CODE
except ERROR2:
  MORE HANDLING CODE
else:
  CODE THAT NEEDS THE
  RISKY CODE TO WORK
finally:
  CODE THAT YOU WANT
  TO RUN REGARDLESS
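To see which blocks run in which order, the sketch below records each block as it executes. The function trace() and its events list are hypothetical helpers for this demonstration; the outer try exists only to catch the exception that the inner sequence does not handle.

```python
def trace(text):
    """Convert text to an int and record which blocks ran."""
    events = []
    try:
        try:
            events.append('try')
            int(text)
        except ValueError:
            events.append('except')
        else:
            events.append('else')
        finally:
            events.append('finally')
    except TypeError:
        # The inner sequence does not handle TypeError, so it arrives
        # here, but only after the inner finally block has run.
        events.append('outer except')
    return events

print(trace('42'))          # no exception
print(trace('forty-two'))   # ValueError, handled by the inner except
print(trace(None))          # TypeError, not handled by the inner except
```

In all three cases 'finally' is the last entry recorded by the inner sequence, whether the try block succeeded, failed with a handled error, or failed with an unhandled one.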

Task:

  • Write a utility function called safeReadFiles() with one parameter. This function should, given a list of readable filenames, call your ML Reader on each of the files in turn. You should not assume anything about the input to safeReadFiles(). If the function is given bad input, you should report the bad input in some descriptive way. If only part of the input is bad, then you should call your ML Reader on the good parts of the input, while outputting descriptive error messages for the remainder.