Exercise 11: Ciphers Part II

During this lab, you will create and hand in 1 file:

  • decipher.py - write a program to decode a message using letter frequencies. This may be completed in pairs if you wish.

Background

This week, we'll actually do the work of decoding a relatively simple encryption algorithm called a Caesar Cipher. (Actually reading through this wikipedia page will help you understand the assignment!)

Decoding a Caesar Cipher without a key requires that you know the relative frequencies of letters in the alphabet so that you can make the best guess possible. Last week you figured out how frequently letters appear in the English language based on large bodies of English text; this week, you'll be using those frequencies to make an educated guess at decrypting a file.

Because we know the message was encoded using a rotation of the alphabet (e.g. A = B, B = C, etc), we can narrow down the 403291461126605635584000000 possible substitution ciphers to a mere 26 rotations, and just check all of those to see which one fits best.

Program Requirements

Your program will take as input an encoded file, and write the decrypted file.

  1. Prompt the user for letter frequencies. You have two options on how to implement this requirement:
    1. Using your work from lab 10, you may prompt the user for an English text file and calculate the relative frequencies yourself. Your prompt will be Path to a file to parse:
    2. You may prompt for a file containing a set of standard frequencies directly, formatted as described below. Your prompt will be Path to a frequencies file:

    If a file does not exist, display Error: cannot find and the filename, and quit the program.
  2. Prompt the user for an encoded file and determine how far the alphabet was shifted in the cipher.
    1. Note: only letters are encoded. Punctuation, numbers, and whitespace are all left alone.
    2. Read the file in and calculate each encoded letter's relative frequency. If the file does not exist, display Error: cannot find and the filename, and quit the program.
    3. For each of the 26 possible alphabet rotations, multiply your calculated relative frequency value for that character with the expected calculated frequency of the TRANSLATED character.

      For example, my encoded character T has a relative frequency of 0.02, and for this rotation, I'm translating it to an A, which has a relative frequency of 0.08. My result for that character would be 0.02*0.08 = 0.0016.

      To get the score for a single rotation, sum the relative frequency products for each translated letter.
    4. Notice that 0.02*0.08 + 0.08*0.02 = .0032, while 0.08*0.08 + 0.02*0.02 = .0068 - the better fit a rotation is, the higher our total score will be. The best rotation is just the one with the highest score!
    5. Helpful hint again: to shift letters, try combining the ord() function from last time with the chr() function - chr( ord('A') + 1 ) gives B. Remember to wrap around the end, though - shifting Z by 1 should give A, not [.
  3. Write the decoded file.
    1. Using the best rotation, decode the message in the file.
    2. All files should be written to output.txt in the current working directory.

Sample Inputs

If you choose to use text files and calculate your own frequencies, your input files will be the same as last week.

If you choose to use a frequencies list rather than calculating your own, the input file will look like this. (These frequencies may look different from your calculated frequencies from last week, but they have the same relative scale and should give you similar output.)

I've also worked up a few simple coded messages for you:

Sample Output

Using The Federalist Papers by Alexander Hamilton in the books directory:

Path to a file to parse: books/federalist.txt
Path to an encoded message: message1.txt
File output.txt written.

Using a frequencies file in my current working directory:

Path to a frequencies file: freq.txt
Path to an encoded message: message1.txt
File output.txt written.

The output.txt for both of these runs should look the same.

How do we approach this?

The biggest challenge of this program will be translating from one alphabet rotation to another. You'll want to do a lot of math with letters, so familiarize yourself with ord() and its friend, chr().

>>> ord('A')
65
>>> chr( ord('A') + 2 )
'C'

You still have a choice of data structures here (list or dictionary), so the precise combination of these functions will be unique to your implementation. Use lots of print statements, and make sure you're calculating what you mean to be calculating!

Commenting your code

Once again, some of your program points will come from commenting your code.

  • Add a 1-2 sentence description of the program and its function to the top of your code.
  • Include comments within your code describing what you're doing. You don't need to comment on every line of code, but be clear in your explanations.

Submitting your files

As usual, you'll be handing in your lab work via the course Learn@UW dropboxes. Navigate to our 301 course page, and click the Dropbox link in the top navigation bar. You should see a dropbox for Program 11 - this is where you should hand in your decipher.py file.

Note that the dropbox will close at noon on 21 April, so be sure to submit your files before then.