During this lab, you will create and hand in 1 file:
Over the next couple of weeks, we'll be looking at a relatively simple encryption algorithm called a Caesar Cipher.
Decoding a Caesar Cipher without a key is relatively straightforward, but requires that you know the relative frequencies of letters in the alphabet so that you can make the best guess possible. Next week we'll look at the decoding part; this week we'll just focus on figuring out how frequently letters appear in the English language.
Your program will parse a file to determine the relative frequency of each letter in typical English language usage.
File does not exist:
Path to a file to parse: doesnotexist.txt Error: cannot find doesnotexist.txt
Using The Phoenix and the Carpet by E. Nesbitt in my current working directory:
Path to a file to parse: phoenix.txt A: 0.0823032440936 B: 0.0167026632589 C: 0.0231893836332 D: 0.0487402415011 E: 0.12588051087 F: 0.0182794240056 G: 0.0206812339803 H: 0.0638771446696 I: 0.0632904429964 J: 0.00215246176355 K: 0.0101719402591 L: 0.0410984522076 M: 0.0201972050999 N: 0.068937446601 O: 0.0762565499741 P: 0.0182610895783 Q: 0.000946056448035 R: 0.0561106812706 S: 0.0581128007304 T: 0.0969891203508 U: 0.0286347085376 V: 0.00781413290993 W: 0.0253748473659 X: 0.00269516081126 Y: 0.0229180341094 Z: 0.000385022973037
Using The Federalist Papers by Alexander Hamilton in the books directory:
Path to a file to parse: books/federalist.txt A: 0.0717240387036 B: 0.0165059657185 C: 0.0333123861393 D: 0.0332700685756 E: 0.13360712797 F: 0.0270144747227 G: 0.0142853515637 H: 0.051538560822 I: 0.076517560731 J: 0.00221426652025 K: 0.00218993392112 L: 0.0352981378156 M: 0.0234100762351 N: 0.0730242458481 O: 0.0789381253742 P: 0.0231646343657 Q: 0.00142187014009 R: 0.0590922459412 S: 0.0626151831187 T: 0.107156534784 U: 0.0283146818671 V: 0.0114997979336 W: 0.0151211234467 X: 0.00283421882835 Y: 0.0154279257835 Z: 0.000501463129765
Using Are Women People? by Alice Duer Miller in my current working directory:
Path to a file to parse: ./suffrage.txt A: 0.0755596455078 B: 0.0175245563847 C: 0.028387380719 D: 0.0360493728369 E: 0.125452617681 F: 0.0223258047092 G: 0.0201052273591 H: 0.0488527017024 I: 0.0659371436574 J: 0.00270070218257 K: 0.00798207533959 L: 0.03861003861 M: 0.0277272090744 N: 0.0693380278873 O: 0.0841618820893 P: 0.0211454978294 Q: 0.000840218456799 R: 0.0638966131194 S: 0.0605957548963 T: 0.0924640406506 U: 0.0320083221638 V: 0.00962250185048 W: 0.023446095985 X: 0.00186048372577 Y: 0.0231460179647 Z: 0.000260067617581
Feel free to try other books from Project Gutenberg and see what distributions you get - maybe look at some other languages' character distribution frequencies and compare!
We're back to our fundamental concept: the counting program. The only thing different this time is the source of the stuff you're counting.
There are no requirements for your program's internal structure this week, only its behavior, but keep in mind that we'll be using the results from this week in next week's program (so you may want to create some functions you can reuse).
When we've dealt with counting in the past, we've been keeping track of no more than a few different counts. It made sense to store these in individual variables. Now, we're dealing with 26 counts. We don't want to manually create 26 count variables; while it might work, it's a very messy approach and we can do much better.
The better approach would involve using one of the data structures we've learned about to store these counts. Using a dictionary should be relatively straightforward (think about what the keys and what the values should be), but how could we use a list? We know a list with 26 elements would have index 0 through 25, inclusively. If we read the letter 'c', how do we know what index we need to use to access the list? This is where the above suggestion about ord() comes into play - think about how you could use that to accomplish this.
Another important consideration is counter initialization. Just like we've done in the past, we need to initialize any counter before we actually process them. Consider where it is appropriate to place that initialization code, and what value you should initialize each counter to.
Once again, some of your program points will come from commenting your code.
As usual, you'll be handing in your lab work via the course Learn@UW dropboxes. Navigate to our 301 course page, and click the Dropbox link in the top navigation bar. You should see a dropbox for Program 10 - this is where you should hand in your freq.py file.
Note that the dropbox will close at noon on 14 April, so be sure to submit your files before then.