CS 202 Fall 2011: Assignment 6

Homework Assignment #6 : Due Friday 10/28 before 5pm

The purpose of this assignment is to become more comfortable with Lists. You'll explore how to make your own Wordle using someone else's software, create your own preliminary Wordle application in Scratch, and step through the binary search algorithm.

Part A: Create a Wordle (2 points)

Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text.

The following is a word cloud I created using the text for Homework Assignment 5.

From the picture, you can easily see which words are repeated the most frequently in the homework specification; some of the words are interesting, like "scratch", "random", "song" and "music", while others are not so interesting, like "use" and "different".

In this part of the homework, you will find some text that means something to you or that you would like to analyze further. For example, good choices might be favorite song lyrics, a short story, or an article of interest. Copy and paste that text into the textbox here and click the "Go" button. Feel free to modify the colors, the font, and the shape of the cloud to make a picture you find pleasing.

You should be able to save your wordle by choosing Print and then saving it as a PDF; otherwise, you'll need to take a screenshot.

When you turn in your wordle for Part A, you must state where you found the text you used. Briefly discuss the high frequency words and why they are or are not a good summary of the ideas behind the text.

Part B: Create a Wordle-Generating Scratch Program (6 points + 2 extra credit)

To obtain experience with the algorithms required to create a word cloud, you will implement a very rudimentary version in Scratch. Your version does not need to display words in a particularly pleasing manner (getting that layout correct is tricky); your version does need to select the words that appear most frequently and display those words such that their size is proportional to how often they appear in the text.

How will your Scratch program be able to examine text? The way to do this is to have a List (e.g., named TextList). If you right-click on this list when it is showing on the stage, you will be given the option to read in (or import) a list from a text file on your computer. This text file should be plain ASCII text (e.g., not a .doc or .pdf file); every word should be on its own line. If your text file has this format, Scratch will place each word as its own element in TextList. It will now be very easy for you to analyze this text! Just make sure your program doesn't delete this list or modify it in strange ways.
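If your source text is ordinary prose, you will first need to get it into the one-word-per-line format. The following is a minimal sketch in Python (not Scratch) of one way to do that. The function name `words_one_per_line` is made up for this example, and lowercasing every word and dropping punctuation are assumptions of the sketch, not requirements of the assignment.

```python
import re

def words_one_per_line(text):
    """Return the words of `text`, lowercased, one word per line."""
    # Treat runs of ASCII letters and apostrophes as words;
    # anything else (spaces, commas, digits) separates words.
    words = re.findall(r"[a-z']+", text.lower())
    return "\n".join(words)

sample = "Row, row, row your boat"
print(words_one_per_line(sample))
```

You could paste the printed result into a plain text file and import that file into TextList.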

You must follow this specification:

  1. Top W words: You must display in your wordle the W most frequent words in TextList. The value of W is determined by asking the user. The user can only specify values between 3 and 8 (inclusive); if they enter a value outside that range, the program should complain. Words that are not among the W most frequent should not be displayed.
  2. Sizes: The sizes of the W words should be proportional to their frequency in TextList. That is, if the word "Scratch" is the most popular word followed by "Music" and "Game", then your word cloud should show Scratch as the largest, then Music, and then Game. If two words are equally popular, then they should be of the same size.
  3. Colors: Each of the words should be displayed in a different color. Random colors are fine.
  4. Positioning: The W words may appear anywhere on the Stage, but the words should not overlap (or at least not overlap much). If your "cloud" appears to be more of a list, that is fine; you don't need to do any fancy layout of short versus long words to arrange them.
You do not need to support different fonts or stamping words vertically.
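Two of the rules above involve a little arithmetic: checking W and turning counts into sizes. Here is a minimal sketch in Python (not Scratch) of both. The names `valid_w` and `word_sizes` and the `max_size` value of 200 are assumptions made for this example; pick whatever maximum size looks good on your Stage.

```python
def valid_w(w):
    """True when w is one of the allowed values 3 through 8 (inclusive)."""
    return 3 <= w <= 8

def word_sizes(tallies, max_size=200):
    """Scale each count so the largest count gets max_size.

    Sizes are proportional to the counts, so equal counts
    produce equal sizes.
    """
    biggest = max(tallies)
    return [max_size * t / biggest for t in tallies]

print(valid_w(2), valid_w(5))   # False True
print(word_sizes([6, 3, 3]))    # [200.0, 100.0, 100.0]
```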

This task might seem intimidating at first, but you can break the work into small interconnected steps, each of which is quite manageable. The key to creating a wordle is to create a histogram that counts the number of times each word appears in the text. In Scratch, you will represent your histogram as two lists: a Unique list (containing each unique word) and a corresponding Tally list (containing the number of times each of the unique words appears in the text). After you have this histogram, it is relatively straightforward to find the W most popular words and stamp them on the Stage.
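To make the two-list histogram concrete, here is a minimal sketch in Python (not Scratch). The function names `find_unique` and `tally_words` are made up for this example; your Scratch scripts will use list blocks and index variables instead, but the idea is the same: position i of the tally list holds the count for the word at position i of the unique list.

```python
def find_unique(text_list):
    """Return each distinct word of text_list once, in first-seen order."""
    unique = []
    for word in text_list:
        if word not in unique:      # only add words not recorded yet
            unique.append(word)
    return unique

def tally_words(text_list, unique):
    """Return counts aligned with unique: tallies[i] counts unique[i]."""
    tallies = []
    for u in unique:                # outer loop: each unique word
        count = 0
        for word in text_list:      # inner loop: scan the whole text
            if word == u:
                count += 1
        tallies.append(count)
    return tallies

text_list = ["doll", "bike", "book", "doll"]
unique = find_unique(text_list)
tallies = tally_words(text_list, unique)
print(unique)    # ['doll', 'bike', 'book']
print(tallies)   # [2, 1, 1]
```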

To divide the work into manageable pieces, we strongly recommend that you write the following scripts:

  1. Find Unique: This script takes as input just the TextList. For output, it adds items to a new list UniqueList such that each word in TextList appears exactly once in UniqueList. For example, if the TextList contains "bike", "book", "bike", "bike", then the UniqueList should just contain "bike" and "book". The order of items in the UniqueList does not matter. Hint: You are likely to want to check if the UniqueList already contains a particular item before adding that item again.
  2. Tally Words: This script takes as input the TextList and the UniqueList. It should add counts to a new list Tallies to show how many times each unique word appears in the TextList; the order of the elements in Tallies must match the order of elements in the UniqueList. For example, if the TextList contains "doll", "bike", "book", "doll" and the UniqueList contains "doll", "bike", "book", then Tallies must contain "2", "1", "1" (because the word "doll" appears twice, and "bike" and "book" each appear once).

    Hint: You are likely to want to implement a nested loop in this script. Specifically, you will probably have an outer loop that examines each element of the UniqueList; within each iteration of the outer loop you will enter another (inner) loop that examines each element of the TextList; this inner loop will count how many words it finds in the TextList that match the particular word in the UniqueList. You will need separate index variables for each of the two loops as well as a variable to record the current tally for the current word.

  3. Find Most Popular: This script takes as input the Tally and Unique lists as well as input from the user: the number of words W that should be displayed in the word cloud. For output, it places the W most popular words in a new list TopWords along with their corresponding tallies in a new list TopTallies. Those two lists should each have exactly W elements.

    Hint: To find the W most popular words, it is easiest to first find the largest count in the Tally list, remove it from that list, and add it to the TopTallies list (and likewise remove the corresponding word from the Unique list and add it to the TopWords list). Repeat these steps W times to find the W most frequent words.

  4. Display Words: This script will display on the Stage the W most popular words (which you put in the TopWords list). To show a single word, you'll want to borrow the scripts shown in this program stamp.sb. You'll need to modify those scripts so that you can control the starting (x,y) coordinate of each word, the color of the word, and most importantly the size of each word (and correspondingly, how much the x variable changes between letters of the word).
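The last two scripts can be sketched in Python (not Scratch) as follows. This is only an illustration under stated assumptions: the names `find_most_popular` and `letter_x_positions` and the `base_step` spacing of 20 are made up for this example, and the actual scripts in stamp.sb may space letters differently. The sketch also works on copies of the lists, whereas the hint above removes items from the original lists.

```python
def find_most_popular(unique, tallies, w):
    """Return (top_words, top_tallies) for the w largest counts."""
    unique = list(unique)        # copies, so the caller's lists survive
    tallies = list(tallies)
    top_words, top_tallies = [], []
    for _ in range(min(w, len(unique))):
        # Find the position of the largest remaining count.
        best = 0
        for i in range(1, len(tallies)):
            if tallies[i] > tallies[best]:
                best = i
        # Move the word and its count to the "top" lists.
        top_words.append(unique.pop(best))
        top_tallies.append(tallies.pop(best))
    return top_words, top_tallies

def letter_x_positions(word, start_x, size, base_step=20):
    """x coordinate for each letter; spacing grows with size (percent)."""
    step = base_step * size / 100
    return [start_x + i * step for i in range(len(word))]

words, counts = find_most_popular(
    ["doll", "bike", "book", "game"], [2, 1, 3, 1], 3)
print(words, counts)    # ['book', 'doll', 'bike'] [3, 2, 1]
print(letter_x_positions("cat", -100, 200))   # [-100.0, -60.0, -20.0]
```

Notice that a word drawn at twice the size also needs twice the spacing between letters; otherwise the letters would overlap.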

As always, programming assignments and projects in this class should be done on your own. You may ask other students in the class questions, but you may not share code with anyone in the class. You may not use existing code that you find elsewhere, including the Scratch website. You may look at the behavior of existing Scratch projects for inspiration, but you should develop all of your code as a completely new project and not modify, re-mix, or build from anyone else's code.

The Instructor and the TA are very happy to give you suggestions on how to implement your ideas. We won't necessarily give the answer, but we will try to guide you to a reasonable implementation. If you have bugs in your code (i.e., it isn't behaving like you expect), we are happy to take a look and see if we can see the problem. But, again, don't wait until the last minute to do your project if you are hoping for any advice!

Extra Credit (Optional)

For extra credit, you may share your wordle program with the class. You can obtain up to two points of extra credit: one point for participating in voting and one point for getting a significant number of votes.

To correctly share your project, follow these steps carefully:

  1. Save your complete project.
  2. "Share" this project on the Scratch website by clicking "Share" and then selecting "Upload project to website". (Of course, you must be connected to the network to do this!)
  3. Use a web browser to go to the Scratch website and visit the gallery UPDATED LATER. Click on the button on the right side of the page saying "add my projects". In the pop-up box, select the project that you want to add and click "Accept".
  4. Verify that your project with the picture you like is showing up in the gallery.
  5. You will then have to vote on a favorite project in your gallery; more details later!

More details will be given later about the voting process.

Part C: Binary Search (2 points)

The following script implements a binary search; it is similar (but not identical) to the code that was shown in class. It has access to several variables and one list named Valuable Numbers. The function round x will round the number x to the nearest integer; numbers such as 6.5 are rounded up to 7.0.
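If you want to check your hand calculations in another language, be careful: rounding rules differ. As a minimal sketch, here is how the "halves round up" behavior described above could be reproduced in Python for non-negative numbers. The name `round_half_up` is made up for this example, and the comparison with Python's built-in `round` is included only to show the pitfall.

```python
import math

def round_half_up(x):
    """Round a non-negative number to the nearest integer; .5 goes up."""
    return math.floor(x + 0.5)

print(round_half_up(6.5))   # 7
print(round_half_up(6.4))   # 6
# Python's built-in round() sends exact halves to the nearest
# even integer instead, so it gives a different answer here.
print(round(6.5))           # 6
```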

  1. Imagine the List Valuable Numbers contains the following N=16 elements:
    Index       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
    Contents    6   10   28   51   88  105  113  166  200  210  211  239  280  316  364  373

    Consider running the script searching for the following values of Key.

    1. Key = 51
    2. Key = 364
    3. Key = 166
    4. Key = 20
    For each of the values of Key, create a table like the one below showing the values of the designated variables at the end of each iteration of the repeat until loop.
    Loop # index item greater? lo hi Key Index
    1            
    2            
    3            
    4            
    5            
  2. Imagine that a binary search is run over a list containing a number of elements, N. For each value of N, what is the maximum value that the variable "Guesses" could be incremented to? In other words, what is the maximum number of times the "repeat until" loop could repeat?
    1. N = 16
    2. N = 64
    3. N = 512
    4. N = 4096
    5. N = 16384
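For reference, here is a minimal sketch of a generic binary search in Python. It is not the Scratch script from this assignment, which differs in its details, so do not use it to fill in your tables. The function name `binary_search`, the 1-based indexing, and the way the middle index is computed are all assumptions made for this example. Each pass through the loop discards about half of the remaining range.

```python
import math

def binary_search(values, key):
    """Search a sorted list; return (index, guesses).

    index is 1-based (as in Scratch lists), or 0 if key is absent.
    guesses counts how many times the loop body ran.
    """
    lo, hi = 1, len(values)
    guesses = 0
    while lo <= hi:
        guesses += 1
        # Middle of the current range, with halves rounded up.
        index = math.floor((lo + hi) / 2 + 0.5)
        item = values[index - 1]
        if item == key:
            return index, guesses
        if item > key:
            hi = index - 1      # key can only be to the left
        else:
            lo = index + 1      # key can only be to the right
    return 0, guesses

primes = [2, 3, 5, 7, 11, 13, 17]
print(binary_search(primes, 13))   # (6, 2)
print(binary_search(primes, 4))    # (0, 3)
```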

Turning in your Homework

You should turn in all parts of this assignment through your Learn@UW account. To do this, you should be able to follow these steps:
  1. Log in to Learn@UW at "learnuw.wisc.edu" using your UW NetID and password.
  2. Click on the link "compsci202:Introduction to Computation" under student tab.
  3. Click the Dropbox option at the top left of the web page.
  4. Click the link for the homework you need to upload. It takes you to a page where you can upload files.
  5. Upload the desired files and submit them. Your Scratch program will be saved in a file with the extension ".sb". For example, if you named your program "homework6", then you will see a file with the name "homework6.sb" that you should upload. Your write-up for Part A and Part C should be in a file with an extension of .doc, .docx, or .pdf.
If you have any questions about how to do this, please don't hesitate to ask. We don't want you to get stuck on steps like handing in your homework.

Menu

Fall 2010
Time: MWF 9:55-10:45
Room: 1221 CS
Lab: 1370 CS (1st floor)


Instructor:
Prof Andrea Arpaci-Dusseau
Office Hours
Mon 11-12, Wed 11-12
Office:
7375 Computer Sciences
Email: dusseau "at" cs.wisc.edu
