
# M2 Past Exam Problems

📗 Enter your ID (the wisc email ID without @wisc.edu) and click the button (or hit the Enter key).
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.

# Warning: please enter your ID before you start!



📗 [3 points] Suppose the cumulative distribution function (CDF) of a discrete random variable \(X \in \left\{0, 1, 2, ...\right\}\) is given in the following table. What is the probability that is observed.
\(\mathbb{P}\left\{X < 0\right\}\) \(\mathbb{P}\left\{X \leq 0\right\}\) \(\mathbb{P}\left\{X \leq 1\right\}\) \(\mathbb{P}\left\{X \leq 2\right\}\) \(\mathbb{P}\left\{X \leq 3\right\}\) \(\mathbb{P}\left\{X \leq 4\right\}\)
\(0\)

📗 Answer: .
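📗 A minimal sketch of how a CDF table of this kind can be turned into point probabilities, using \(\mathbb{P}\left\{X = k\right\} = \mathbb{P}\left\{X \leq k\right\} - \mathbb{P}\left\{X \leq k - 1\right\}\). The CDF values and the names `pmf_from_cdf` and `example_cdf` are hypothetical, not the values generated for this question.

```python
from fractions import Fraction

def pmf_from_cdf(cdf):
    """cdf[k] = P{X <= k} for k = 0, 1, 2, ...; returns the list of P{X = k}."""
    pmf = []
    previous = Fraction(0)  # P{X < 0} = 0 because X takes values in {0, 1, 2, ...}
    for value in cdf:
        pmf.append(value - previous)
        previous = value
    return pmf

# Hypothetical CDF values for k = 0, 1, 2, 3, 4.
example_cdf = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10),
               Fraction(9, 10), Fraction(1)]
example_pmf = pmf_from_cdf(example_cdf)

# P{X = 2} = P{X <= 2} - P{X <= 1}
print(example_pmf[2])
```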
📗 [4 points] Given the vocabulary \(a, b\) and the following probabilities, compute the bigram (Markov) transition matrix, row (column) 1 corresponding to \(a\) and row (column) 2 corresponding to \(b\). Note: in the table \(\mathbb{P}\left\{a b\right\}\) means the probability of \(b\) after \(a\).
\(\mathbb{P}\left\{a b\right\}\) \(\mathbb{P}\left\{a a\right\}\) \(\mathbb{P}\left\{b a\right\}\) \(\mathbb{P}\left\{b b\right\}\)

📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
📗 [3 points] Given the following bigram (Markov) transition matrix , the rows (columns) representing the word tokens . What is the probability, given we start with , we get , where the sequence is repeated times, and there are a total of words including the initial word.
📗 Answer: .
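📗 A minimal sketch of the sequence probability under a bigram (first-order Markov) model: given the first word, the probability of the rest is the product of the one-step transition probabilities. The matrix, the sequence, and the function name `sequence_probability` are hypothetical.

```python
def sequence_probability(transition, sequence):
    """Probability of sequence[1:] given sequence[0] under a first-order Markov chain."""
    probability = 1.0
    for previous, current in zip(sequence, sequence[1:]):
        probability *= transition[previous][current]
    return probability

# Hypothetical 2-token matrix; row i, column j is P{next = j | previous = i}.
example_transition = [[0.2, 0.8],
                      [0.5, 0.5]]

# Start with token 0, then observe 1, 0, 1, 0:
# the pair "1, 0" repeated 2 times, 5 words in total including the initial word.
example_sequence = [0, 1, 0, 1, 0]
example_probability = sequence_probability(example_transition, example_sequence)
print(example_probability)  # 0.8 * 0.5 * 0.8 * 0.5
```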
📗 [3 points] Welcome to the Terrible-Three-Day-Tour! We will visit New York on Day 1. The rules for Day 2 and Day 3 are:
(a) If we were at New York the day before, with probability we will stay in New York, and with probability we will go to Baltimore.
(b) If we were at Baltimore the day before, with probability we will stay in Baltimore, and with probability we will go to Washington D.C.
On average, before you start the tour, what is your chance to visit (at least on one of the two days)?
📗 Answer: .
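📗 One way to check a tour problem like this is to enumerate every possible Day 2 and Day 3 itinerary and add up the probabilities of those that include the target city. The sketch below uses hypothetical probabilities (0.6, 0.4, 0.7, 0.3), and the Washington D.C. row is an assumption added only so that every city has a row; it is never used within three days.

```python
from itertools import product

def visit_probability(transition, start, target, days_after_start):
    """Probability that `target` appears at least once in the days after `start`."""
    cities = list(transition)
    total = 0.0
    for path in product(cities, repeat=days_after_start):
        probability = 1.0
        previous = start
        for city in path:
            probability *= transition[previous].get(city, 0.0)
            previous = city
        if target in path:
            total += probability
    return total

# Hypothetical rules: keys are yesterday's city, values are today's options.
example_rules = {
    "New York": {"New York": 0.6, "Baltimore": 0.4},
    "Baltimore": {"Baltimore": 0.7, "Washington D.C.": 0.3},
    "Washington D.C.": {"Washington D.C.": 1.0},  # assumption, unreachable as "yesterday"
}

chance_baltimore = visit_probability(example_rules, "New York", "Baltimore", 2)
chance_dc = visit_probability(example_rules, "New York", "Washington D.C.", 2)
print(chance_baltimore, chance_dc)
```

With these numbers, the only itinerary that misses Baltimore is staying in New York on both days, so the Baltimore result equals 1 - 0.6 * 0.6.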
📗 [2 points] An n-gram language model computes the probability \(\mathbb{P}\left\{w_{n} | w_{1}, w_{2}, ..., w_{n-1}\right\}\). How many parameters need to be estimated for a -gram language model given a vocabulary size of ?
📗 Answer: .
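📗 A small sketch of the counting argument, assuming the model stores one probability for every (context, next word) pair. There are \(V^{n-1}\) contexts and \(V\) next words per context, so the table has \(V^{n}\) entries; since each row sums to 1, only \(V^{n-1} (V - 1)\) of them are free. Which of the two counts is expected depends on the course convention. The function name and the example values are hypothetical.

```python
def ngram_parameter_counts(n, vocabulary_size):
    """Return (table entries, free parameters) for an n-gram model."""
    contexts = vocabulary_size ** (n - 1)        # possible (w_1, ..., w_{n-1})
    table_entries = contexts * vocabulary_size   # one probability per next word
    free_parameters = contexts * (vocabulary_size - 1)  # each row sums to 1
    return table_entries, free_parameters

# Hypothetical example: trigram model with a vocabulary of 10 word types.
example_counts = ngram_parameter_counts(3, 10)
print(example_counts)
```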
📗 [3 points] Suppose the vocabulary is the alphabet plus space (26 letters + 1 space character). What is the (maximum likelihood) estimated trigram probability \(\hat{\mathbb{P}}\left\{a | x, y\right\}\) with Laplace smoothing (add-1 smoothing) if the sequence \(x, y\) never appeared in the training set? The training set has tokens in total. Enter -1 if more information is required to estimate this probability.
📗 Answer: .
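📗 A minimal sketch of the add-1 (Laplace) estimate for a conditional n-gram probability: \(\dfrac{c_{x, y, a} + 1}{c_{x, y} + V}\), where \(V\) is the vocabulary size. The function name and the counts in the first call are hypothetical. Note that the formula uses only the two counts and \(V\), not the total number of tokens.

```python
from fractions import Fraction

def laplace_conditional(count_context_word, count_context, vocabulary_size):
    """Add-1 smoothed estimate of P{word | context}."""
    return Fraction(count_context_word + 1, count_context + vocabulary_size)

VOCABULARY_SIZE = 27  # 26 letters + 1 space character

# Hypothetical counts: context seen 10 times, followed by the word 3 times.
seen_context = laplace_conditional(3, 10, VOCABULARY_SIZE)

# A context that never appeared has both counts equal to 0.
unseen_context = laplace_conditional(0, 0, VOCABULARY_SIZE)
print(seen_context, unseen_context)
```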
📗 [2 points] Given the training data "", with the gram model, what is the probability of observing the new sentence "" given the first word is ? Use MLE (Maximum Likelihood Estimate) without smoothing and do not include the probability of observing the first word.
📗 Answer: .
📗 [4 points] Suppose the vocabulary is "", "", "", and the training data is "". Write down the transition matrix. Make sure that the sum of the transition probabilities in each row is 1.
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
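📗 A minimal sketch of estimating a bigram transition matrix from training text by counting adjacent pairs and normalizing each row (MLE without smoothing). The vocabulary, the training string, and the uniform fallback for a word that is never followed by anything are hypothetical assumptions.

```python
def estimate_transition_matrix(tokens, vocabulary):
    """Row i, column j is the estimate of P{w_t = j | w_{t-1} = i}."""
    index = {word: i for i, word in enumerate(vocabulary)}
    size = len(vocabulary)
    counts = [[0] * size for _ in range(size)]
    for previous, current in zip(tokens, tokens[1:]):
        counts[index[previous]][index[current]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: use a uniform row so that every row still sums to 1.
            matrix.append([1 / size] * size)
        else:
            matrix.append([count / total for count in row])
    return matrix

# Hypothetical vocabulary and training data.
example_vocabulary = ["a", "b", "c"]
example_tokens = "a b a c b a".split()
example_matrix = estimate_transition_matrix(example_tokens, example_vocabulary)
for row in example_matrix:
    print(", ".join(str(p) for p in row))
```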
📗 [4 points] Given the following transition matrix for a bigram model with words "Eat", "My" and "Hammer": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
📗 [4 points] Given the following transition matrix for a bigram model with words "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
📗 [4 points] Given the following transition matrix for a bigram model with words "", "" and "": . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). What is the probability that the third word is "" given the first word is ""?
📗 Answer: .
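📗 For the "third word given the first word" questions above, the second word is unobserved, so it is summed out: \(\mathbb{P}\left\{w_{3} = j | w_{1} = i\right\} = \displaystyle\sum_{k} \mathbb{P}\left\{w_{2} = k | w_{1} = i\right\} \mathbb{P}\left\{w_{3} = j | w_{2} = k\right\}\), which is entry \((i, j)\) of the squared transition matrix. A minimal sketch with a hypothetical matrix and labels:

```python
def two_step_probability(transition, first, third):
    """P{w_3 = third | w_1 = first}, summing over every possible second word."""
    return sum(transition[first][middle] * transition[middle][third]
               for middle in range(len(transition)))

# Hypothetical 3-word matrix; row i, column j is P{w_t = j | w_{t-1} = i}.
example_transition = [[0.1, 0.6, 0.3],
                      [0.5, 0.2, 0.3],
                      [0.4, 0.4, 0.2]]

# First word has label 0, third word has label 2.
example_two_step = two_step_probability(example_transition, 0, 2)
print(example_two_step)  # 0.1 * 0.3 + 0.6 * 0.3 + 0.3 * 0.2
```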
📗 [2 points] You have a vocabulary with \(n\) = word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{dune}}\) = . Using Laplace smoothing (add ), compute \(p_{\text{dune}}\).
📗 Answer: .
📗 [2 points] You have a vocabulary with word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{tenet}}\) = . Using add-one smoothing \(\delta\) = (Laplace smoothing), compute \(p_{\text{tenet}}\).
📗 Answer: .
📗 [2 points] You have a vocabulary with word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{zoodles}}\) = . Using add-one smoothing \(\delta\) = (Laplace smoothing), compute \(p_{\text{zoodles}}\).
📗 Answer: .
📗 [2 points] You have a vocabulary with \(n\) = word types. You want to estimate the unigram probability \(p_{w}\) for each word type \(w\) in the vocabulary. In your corpus the total word token count \(\displaystyle\sum_{w} c_{w}\) is , and \(c_{\text{tenet}}\) = . Using add-one smoothing \(\delta\) = (Laplace smoothing), compute \(p_{\text{tenet}}\).
📗 Answer: .
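📗 A minimal sketch of the smoothed unigram estimate used in the questions above, \(p_{w} = \dfrac{c_{w} + \delta}{\displaystyle\sum_{w'} c_{w'} + \delta n}\), with \(\delta = 1\) for Laplace smoothing. The function name and the counts are hypothetical.

```python
from fractions import Fraction

def add_delta_unigram(count, total_tokens, vocabulary_size, delta=1):
    """Add-delta smoothed unigram probability of one word type."""
    return Fraction(count + delta, total_tokens + delta * vocabulary_size)

# Hypothetical values: n = 20 word types, 100 tokens in total, the word seen 4 times.
example_unigram = add_delta_unigram(4, 100, 20)
print(example_unigram)  # (4 + 1) / (100 + 20)
```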
📗 [3 points] Given an infinite state sequence in which the pattern "" is repeated an infinite number of times, what is the (maximum likelihood) estimated transition probability from state to (without smoothing)?
📗 Answer: .
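📗 When a pattern repeats forever, the transition frequencies are the same as those counted over one period with the last state wrapping around to the first. A minimal sketch under that observation, with a hypothetical pattern and function name:

```python
from collections import Counter
from fractions import Fraction

def cyclic_transition_estimate(pattern, source, target):
    """MLE of P{next = target | current = source} for an endlessly repeated pattern."""
    # Pair each state with its successor; the last state is followed by the first.
    pairs = Counter(zip(pattern, pattern[1:] + pattern[:1]))
    from_source = sum(count for (previous, _), count in pairs.items()
                      if previous == source)
    return Fraction(pairs[(source, target)], from_source)

# Hypothetical pattern "AABAB": transitions are A-A, A-B, B-A, A-B, B-A (wrap-around).
example_estimate = cyclic_transition_estimate("AABAB", "A", "B")
print(example_estimate)
```

The function raises `ZeroDivisionError` if `source` does not occur in the pattern, since the estimate is undefined in that case.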
📗 [4 points] Given the following transition matrix for a bigram model with words "Eat" (label 0), "My" (label 1) and "Hammer" (label 2): . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). Two uniform random numbers between 0 and 1 are generated to simulate the words after "Eat", say \(u_{1}\) = and \(u_{2}\) = . Using the CDF (Cumulative Distribution Function) inversion method (inverse transform method), which two words are generated? Enter two integer labels (0, 1, or 2), not strings.
📗 Answer (comma separated vector): .
📗 [4 points] Given the following transition matrix for a bigram model with words "I" (label 0), "am" (label 1) and "Groot" (label 2): . Row \(i\) column \(j\) is \(\mathbb{P}\left\{w_{t} = j | w_{t-1} = i\right\}\). Two uniform random numbers between 0 and 1 are generated to simulate the words after "I", say \(u_{1}\) = and \(u_{2}\) = . Using the CDF (Cumulative Distribution Function) inversion method (inverse transform method), which two words are generated? Enter two integer labels (0, 1, or 2), not strings.
📗 Answer (comma separated vector): .
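📗 A minimal sketch of CDF inversion for the two questions above: for each uniform number \(u\), take the row of the current word, accumulate its probabilities, and pick the smallest label whose cumulative probability is at least \(u\); that label becomes the current word for the next draw. The matrix, the uniform numbers, and the tie-breaking convention (\(u \leq\) CDF) are assumptions here.

```python
def sample_by_cdf_inversion(probabilities, u):
    """Smallest label whose cumulative probability is at least u."""
    cumulative = 0.0
    for label, probability in enumerate(probabilities):
        cumulative += probability
        if u <= cumulative:
            return label
    return len(probabilities) - 1  # guard against floating-point round-off

def generate_words(transition, start, uniforms):
    """Generate one word per uniform number, each conditioned on the previous word."""
    words = []
    current = start
    for u in uniforms:
        current = sample_by_cdf_inversion(transition[current], u)
        words.append(current)
    return words

# Hypothetical matrix; row i, column j is P{w_t = j | w_{t-1} = i}.
example_transition = [[0.2, 0.5, 0.3],
                      [0.1, 0.1, 0.8],
                      [0.6, 0.3, 0.1]]

# Start from label 0 with hypothetical u_1 = 0.65 and u_2 = 0.15.
example_words = generate_words(example_transition, 0, [0.65, 0.15])
print(example_words)
```

In this example the first draw uses row 0 (cumulative 0.2, 0.7, 1.0) and the second draw uses the row of the word just generated, not row 0 again.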
📗 [2 points] In a corpus with word tokens, the phrase "Fort Night" appeared times (not Fortnite). In particular, "Fort" appeared times and "Night" appeared . If we estimate probability by frequency (the maximum likelihood estimate) without smoothing, what is the estimated probability of \(\mathbb{P}\left\{\text{Night} | \text{Fort}\right\}\)?
📗 Answer: .
📗 [2 points] In a corpus with word tokens, the phrase "Home Lander" appeared times (not Homelander). In particular, "Home" appeared times and "Lander" appeared . If we estimate probability by frequency (the maximum likelihood estimate) without smoothing, what is the estimated probability of \(\mathbb{P}\left\{\text{Lander} | \text{Home}\right\}\)?
📗 Answer: .
📗 [2 points] In a corpus with word tokens, the phrase "San Francisco" appeared times. In particular, "San" appeared times and "Francisco" appeared . If we estimate probability by frequency (the maximum likelihood estimate), what is the estimated probability of \(\mathbb{P}\left\{\text{Francisco} | \text{San}\right\}\)?
📗 Answer: .
📗 [2 points] In a corpus (set of documents) with word types (unique word tokens), the phrase "" appeared times. In particular, "" appeared times and "" appeared . If we estimate probability by frequency (the maximum likelihood estimate) with Laplace smoothing (add-1 smoothing), what is the estimated probability of \(\mathbb{P}\){ | }?
📗 Answer: .
📗 [2 points] In a corpus with word tokens, the phrase "San Francisco" appeared times. In particular, "San" appeared times and "Francisco" appeared . If we estimate probability by frequency (the maximum likelihood estimate), what is the estimated probability of \(\mathbb{P}\left\{\text{Francisco} | \text{San}\right\}\)?
📗 Answer: .
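📗 A minimal sketch of the bigram estimates used in the phrase-count questions above. Without smoothing, \(\hat{\mathbb{P}}\left\{w_{2} | w_{1}\right\} = \dfrac{c_{w_{1} w_{2}}}{c_{w_{1}}}\); with add-1 smoothing, the numerator gains 1 and the denominator gains the number of word types. The count of the second word and the total token count are not used. The function name and all counts are hypothetical.

```python
from fractions import Fraction

def bigram_estimate(count_pair, count_first, vocabulary_size=None):
    """Estimate P{second | first}; add-1 smoothing if vocabulary_size is given."""
    if vocabulary_size is None:
        return Fraction(count_pair, count_first)
    return Fraction(count_pair + 1, count_first + vocabulary_size)

# Hypothetical counts: the phrase appeared 20 times, its first word 50 times.
example_mle = bigram_estimate(20, 50)
# Same counts with Laplace smoothing and a hypothetical 100 word types.
example_smoothed = bigram_estimate(20, 50, vocabulary_size=100)
print(example_mle, example_smoothed)
```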
📗 [2 points] According to Zipf's law, if a word \(w_{1}\) has rank and \(w_{2}\) has rank , what is the ratio \(\dfrac{f_{1}}{f_{2}}\) between the frequency (or count) of the two words?
📗 Answer: .
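📗 A minimal sketch of the Zipf's law ratio, assuming the basic form in which frequency is inversely proportional to rank, \(f \propto \dfrac{1}{r}\), so that \(\dfrac{f_{1}}{f_{2}} = \dfrac{r_{2}}{r_{1}}\). The ranks and the optional `exponent` parameter are hypothetical.

```python
from fractions import Fraction

def zipf_frequency_ratio(rank_1, rank_2, exponent=1):
    """f_1 / f_2 when frequency is proportional to 1 / rank ** exponent."""
    return Fraction(rank_2, rank_1) ** exponent

# Hypothetical ranks: w_1 has rank 2 and w_2 has rank 10.
example_ratio = zipf_frequency_ratio(2, 10)
print(example_ratio)
```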
📗 [1 point] Blank.
📗 Answer: .

# Grade




📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##m: 2" and "##id: your id", and the remaining lines should have the format "##1: your answer to question 1", "##2: your answer to question 2", etc., one per line. Please make sure that your answers are loaded correctly before submitting them.


📗 You can find videos going through the questions on Link.





Last Updated: January 20, 2025 at 3:12 AM