
# Lecture

📗 The lecture is in person, but you can join on Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings; they will be moved to Kaltura over the weekend.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code: 741565), but you can also submit your answers through the Form at the end of the lectures.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.


# Lecture Notes

📗 Stochastic Processes
➭ A stochastic process is a sequence of random variables.
➭ If the sequence is finite or countably infinite, it is called a discrete-time stochastic process, and the index represents the time step, usually \(0, 1, ...\).
➭ If the index set is uncountably infinite, it is a continuous-time stochastic process.



📗 Markov Chains
➭ One special class of stochastic processes is the class of Markov processes, where \(X_{t}\) depends only on \(X_{t-1}\), not on \(X_{t-2}, X_{t-3}, ...\).
➭ Formally, the sequence \(X_{t}\) is a (discrete-time) Markov chain if \(\mathbb{P}\left\{X_{t} = x | X_{1} = x_{1}, X_{2} = x_{2}, ..., X_{t-1} = x_{t-1}\right\} = \mathbb{P}\left\{X_{t} = x | X_{t-1} = x_{t-1}\right\}\): Link.

📗 Transition Matrices
➭ If both the time index \(t\) and the states \(X_{t}\) are discrete, a Markov chain can be represented by a transition matrix.
➭ A transition matrix is a matrix whose entry in row \(i\), column \(j\) is the transition probability \(\mathbb{P}\left\{X_{t} = j | X_{t-1} = i\right\}\).

| From \ To | 1 | 2 | 3 |
|---|---|---|---|
| 1 | \(\mathbb{P}\left\{X_{t} = 1 \mid X_{t-1} = 1\right\}\) | \(\mathbb{P}\left\{X_{t} = 2 \mid X_{t-1} = 1\right\}\) | \(\mathbb{P}\left\{X_{t} = 3 \mid X_{t-1} = 1\right\}\) |
| 2 | \(\mathbb{P}\left\{X_{t} = 1 \mid X_{t-1} = 2\right\}\) | \(\mathbb{P}\left\{X_{t} = 2 \mid X_{t-1} = 2\right\}\) | \(\mathbb{P}\left\{X_{t} = 3 \mid X_{t-1} = 2\right\}\) |
| 3 | \(\mathbb{P}\left\{X_{t} = 1 \mid X_{t-1} = 3\right\}\) | \(\mathbb{P}\left\{X_{t} = 2 \mid X_{t-1} = 3\right\}\) | \(\mathbb{P}\left\{X_{t} = 3 \mid X_{t-1} = 3\right\}\) |


➭ The rows of a transition matrix must each sum to 1; the columns do not have to sum to 1 (a quick numpy check is sketched after this list).
➭ One simple language model is the bigram model, which estimates and simulates the distribution of the next word given the current word.
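As a quick illustration (not from the course notebooks), a transition matrix can be stored as a numpy array and the row-sum property checked directly; the states and probabilities below are made up:

```python
import numpy as np

# Hypothetical 3-state transition matrix:
# row i, column j holds P(X_t = j | X_t-1 = i).
m = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

# Each row is a conditional distribution, so every row must sum to 1
# (the columns need not).
print(m.sum(axis=1))  # [1. 1. 1.]
```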



📗 Simulation and Estimation
➭ To simulate a Markov chain with transition matrix `m` starting from state `x0`, with states labeled 0, 1, 2, ...: use `x1 = numpy.random.choice(len(m), p = m[x0, :])`, then `x2 = numpy.random.choice(len(m), p = m[x1, :])`, and so on: Doc (see the sketch after this list).
➭ To estimate a Markov chain transition matrix given a sequence, one way is to use the maximum likelihood estimate: \(\hat{\mathbb{P}}\left\{X_{t} = j | X_{t-1} = i\right\} = \dfrac{c_{i j}}{c_{i}}\), where \(c_{i j}\) is the number of times \(i\) is followed by \(j\) in the sequence, and \(c_{i}\) is the number of times \(i\) appears in the sequence.
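The course notebook isn't reproduced here, but a minimal self-contained sketch of both steps might look like the following (the function names `simulate` and `estimate` are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, x0, steps):
    """Simulate a Markov chain with transition matrix m, starting from state x0."""
    xs = [x0]
    for _ in range(steps):
        # The row of the current state gives the distribution of the next state.
        xs.append(rng.choice(len(m), p=m[xs[-1], :]))
    return xs

def estimate(xs, n_states):
    """Maximum likelihood estimate of the transition matrix from a sequence."""
    c = np.zeros((n_states, n_states))
    for i, j in zip(xs, xs[1:]):
        c[i, j] += 1                          # c[i, j]: times i is followed by j
    return c / c.sum(axis=1, keepdims=True)   # divide row i by c_i

m = np.array([[0.9, 0.1],
              [0.2, 0.8]])
xs = simulate(m, 0, 10000)
print(estimate(xs, 2))  # should be close to m for a long sequence
```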

Markov Chain Example ➭ Code for simulating a Markov chain: Notebook.
➭ Code for simulating a bigram model: Notebook.
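Again as a sketch rather than the notebook's actual code: a bigram model is just a Markov chain whose states are words, with the transition matrix estimated by the same counts (the toy corpus below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
text = "the cat sat on the mat and the cat ran to the mat"  # toy corpus
words = text.split()
vocab = sorted(set(words))
index = {w: i for i, w in enumerate(vocab)}

# Count how often each word is followed by each word; wrap around to the
# first word so that every word has at least one successor.
c = np.zeros((len(vocab), len(vocab)))
for w1, w2 in zip(words, words[1:] + words[:1]):
    c[index[w1], index[w2]] += 1
m = c / c.sum(axis=1, keepdims=True)

# Generate text by simulating the chain:
# each word depends only on the previous one.
x = index["the"]
out = ["the"]
for _ in range(8):
    x = rng.choice(len(vocab), p=m[x, :])
    out.append(vocab[x])
print(" ".join(out))
```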

TopHat Discussion ➭ Suppose there are 20 candidates competing for one position. After interviewing each candidate, the employer must immediately decide whether to hire or reject that candidate (rejected candidates cannot be hired later).
➭ The employer's strategy is to interview and reject the first n candidates, then hire the first subsequent candidate who is better than all of the first n (or the last candidate if none is better).
➭ What is the optimal n?
➭ Code to run the simulation in parallel: Notebook (a serial sketch follows this list).
➭ This problem is known as the secretary problem or the marriage problem (date n people, then marry the first person who is better than all n): Link.
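The parallel notebook isn't shown here, but a serial sketch of the simulation (with made-up helper names `hire` and `success_rate`) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def hire(ranks, n):
    """Apply the cutoff strategy to one random ordering of candidates.
    ranks is a permutation of 0..len(ranks)-1, where 0 is the best candidate."""
    best_seen = min(ranks[:n]) if n > 0 else np.inf
    for r in ranks[n:]:
        if r < best_seen:   # first candidate better than all of the first n
            return r
    return ranks[-1]        # none were better: stuck with the last candidate

def success_rate(n, candidates=20, trials=10000):
    """Estimate the probability that the cutoff-n strategy hires the best one."""
    wins = sum(hire(rng.permutation(candidates), n) == 0 for _ in range(trials))
    return wins / trials

rates = [success_rate(n) for n in range(20)]
print(np.argmax(rates), max(rates))
```

The classic analysis gives an optimal cutoff near \(N / e\) (about 7 of 20 candidates) with success probability about \(1 / e \approx 0.37\); the simulated rates should roughly agree.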




📗 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link.






Last Updated: April 29, 2024 at 1:10 AM