Prev: L9, Next: L11 , Assignment: A7 , Practice Questions: M25 M26 , Links: Canvas, Piazza, Zoom, TopHat (453473)



# Local Search Algorithms

📗 For some problems, every state is a solution; some states are simply better than others, as specified by a cost function (sometimes a score or reward): Wikipedia.
📗 The search strategy moves from state to state, but the path between states is not important.
📗 Local search assumes that similar (nearby) states have similar costs, and it searches through the state space by iteratively improving the cost to find an optimal state.
📗 The successor states are called neighbors (or move set).



# Hill Climbing or Valley Finding

📗 Hill climbing is the discrete version of gradient descent: Wikipedia.
➩ Start at a random state.
➩ Move to the best neighbor (successor) state.
➩ Stop when no neighbor is better than the current state (a local minimum).
📗 Random restarts can be used to pick multiple random initial states and find the best local minimum (similar to neural network training).
📗 If there are too many neighbors, first choice hill climbing randomly generates neighbors until a better neighbor is found.
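The steps above can be sketched in Python; the cost array and neighbor function below are illustrative examples, not from the notes:

```python
import random

def hill_climbing(cost, neighbors, state):
    """Move to the best neighbor until no neighbor improves the cost."""
    while True:
        best = min(neighbors(state), key=cost)
        if cost(best) >= cost(state):
            return state  # local minimum: no neighbor is better
        state = best

# Illustrative problem: 8 states in a line, neighbors are i - 1 and i + 1.
cost = [5, 3, 4, 1, 2, 6, 0, 7].__getitem__
nbrs = lambda i: [j for j in (i - 1, i + 1) if 0 <= j <= 7]

# Random restarts: run from several random initial states, keep the best result.
best = min((hill_climbing(cost, nbrs, random.randrange(8)) for _ in range(20)), key=cost)
```

Starting from state 0 the search gets stuck at the local minimum 1, while starting from state 7 it reaches the global minimum 6; random restarts make finding the global minimum likely.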
TopHat Discussion ID:
📗 [1 point] Given the initial state (red point) \(i\), the neighbors are \(i - 1\) and \(i + 1\), and hill climbing is used for minimization or maximization of the score, which solution will be found? Click on the red point to restart, and click anywhere else to move to the next iteration.

📗 Answer: .


TopHat Quiz (Past Exam Question) ID:
📗 [3 points] Given the scores in the following table, if hill-climbing (valley-finding) is used, how many states will lead to the global minimum? Note: the neighbors of state \(i\) are states \(i - 1\) and \(i + 1\) (if they exist).
State 0 1 2 3 4 5 6 7
Score

📗 Answer: .




# Simulated Annealing

📗 Simulated annealing uses a process similar to heating solids (heating and slow cooling to toughen and reduce brittleness): Wikipedia.
➩ Each time, a random neighbor is generated.
➩ If the neighbor has a lower cost, move to the neighbor.
➩ If the neighbor has a higher cost, move to the neighbor with a small probability: \(p = e^{- \dfrac{\left| f\left(s'\right) - f\left(s\right) \right|}{T\left(t\right)}}\), where \(f\) is the cost and \(T\left(t\right)\) is the temperature, which decreases in \(t\).
➩ Stop when bored (there is no fixed stopping criterion).
📗 Simulated annealing is a version of the Metropolis-Hastings algorithm: Wikipedia.
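A minimal Python sketch of this loop, using the acceptance probability above with a geometric cooling schedule (the cost array and neighbor function are illustrative, not from the notes):

```python
import math
import random

def simulated_annealing(cost, random_neighbor, state, T0=10.0, decay=0.9, steps=200):
    """Minimize cost: always accept improvements; accept worse moves with probability e^{-|df|/T}."""
    T = T0
    for _ in range(steps):
        s2 = random_neighbor(state)
        d = abs(cost(s2) - cost(state))
        if cost(s2) < cost(state) or random.random() < math.exp(-d / T):
            state = s2
        T = decay * T  # geometric cooling schedule
    return state

# Illustrative problem: the same 8 states in a line as in the hill climbing sketch.
random.seed(0)
cost = [5, 3, 4, 1, 2, 6, 0, 7].__getitem__
neighbor = lambda i: random.choice([j for j in (i - 1, i + 1) if 0 <= j <= 7])
final = simulated_annealing(cost, neighbor, 0)
```

Early on (high \(T\)) the search can escape the local minima at states 1 and 3; once \(T\) is small it behaves like first choice hill climbing and settles at one of the minima.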
Example
📗 The traveling salesman problem is often solved by simulated annealing: Link.



# Temperature

📗 The temperature function should decrease over time; it can decrease arithmetically or geometrically.
➩ Arithmetic sequence: for example, \(T\left(t + 1\right) = \displaystyle\max\left\{T\left(t\right) - 1, 1\right\}\).
➩ Geometric sequence: for example, \(T\left(t + 1\right) = 0.9 T\left(t\right)\).
📗 When the temperature is high: almost any move is accepted, even to worse states.
📗 When the temperature is low: worse moves are almost never accepted, so the algorithm behaves like first choice hill climbing.
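A small numeric check of the two schedules above and of how the temperature controls the acceptance probability \(p = e^{- \left| \Delta f \right| / T}\) for a move that increases the cost by 1 (temperatures chosen for illustration):

```python
import math

# Two cooling schedules starting at T = 10.
T_arith, T_geom = 10, 10.0
for t in range(5):
    T_arith = max(T_arith - 1, 1)  # arithmetic: T(t+1) = max(T(t) - 1, 1)
    T_geom = 0.9 * T_geom          # geometric: T(t+1) = 0.9 T(t)

# Probability of accepting a move that increases the cost by 1 at each temperature.
accept = {T: math.exp(-1.0 / T) for T in (100.0, 1.0, 0.01)}
```

At \(T = 100\) the acceptance probability is about 0.99 (almost always accept); at \(T = 0.01\) it is about \(e^{-100} \approx 0\) (first choice hill climbing).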
TopHat Discussion ID:
📗 [1 point] Given the initial state (red point) \(i\), the neighbors are \(i - 1\) and \(i + 1\), and simulated annealing is used for minimization or maximization of the score, which solution will be found? Click on the red point to restart, and click anywhere else to move to the next iteration.

Temperature: \(T\) = , \(dT\) = .
📗 Answer: .





# Genetic Algorithm

📗 The genetic algorithm starts with a fixed population of initial states, and successors are found through cross-over and mutation: Wikipedia.
📗 Each state \(s_{i}\) in a population of \(N\) states has reproduction probability proportional to its fitness (or inversely related to its cost): \(p_{i} = \dfrac{f\left(s_{i}\right)}{f\left(s_{1}\right) + f\left(s_{2}\right) + ... + f\left(s_{N}\right)}\).
📗 If the states are encoded by strings, cross-over means swapping the substrings after a cross-over point: for example, crossing over abcde and ABCDE at position 2 results in abCDE and ABcde: Wikipedia.
📗 If the states are encoded by strings, mutation means randomly changing individual characters with a small probability called the mutation rate: for example, abcde can become abCde or aBcDe or ... with small probabilities: Link.
📗 Genetic algorithm: in each generation, the reproduction process is:
➩ Randomly sample two states based on the reproduction probabilities.
➩ Cross over these two states to produce two children states.
➩ Mutate these two children with small probabilities.
➩ Repeat the process until the same population size is reached, and continue to the next generation.
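The reproduction process above can be sketched as follows (the population, fitness function, and alphabet are illustrative, not from the notes):

```python
import random

def next_generation(pop, fitness, mutation_rate=0.01, alphabet="abcde"):
    """One generation of the genetic algorithm on string-encoded states."""
    weights = [fitness(s) for s in pop]  # reproduction probability proportional to fitness
    children = []
    while len(children) < len(pop):
        # Sample two parents according to the reproduction probabilities.
        p1, p2 = random.choices(pop, weights=weights, k=2)
        # Cross over at a random position.
        cut = random.randrange(1, len(p1))
        for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
            # Mutate each character with a small probability (the mutation rate).
            child = "".join(random.choice(alphabet) if random.random() < mutation_rate else c
                            for c in child)
            children.append(child)
    return children[:len(pop)]

random.seed(0)
pop = ["abcde", "ABCDE", "aBcDe", "AbCdE"]
# Hypothetical fitness: number of lowercase characters (plus 1 to keep weights positive).
new_pop = next_generation(pop, fitness=lambda s: sum(c.islower() for c in s) + 1)
```

Repeating `next_generation` drives the population toward high-fitness strings while cross-over and mutation keep generating new candidates.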
TopHat Quiz (Past Exam Question) ID:
📗 [4 points] When using the Genetic Algorithm, suppose the states are \(\begin{bmatrix} x_{1} & x_{2} & ... & x_{T} \end{bmatrix}\) = , , , . Let \(T\) = , the fitness function (not the cost) is \(\max\left\{t \in \left\{0, ..., T\right\} : x_{t} = 1\right\}\) with \(x_{0} = 1\) (i.e. the index of the last feature that is 1). What is the reproduction probability of the first state: ?
📗 Answer: .




# Variants of Genetic Algorithm

📗 The parents do not survive in the standard genetic algorithm, but if reproduction between two copies of the same state is allowed, the parents can survive.
📗 The fitness or cost functions can be replaced by the ranking.
➩ If state \(s_{i}\) has the \(k\)-th lowest fitness value among all states, the reproduction probability can be computed by \(p_{i} = \dfrac{k}{1 + 2 + ... + N}\).
📗 In theory, cross-over is much more efficient than mutation.
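The rank-based reproduction probabilities can be computed as follows (the fitness values are hypothetical; note \(1 + 2 + ... + N = \dfrac{N \left(N + 1\right)}{2}\)):

```python
def rank_probabilities(fitness):
    """Rank-based reproduction: p_i = k / (1 + 2 + ... + N) if state i has the k-th lowest fitness."""
    N = len(fitness)
    total = N * (N + 1) // 2
    order = sorted(range(N), key=lambda i: fitness[i])  # indices from lowest to highest fitness
    p = [0.0] * N
    for k, i in enumerate(order, start=1):
        p[i] = k / total
    return p

# Hypothetical fitness values: state 0 is fittest, so it gets the largest probability.
probs = rank_probabilities([5, 1, 3])
```

Here state 0 has the highest fitness (rank 3 of 3) and receives probability \(\dfrac{3}{6}\), while states 1 and 2 receive \(\dfrac{1}{6}\) and \(\dfrac{2}{6}\); the probabilities always sum to 1.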
Example
📗 Many problems can be solved by the genetic algorithm (but in practice, reinforcement learning techniques are often more efficient and produce better policies).
➩ Walkers: Link.
➩ Cars: Link.
➩ Eaters: Link.
➩ Image: Link.



# State Representation of Neural Networks

📗 A neural network can be represented by a sequence of weights (a single state).
📗 Two neural networks can swap a subset of weights (cross-over).
📗 One neural network can randomly update a subset of weights with small probability (mutation).
📗 Genetic algorithm can be used to train neural networks to perform reinforcement learning tasks.
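These operations can be sketched on flat weight vectors; the weight values, mutation rate, and noise scale below are illustrative assumptions:

```python
import random

def crossover(w1, w2):
    """Swap the weights after a random cut point between two networks (cross-over)."""
    cut = random.randrange(1, len(w1))
    return w1[:cut] + w2[cut:], w2[:cut] + w1[cut:]

def mutate(w, rate=0.05, scale=0.1):
    """Add Gaussian noise to each weight independently with a small probability (mutation)."""
    return [x + random.gauss(0.0, scale) if random.random() < rate else x for x in w]

# Two hypothetical networks represented as flat weight vectors.
random.seed(0)
parent1, parent2 = [0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4]
child1, child2 = crossover(parent1, parent2)
child1 = mutate(child1)
```

Combined with a fitness function that scores each network on the reinforcement learning task, these two operators are enough to run the genetic algorithm from the previous section on neural network weights.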
Example
📗 Flappy bird: Link, Link.
📗 Cars: Link.



📗 Notes and code adapted from the course taught by Professors Jerry Zhu, Yingyu Liang, and Charles Dyer.






Last Updated: January 20, 2025 at 3:12 AM