
# Overview

📗 Readings: MARL Chapter 8.
📗 Wikipedia page: Link

# Deep Q Network

📗 Neural networks can be trained on offline data sets using gradient descent to minimize some loss function, \(C\left(Q\left(s, a\right), \hat{Q}\left(s, a; w, b\right)\right)\).
➭ Squared loss: \(C\left(Q, \hat{Q}\right) = \left(Q - \hat{Q}\right)^{2}\).
➭ Cross-entropy loss: \(C\left(Q, \hat{Q}\right) = - Q \log\left(\hat{Q}\right) - \left(1 - Q\right) \log\left(1 - \hat{Q}\right)\).
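The two loss functions above can be sketched directly; this is a minimal illustration, with the target \(Q\) and prediction \(\hat{Q}\) values chosen arbitrarily:

```python
import math

def squared_loss(q, q_hat):
    # C(Q, Q_hat) = (Q - Q_hat)^2
    return (q - q_hat) ** 2

def cross_entropy_loss(q, q_hat):
    # C(Q, Q_hat) = -Q log(Q_hat) - (1 - Q) log(1 - Q_hat)
    return -q * math.log(q_hat) - (1 - q) * math.log(1 - q_hat)

sq = squared_loss(1.0, 0.8)
ce = cross_entropy_loss(1.0, 0.8)
```

Note that the cross-entropy loss assumes \(Q \in [0, 1]\) and \(\hat{Q} \in (0, 1)\), as in a classification setting.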
📗 Gradient descent can then be used to update the weights iteratively: \(w = w - \lambda \dfrac{\partial C}{\partial w}\) and \(b = b - \lambda \dfrac{\partial C}{\partial b}\) for every weight and bias, where \(\lambda\) is the learning rate.
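A minimal sketch of one such update, assuming a single linear unit \(\hat{Q} = w x + b\) with the squared loss (the input \(x\), target, and learning rate below are illustrative):

```python
def gradient_step(w, b, x, q_target, lam):
    # One gradient-descent step on C = (Q - Q_hat)^2 for Q_hat = w * x + b.
    q_hat = w * x + b
    # By the chain rule: dC/dw = 2 (Q_hat - Q) x and dC/db = 2 (Q_hat - Q).
    grad = 2 * (q_hat - q_target)
    w = w - lam * grad * x
    b = b - lam * grad
    return w, b

w, b = gradient_step(0.5, 0.0, x=1.0, q_target=1.0, lam=0.1)
```

After the step, \(\hat{Q} = 0.6 \cdot 1 + 0.1 = 0.7\), closer to the target of 1 than the initial prediction of 0.5.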
📗 Neural networks can also be trained using genetic algorithms.
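A toy sketch of the genetic-algorithm alternative, assuming a one-weight "network" whose fitness is the negative squared loss against a target Q value (the population size, mutation scale, and target are illustrative):

```python
import random

random.seed(0)

def fitness(w, x=1.0, q_target=1.0):
    # Fitness = negative squared loss, so higher is better.
    return -(q_target - w * x) ** 2

# Initialize a random population of candidate weights.
population = [random.uniform(-1, 1) for _ in range(20)]
for _ in range(50):
    # Select the fitter half, then refill with mutated copies of the survivors.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [w + random.gauss(0, 0.1) for w in survivors]

best = max(population, key=fitness)
```

No gradients are computed here; selection plus mutation alone drives the best weight toward the optimum of 1.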

📗 For Q learning, gradient descent can be combined with Q iteration during online learning: one such algorithm is Deep Q Network with experience replay (DQN), where the cost function is given by \(C\left(r_{t} + \gamma \displaystyle\max_{a} \hat{Q}\left(s_{t+1}, a; w, b\right), \hat{Q}\left(s_{t}, a_{t}; w, b\right)\right)\).
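A minimal sketch of DQN with experience replay, assuming a linear approximator \(\hat{Q}(s, a) = w_{a} \cdot s + b_{a}\) in place of a deep network (the transitions, rewards, and hyperparameters below are illustrative):

```python
import random
from collections import deque

random.seed(0)
GAMMA, LAM, N_ACTIONS, DIM = 0.5, 0.05, 2, 3

# Weights and biases of the linear approximator, one set per action.
w = [[0.0] * DIM for _ in range(N_ACTIONS)]
b = [0.0] * N_ACTIONS
replay = deque(maxlen=1000)  # experience replay buffer

def q_hat(s, a):
    return sum(wi * si for wi, si in zip(w[a], s)) + b[a]

def train_step(batch_size=4):
    # Sample a random minibatch of stored transitions and do one
    # gradient step on the squared loss against the bootstrapped target.
    batch = random.sample(replay, min(batch_size, len(replay)))
    for s, a, r, s_next in batch:
        target = r + GAMMA * max(q_hat(s_next, ap) for ap in range(N_ACTIONS))
        err = q_hat(s, a) - target
        for i in range(DIM):
            w[a][i] -= LAM * 2 * err * s[i]  # dC/dw = 2 (Q_hat - target) s_i
        b[a] -= LAM * 2 * err                # dC/db = 2 (Q_hat - target)

# Store fake transitions (s, a, r, s'): action 0 earns reward 1, action 1
# earns 0, and the state self-loops. Then train on replayed minibatches.
for _ in range(200):
    s = [random.random() for _ in range(DIM)]
    a = random.randrange(N_ACTIONS)
    replay.append((s, a, 1.0 if a == 0 else 0.0, s))
    train_step()
```

Sampling minibatches from the buffer, rather than training only on the most recent transition, breaks the correlation between consecutive samples; a full DQN would also use a separate target network for the \(\max_{a} \hat{Q}(s_{t+1}, a)\) term.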

Last Updated: May 07, 2024 at 12:22 AM