University of Wisconsin - Madison | CS 540 Lecture Notes | C. R. Dyer
a = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} w_i x_i \ge t \\ 0, & \text{otherwise} \end{cases}
a = \frac{1}{1 + e^{-\sum_{i=1}^{n} w_i x_i}}
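For concreteness, the two unit types can be sketched in a few lines of Python (the function names and example values here are illustrative, not from the notes):

```python
import math

def ltu_output(weights, inputs, threshold):
    """Step (linear threshold) unit: output 1 iff the weighted input sum reaches t."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def sigmoid_output(weights, inputs):
    """Sigmoid unit: a smooth, differentiable alternative to the hard threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))

print(ltu_output([0.5, 0.7], [1, 0], 0.2))   # 1, since .5 >= .2
print(sigmoid_output([0.5, 0.7], [1, 0]))    # ~0.62
```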
So, from here on, learning the weights in a neural net will mean learning both the weights and the threshold values.
Note: Each pass through all of the training examples is called one epoch.
wi = wi + alpha * (T - O) * xi

where

xi is the input associated with the ith input unit, T is the teacher-specified (desired) output of the unit, O is the actual output of the unit, and alpha is a constant between 0 and 1 called the learning rate.
Notes about this update formula: the weights change only on examples the network gets wrong; when T = O, every delta_wi is 0 and the weights (including the threshold) stay the same.
The result of executing the learning algorithm on the OR function for 3 epochs, starting from w1 = .1, w2 = .5, t = .8 and using learning rate alpha = .2:
x1 | x2 | T | O | delta_w1 | w1 | delta_w2 | w2 | delta_w3 | w3 (=t) |
---|---|---|---|---|---|---|---|---|---|
- | - | - | - | - | .1 | - | .5 | - | .8 |
0 | 0 | 0 | 0 | 0 | .1 | 0 | .5 | 0 | .8 |
0 | 1 | 1 | 0 | 0 | .1 | .2 | .7 | -.2 | .6 |
1 | 0 | 1 | 0 | .2 | .3 | 0 | .7 | -.2 | .4 |
1 | 1 | 1 | 1 | 0 | .3 | 0 | .7 | 0 | .4 |
0 | 0 | 0 | 0 | 0 | .3 | 0 | .7 | 0 | .4 |
0 | 1 | 1 | 1 | 0 | .3 | 0 | .7 | 0 | .4 |
1 | 0 | 1 | 0 | .2 | .5 | 0 | .7 | -.2 | .2 |
1 | 1 | 1 | 1 | 0 | .5 | 0 | .7 | 0 | .2 |
0 | 0 | 0 | 0 | 0 | .5 | 0 | .7 | 0 | .2 |
0 | 1 | 1 | 1 | 0 | .5 | 0 | .7 | 0 | .2 |
1 | 0 | 1 | 1 | 0 | .5 | 0 | .7 | 0 | .2 |
1 | 1 | 1 | 1 | 0 | .5 | 0 | .7 | 0 | .2 |
So, the final learned network has w1 = .5, w2 = .7, and threshold t = .2, which correctly computes OR on all four training examples.
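The run in the table above can be reproduced with a short Python sketch (mine, not code from the notes); the threshold is folded in as a third weight on a constant input of -1, so the unit fires when the weighted sum reaches 0:

```python
# Perceptron learning of OR, reproducing the table above
alpha = 0.2
w = [0.1, 0.5, 0.8]   # [w1, w2, w3], where w3 is the threshold t
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # the OR function

for epoch in range(3):
    for (x1, x2), T in examples:
        x = (x1, x2, -1)                           # constant -1 input carries the threshold weight
        O = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        for i in range(3):
            w[i] += alpha * (T - O) * x[i]         # the update rule from above
    print(f"after epoch {epoch + 1}: w1={w[0]:.1f}, w2={w[1]:.1f}, t={w[2]:.1f}")
# prints w1=0.5, w2=0.7, t=0.2 after epochs 2 and 3, matching the table
```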
More specifically, consider a network containing one hidden layer. Output units are indexed by i, hidden units by j, and input units by k. y_i is the desired (teacher) output and a_i is the actual output of output unit i. Then the backprop algorithm is given by:
1. in_j = sum_k(w_k,j * a_k)                       // Forward pass starts: compute weighted input to all hidden units
2. a_j = sigmoid(in_j)                             // Compute outputs at all hidden units
3. in_i = sum_j(w_j,i * a_j)                       // Compute weighted inputs to all output units
4. a_i = sigmoid(in_i)                             // Compute outputs at all output units
5. del_i = a_i * (1 - a_i) * (y_i - a_i)           // Backward pass starts: compute "modified error" at output units
6. del_j = a_j * (1 - a_j) * sum_i(w_j,i * del_i)  // Compute "modified error" at all hidden units
7. w_j,i = w_j,i + (alpha * a_j * del_i)           // Update weights between hidden and output units
8. w_k,j = w_k,j + (alpha * a_k * del_j)           // Update weights between input and hidden units
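The eight steps translate almost line-for-line into Python. The sketch below is mine, not code from the notes; it omits the threshold weights for brevity (they can be folded in as weights on constant inputs, as before), and the weight-matrix layout (W_kj[k][j], W_ji[j][i]) is an assumption:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(W_kj, W_ji, x, y, alpha=0.5):
    """One forward/backward pass for a one-hidden-layer net (steps 1-8 above).
    W_kj[k][j] holds input->hidden weights, W_ji[j][i] hidden->output weights."""
    n_hidden, n_out = len(W_ji), len(W_ji[0])
    # Steps 1-2: weighted inputs and activations at the hidden units
    in_j = [sum(W_kj[k][j] * x[k] for k in range(len(x))) for j in range(n_hidden)]
    a_j = [sigmoid(v) for v in in_j]
    # Steps 3-4: weighted inputs and activations at the output units
    in_i = [sum(W_ji[j][i] * a_j[j] for j in range(n_hidden)) for i in range(n_out)]
    a_i = [sigmoid(v) for v in in_i]
    # Step 5: "modified error" del_i at the output units
    del_i = [a_i[i] * (1 - a_i[i]) * (y[i] - a_i[i]) for i in range(n_out)]
    # Step 6: "modified error" del_j at the hidden units
    del_j = [a_j[j] * (1 - a_j[j]) * sum(W_ji[j][i] * del_i[i] for i in range(n_out))
             for j in range(n_hidden)]
    # Step 7: update hidden -> output weights
    for j in range(n_hidden):
        for i in range(n_out):
            W_ji[j][i] += alpha * a_j[j] * del_i[i]
    # Step 8: update input -> hidden weights
    for k in range(len(x)):
        for j in range(n_hidden):
            W_kj[k][j] += alpha * x[k] * del_j[j]
    return a_i

# Illustrative use: one update on a tiny 2-2-1 net with random initial weights
random.seed(0)
W_kj = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]   # 2 inputs -> 2 hidden
W_ji = [[random.uniform(-1, 1)] for _ in range(2)]                     # 2 hidden -> 1 output
print(backprop_step(W_kj, W_ji, x=(1, 0), y=(1,)))
```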
where Ei is the mean squared error (MSE) of the network on the ith training example, computed as follows:

E_i = \frac{1}{2} \sum_{j=1}^{n} (T_j - O_j)^2

where Tj is the target value of the jth output unit for the ith example, Oj is the network's actual value at the jth output unit for the ith example, and there are n output units in the network.
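As a quick worked check of this formula (the target and output values here are illustrative, not from the notes):

```python
# MSE of one training example with n = 2 output units
T = [1.0, 0.0]                                    # illustrative target values
O = [0.9, 0.2]                                    # illustrative network outputs
E = 0.5 * sum((t - o) ** 2 for t, o in zip(T, O))
print(E)                                          # 0.5 * (0.01 + 0.04) = 0.025
```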
Now consider the (n+1)-dimensional space where n dimensions are the weights and the last dimension is the error E. The error is non-negative and defines a surface over this weight space.
So, the goal is to search for the point in weight space with (global) minimum mean squared error E. Gradient descent does this by computing, for each weight, the partial derivative of E with respect to that weight, and then moving the weight a small step in the direction that decreases E; for the weight from hidden unit j to output unit i this gives
delta_wji = alpha * aj * (Ti - Oi) * g'(in_i) = alpha * aj * (Ti - Oi) * Oi * (1 - Oi)
where weight wji connects hidden unit j to output unit i, alpha is the learning rate parameter, Ti is the teacher output associated with output unit i, Oi is the actual output of output unit i, aj is the output of hidden unit j, and g' is the derivative of the sigmoid activation function, which is known to be g' = g(1-g).
delta_wkj = alpha * Ik * aj * (1 - aj) * SUM_i [ Oi * (1 - Oi) * (Ti - Oi) * wji ]

where weight wkj connects input unit k to hidden unit j, Ik is the value of the kth input unit, and the sum is taken over all output units i.
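Both update rules are gradient descent steps on E, so delta_wji should agree with a finite-difference estimate of -alpha * dE/dwji. A minimal sketch of that check, with illustrative values (not from the notes):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One hidden activation a_j feeding one output unit through weight w_ji
a_j, T, w_ji, alpha = 0.6, 1.0, 0.4, 1.0

O = sigmoid(w_ji * a_j)
analytic = alpha * a_j * (T - O) * O * (1 - O)   # delta_wji from the formula above

def E(w):
    """Squared error E = 1/2 (T - O)^2 as a function of the weight."""
    o = sigmoid(w * a_j)
    return 0.5 * (T - o) ** 2

eps = 1e-6
numeric = -alpha * (E(w_ji + eps) - E(w_ji - eps)) / (2 * eps)
print(analytic, numeric)   # the two values agree to roughly 6 decimal places
```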
To solve both of these problems, ALVINN takes each input image and computes other views of the road by performing various perspective transformations (shift, rotate, and fill in missing pixels) so as to simulate what the vehicle would be seeing if its position and orientation on the road were not correct. For each of these synthesized views of the road, a "correct" steering direction is approximated. The real and the synthesized images are then used for training the network.
To avoid overfitting using just the most recent images captured, ALVINN maintains a buffer pool of 200 images (both real and synthetic). When a new image is obtained, it replaces one of the images in the buffer pool so that the average steering direction of all 200 examples is straight ahead. In this way, the buffer pool always keeps some images in many different steering directions.
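To make the replacement policy concrete, here is a toy sketch of one way it could work; this is my illustration, not ALVINN's actual code, and it reduces each buffered example to just its steering angle (0 = straight ahead):

```python
POOL_SIZE = 200

def replace_for_balance(pool, new_angle):
    """Insert a new example (represented by its steering angle) and evict the
    old example whose removal leaves the pool's mean steering closest to 0."""
    if len(pool) < POOL_SIZE:
        pool.append(new_angle)
        return
    total = sum(pool) + new_angle
    # After evicting pool[i], the pool's sum is total - pool[i]; choose the i
    # that brings the mean (and hence the sum) closest to straight ahead (0).
    i = min(range(len(pool)), key=lambda i: abs(total - pool[i]))
    pool[i] = new_angle
```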
Initially, a human driver controls the vehicle for about 5 minutes while the network learns, starting from random initial weights. After that, one epoch of training using the 200 examples in the buffer pool is performed approximately every 2 seconds.
Copyright © 1996-2003 by Charles R. Dyer. All rights reserved.