Young Wu's Homepage

Prev: L7, Next: L9

Zoom: Link, Piazza: Link, Google Form: Link.

Wisc ID for in-class quiz: (if your wisc email is "test@wisc.edu", please enter "test")
Token: (will be given during the lectures)

Slide:

# Learning Convolution

📗 The features can be engineered using computer vision techniques such as HOG or SIFT.

📗 They can also be learned as hidden units in a neural network. These neural networks are call convolutional neural networks (CNN): Link, Link, Wikipedia.

📗 Instead of activation units \(a = g\left(w^\top x + b\right)\) or \(a = g\left(w \cdot x + b\right)\), the dot product can be replaced by convolution (usually cross-correlation in practice, which is convolution without flipping the filters). The resulting matrix of activation units is called an activation map computed as \(A = g\left(W \star x + b\right)\).

➩ For filters in CNN, zero padding means adding zeros around the image pixel matrices so that the activation maps have the same size as the image; and no padding means not adding zeros around the image so the activation maps will be \(2 k\) pixels smaller than the image in each dimension.

➩ Filters with a stride of \(s\) means the skipping \(s - 1\) pixels when moving the filter around when computing the convolution: a stride of \(1\) is the standard convolution; and a stride of \(2 k + 1\) (filter size) is also called non-overlapping and each pixel is only used once in the computation of the convolution.

# Convolution and Pooling Layers

📗 Convolution can also be applied on activation maps in the previous layer \(A^{\left(l\right)} = g\left(W^{l} \star A^{\left(l - 1\right)} + b\right)\).

📗 Multiple units (in a \(k\) by \(k\) region) can be combined into one in pooling layers.

➩ Max pooling computes the maximum in a square region: \(\left[A^{\left(l\right)}\right]_{r c} = \displaystyle\max_{s = 0, 1, ..., k, t = 0, 1, ..., k} \left\{\left[A^{\left(l-1\right)}\right]_{r k + s, c k + t}\right\}\), where \(k\) is the pooling filter size.

➩ Average pooling computes the average in a square region: \(\left[A^{\left(l\right)}\right]_{r c} = \dfrac{1}{k^{2}} \displaystyle\sum_{s = 0}^{k} \displaystyle\sum_{t = 0}^{k} \left[A^{\left(l-1\right)}\right]_{r k + s, c k + t}\).

➩ The pooling layers usually have no padding and stride \(k\) (non-overlapping).

📗 The filter weights in convolution layers need to be trained using gradient descent. The pooling layer does not have weights that need to be trained.

➩ The gradient with respect to the weights in the convolution layers can be computed using convolution: \(\dfrac{\partial C}{\partial W} = X \star \dfrac{\partial C}{\partial A}\) and \(\dfrac{\partial C}{\partial X} = \text{rot} W \star \dfrac{\partial C}{\partial A}\), where \(\text{rot} W\) is the filter matrix rotated by 180 degrees (for example \(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\) to \(\begin{bmatrix} d & c \\ b & a \end{bmatrix}\)).

➩ The gradient for the pooling layers is (i) for max pooling: \(1\) for the maximum input, 0 for other unit, (ii) for average pooling: \(\dfrac{1}{k^{2}}\) for each of the units in the \(k \times k\) region.

In-class Discussion

📗 [1 points] How many weights need to be trained in the following convolutional neural network? Click on a unit in an activation map to see the filter weights.

Input: 6
Conv layer filter size: 3
Pooling layer filter size: 2
Output layer: 3

[Note] Use the space to explain the steps or just take notes:

[Q1] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

In-class Quiz

ID:

📗 [4 points] A convolutional neural network has input image of size x that is connected to a convolutional layer that uses a x filter, zero padding of the image, and a stride of 1. There are activation maps. (Here, zero-padding implies that these activation maps have the same size as the input images.) The convolutional layer is then connected to a pooling layer that uses x max pooling, a stride of (non-overlapping, no padding) of the convolutional layer. The pooling layer is then fully connected to an output layer that contains output units. There are no hidden layers between the pooling layer and the output layer. How many different weights must be learned in this whole network, not including any bias.

📗 Answer: .

[Note] Use the space to explain the steps or just take notes:

[Q2] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Other students' answers:

# Examples of Convolutional Neural Networks

📗 LeNet is a simple convolutional neural network: Link, Wikipedia.

📗 AlexNet is one of the earliest deep CNN architecture: Wikipedia.

📗 InceptionNet (GoogLeNet) introduced Inception module and auxiliary classifiers to improve training CNN with large number of layers: Link.

➩ 1 by 1 convolutions are used to reduce the number of activation maps.

➩ auxiliary classifiers are added so that the gradient in earlier layers does not become zero even when many of the weights in later layers are close to 0.

📗 ResNet introduces additional skip layer connections to improve training networking that are very deep: Wikipedia.

📗 U-Net also uses skip layer connections for image segmentation tasks: Wikipedia.

In-class Discussion

📗 [1 points] Find an image that is classified incorrectly by MobileNet V1: Wikipedia.

(Please wait for 5-10 seconds for the results ...)

➩ Top three:

➩ Predictions:

➩ Embeddings:

[Q3] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Other students' answers:

# Questions?

📗 If you have questions, please use (i) Zoom chat, (ii) Piazza: Link, (iii) Office hours and discussion sessions. Please do NOT use Canvas mail and use email only to the course instructor (not TAs) for grading issues.

Additional In-class Discussion

📗 Sometimes a question not in the notes will be asked during the lecture, you can submit your answer here:

Notes (not visible to other students):
[Q4] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Submit your answer to see other students answers (click the submit button to refresh):

Additional In-class Quiz

📗 Sometimes a question not in the notes will be asked during the lecture, you can submit your answer here:

A.
B.
C.
D.
E.
Notes (not visible to other students):
[Q5] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Submit your answer to see other students answers (click the submit button to refresh):

# In-class Quiz Instructions

📗 To get full points on the in-class quizzes for a lecture:

➩ Submit relevant answers to the questions discussed during the lecture: incorrect answers are okay.

➩ Some questions require [notes] to earn the point.

➩ Some questions require special ID (given during the lecture) to earn the point.

➩ Do not submit answers to questions that are not discussed during the lectures. Each such submission will result in a deduction of one point.

➩ Submissions after the lecture, before the midterm (first 14 lectures) and the final exam (last 14 lectures), are accepted. After the exams, no in-class quiz submissions will be accepted.

➩ The grade on Canvas Assignment Q8 is computed as number of points divided by the number of questions asked (out of 1) and updated on Canvas every weekend.

📗 If there are any issues with submission on the website, please use this Google form: Link.

📗 Bonus point opportunities during a few lectures (added to in-class quiz above 20 points).

📗 Notes and code adapted from the course taught by Professors Jerry Zhu, Blerina Gkotse, Yudong Chen, Yingyu Liang, Charles Dyer. Some content are generated using Copilot .

Prev: L7, Next: L9

Last Updated: July 16, 2026 at 12:17 PM