Young Wu's Homepage

Prev: L2, Next: L4

Zoom: Link, Piazza: Link, Google Form: Link.

Wisc ID for in-class quiz: (if your wisc email is "test@wisc.edu", please enter "test")
Token: (will be given during the lectures)

Slide:

# Multi-Layer Perceptron

📗 A single perceptron (with possibly non-linear activation function) still produces a linear decision boundary (the two classes are separated by a line).

📗 Multiple perceptrons can be combined in a way that the output of one perceptron is the input of another perceptron:

➩ \(a^{\left(1\right)} = g\left(w^{\left(1\right)} x + b^{\left(1\right)}\right)\),

➩ \(a^{\left(2\right)} = g\left(w^{\left(2\right)} a^{\left(1\right)} + b^{\left(2\right)}\right)\),

➩ \(a^{\left(3\right)} = g\left(w^{\left(3\right)} a^{\left(2\right)} + b^{\left(3\right)}\right)\),

➩ \(\hat{y} = 1\) if \(a^{\left(3\right)} \geq 0\).

# Neural Networks

📗 Multi-Layer Perceptrons are also called Artificial Neural Networks (NNs): Link, Wikipedia.

➩ Human brain: 100,000,000,000 neurons, each neuron receives input from 1,000 other neurons.

➩ An impulse can either increases or decrease the probability of nerve pulse firing (activation of neuron).

📗 Universal Approximation Theorem: Wikipedia.

➩ A 2-layer network (1 hidden layer) can approximate any continuous function arbitrarily closely with enough hidden units.

➩ A 3-layer network (2 hidden layers) can approximate any function arbitrarily closely with enough hidden units.

In-class Discussion

📗 Try different combinations of activation function and network architecture (number of layers and units in each layer), and compare which ones are good for the spiral dataset: Link.

# Fully Connected Network

📗 The perceptrons will be organized in layers: \(l = 1, 2, ..., L\) and \(m^{\left(l\right)}\) units will be used in layer \(l\).

➩ \(w_{j k}^{\left(l\right)}\) is the weight from unit \(j\) in layer \(l - 1\) to unit \(k\) in layer \(l\), and in the output layer, there is only one unit, so the weights are \(w_{j}^{\left(L\right)}\).

➩ \(b_{j}^{\left(l\right)}\) is the bias for unit \(j\) in layer \(l\), and in the output layer, there is only one unit, so the bias is \(b^{\left(L\right)}\).

➩ \(a_{i j}^{\left(l\right)}\) is the activation for training item \(i\) unit \(j\) in layer \(l\), where \(a_{i j}^{\left(0\right)} = x_{i j}\) can be viewed as unit \(j\) in layer \(0\) (alternatively, \(a_{i j}^{\left(l\right)}\) can be viewed as internal features), and \(a_{i}^{\left(L\right)}\) is the output representing the predicted probability that \(x_{i}\) belongs to class \(1\) or \(\mathbb{P}\left\{\hat{y}_{i} = 1\right\}\).

📗 The way the hidden (internal) units are connected is called the architecture of the network.

📗 In a fully connected network, all units in layer \(l\) are connected to every unit in layer \(l - 1\).

➩ \(a_{i j}^{\left(l\right)} = g\left(a_{i 1}^{\left(l - 1\right)} w_{1 j}^{\left(l\right)} + a_{i 2}^{\left(l - 1\right)} w_{2 j}^{\left(l\right)} + ... + a_{i m^{\left(l - 1\right)}}^{\left(l - 1\right)} w_{m^{\left(l - 1\right)} j}^{\left(l\right)} + b_{j}^{\left(l\right)}\right)\).

Example

📗 [1 points] The following is a diagram of a neural network: highlight an edge (mouse or touch drag from one node to another node) to see the name of the weight (highlight the same edge to hide the name). Highlight color: .

Name of input units: 4
Name of hidden layer 1 units: 3
Name of hidden layer 2 units: 2
Name of hidden layer 3 units: 0
Name of output units: 1

In-class Quiz

ID:

📗 [4 points] Given the following neural network that classifies all the training instances correctly. What are the labels (0 or 1) of the training data? The activation functions are LTU for all units: \(1_{\left\{z \geq 0\right\}}\). The first layer weight matrix is , with bias vector , and the second layer weight vector is , with bias

\(x_{i1}\)	\(x_{i2}\)	\(y_{i}\) or \(a^{\left(2\right)}_{1}\)
0	0	?
0	1	?
1	0	?
1	1	?

Note: if the weights are not shown clearly, you could move the nodes around with mouse or touch.

📗 Answer (comma separated vector): .

[Note] Use the space to explain the steps or just take notes:

[Q1] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

# Questions?

📗 If you have questions, please use (i) Zoom chat, (ii) Piazza: Link, (iii) Office hours and discussion sessions. Please do NOT use Canvas mail and use email only to the course instructor (not TAs) for grading issues.

Additional In-class Discussion

📗 Sometimes a question not in the notes will be asked during the lecture, you can submit your answer here:

Notes (not visible to other students):
[Q2] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Submit your answer to see other students answers (click the submit button to refresh):

Additional In-class Quiz

📗 Sometimes a question not in the notes will be asked during the lecture, you can submit your answer here:

A.
B.
C.
D.
E.
Notes (not visible to other students):
[Q3] Please check the box to confirm submission (submissions for questions not discussed during the lectures will result in in-class quiz point deduction).

Submit your answer to see other students answers (click the submit button to refresh):

# In-class Quiz Instructions

📗 To get full points on the in-class quizzes for a lecture:

➩ Submit relevant answers to the questions discussed during the lecture: incorrect answers are okay.

➩ Some questions require [notes] to earn the point.

➩ Some questions require special ID (given during the lecture) to earn the point.

➩ Do not submit answers to questions that are not discussed during the lectures. Each such submission will result in a deduction of one point.

➩ Submissions after the lecture, before the midterm (first 14 lectures) and the final exam (last 14 lectures), are accepted. After the exams, no in-class quiz submissions will be accepted.

➩ The grade on Canvas Assignment Q3 is computed as number of points divided by the number of questions asked (out of 1) and updated on Canvas every weekend.

📗 If there are any issues with submission on the website, please use this Google form: Link.

📗 Bonus point opportunities during a few lectures (added to in-class quiz above 20 points).

📗 Notes and code adapted from the course taught by Professors Jerry Zhu, Blerina Gkotse, Yudong Chen, Yingyu Liang, Charles Dyer. Some content are generated using Copilot .

Prev: L2, Next: L4

Last Updated: July 16, 2026 at 12:17 PM