
# Unsupervised Learning

📗 If the groups are discrete: clustering
📗 If the groups are continuous (lower dimensional representation): dimensionality reduction
➩ The output of unsupervised learning can be used as input for supervised learning too (discrete groups as categorical features and continuous groups as continuous features).

| Item | Input (Features) | Output (Label) |
|---|---|---|
| 1 | \(\left(x_{11}, x_{12}, ..., x_{1m}\right)\) | no label |
| 2 | \(\left(x_{21}, x_{22}, ..., x_{2m}\right)\) | no label |
| 3 | \(\left(x_{31}, x_{32}, ..., x_{3m}\right)\) | no label |
| ... | ... | no label |
| n | \(\left(x_{n1}, x_{n2}, ..., x_{nm}\right)\) | no label |

➩ Goal: put items with similar \(x\) in the same or similar groups.


US States Economic Data Example
➩ US economic data can be found on Link.
➩ Map data can be found on Link.
➩ Use the features "real per capita personal income", "real per capita personal consumption expenditures", and "regional price parities".
➩ Note: see pivot for the correct way of working with panel data: Doc.
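The pivot step mentioned above can be sketched with pandas; the column names and values below are made up for illustration, not taken from the linked data source.

```python
import pandas as pd

# Hypothetical long-format ("panel") data: one row per (state, feature) pair.
# The real data from the linked source uses different column names.
long_df = pd.DataFrame({
    "state":   ["WI", "WI", "WI", "IL", "IL", "IL"],
    "feature": ["income", "consumption", "price_parity"] * 2,
    "value":   [52000.0, 46000.0, 92.5, 56000.0, 48000.0, 98.9],
})

# Pivot to wide format: one row per state, one column per feature,
# which is the item-by-feature shape that clustering algorithms expect.
wide_df = long_df.pivot(index="state", columns="feature", values="value")
print(wide_df.shape)  # (2, 3): 2 states, 3 feature columns
```

Each (index, columns) pair must be unique for `pivot` to work; duplicated pairs require `pivot_table` with an aggregation function instead.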

# Hierarchical Clustering

📗 Hierarchical clustering starts with \(n\) clusters and iteratively merges the two closest clusters: Link.
➩ It is also called agglomerative clustering, and can be performed using sklearn.cluster.AgglomerativeClustering: Doc.
➩ Different ways of defining the distance between two clusters are called different linkages: scipy.cluster.hierarchy.linkage: Doc.
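The merge loop can be sketched in plain Python; this is a naive illustration of the idea, not the sklearn implementation, and it uses single linkage (smallest cross-cluster distance) as the cluster distance.

```python
import math

def single_linkage(c1, c2):
    # Cluster distance = smallest distance between any cross-cluster pair.
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, k, cluster_dist=single_linkage):
    # Start with n singleton clusters; merge the two closest until k remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(sorted(sorted(c) for c in agglomerative(pts, 2)))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```

Swapping `cluster_dist` for a different linkage function changes which pair of clusters counts as closest, which is exactly the role of the `linkage` parameter in the libraries linked above.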

# Distance Measure

📗 The distance between points can be measured by norms. The distance between items \(x_{1} = \left(x_{11}, x_{12}, ..., x_{1m}\right)\) and \(x_{2} = \left(x_{21}, x_{22}, ..., x_{2m}\right)\) can be:
(1) Manhattan distance (metric = "manhattan"): \(\left| x_{11} - x_{21} \right| + \left| x_{12} - x_{22} \right| + ... + \left| x_{1m} - x_{2m} \right|\), Link,
(2) Euclidean distance (metric = "euclidean"): \(\sqrt{\left(x_{11} - x_{21}\right)^{2} + \left(x_{12} - x_{22}\right)^{2} + ... + \left(x_{1m} - x_{2m}\right)^{2}}\),
(3) Cosine distance (metric = "cosine"): \(1 - \dfrac{x^\top_{1} x_{2}}{\sqrt{x^\top_{1} x_{1}} \sqrt{x^\top_{2} x_{2}}}\).
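The three distances can be written directly from the formulas above; a small self-contained sketch:

```python
import math

def manhattan(x1, x2):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x1, x2))

def euclidean(x1, x2):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def cosine_distance(x1, x2):
    # One minus the cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return 1 - dot / (norm1 * norm2)

x1, x2 = (1, 0), (0, 1)
print(manhattan(x1, x2))        # 2
print(euclidean(x1, x2))        # 1.414... = sqrt(2)
print(cosine_distance(x1, x2))  # 1.0 (orthogonal vectors)
```

Note that cosine distance ignores vector lengths: scaling either input leaves it unchanged, unlike the two norm-based distances.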
...

# Average Linkage Distance

📗 If average linkage distance (linkage = "average") is used, then the distance between two clusters is defined as the average distance over every pair of points, one from each cluster.
➩ This requires recomputing the cluster-to-cluster distances after every merge, so it can be slow.
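Under the definition above, the average linkage distance between two small clusters can be computed directly; a pure-Python sketch:

```python
import math

def average_linkage(c1, c2):
    # Average distance over every cross-cluster pair of points.
    dists = [math.dist(p, q) for p in c1 for q in c2]
    return sum(dists) / len(dists)

c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (3, 2)]
print(average_linkage(c1, c2))  # (3 + 3 + 2 * sqrt(13)) / 4, about 3.30
```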

# Single and Complete Linkage Distance

📗 If single linkage distance (linkage = "single") is used, then the distance between two clusters is defined as the smallest distance between any pair of points, one from each cluster.
📗 If complete linkage distance (linkage = "complete") is used, then the distance between two clusters is defined as the largest distance between any pair of points, one from each cluster.
➩ With single or complete linkage, the pairwise distances between points only have to be computed once at the beginning, so clustering is typically faster.
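The two definitions differ only in taking the minimum versus the maximum over cross-cluster pairs; a minimal sketch:

```python
import math

def single_linkage(c1, c2):
    # Smallest distance between any cross-cluster pair of points.
    return min(math.dist(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    # Largest distance between any cross-cluster pair of points.
    return max(math.dist(p, q) for p in c1 for q in c2)

c1 = [(0, 0), (0, 2)]
c2 = [(3, 0), (3, 2)]
print(single_linkage(c1, c2))    # 3.0
print(complete_linkage(c1, c2))  # sqrt(13), about 3.61
```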

# Single vs Complete Linkage

📗 Since single linkage merges nearest neighbors, it is more likely to produce chain-like clusters in which consecutive points are close to each other but the two ends can be far apart.
📗 Since complete linkage merges based on the farthest pair, it is more likely to produce compact, blob-like clusters (for example, circles) in which all points are close to a center.
➩ The choice usually depends on the application. 

Comparison Example
➩ Compare single and complete linkage clustering on the circles, moons datasets.

# Number of Clusters

📗 The number of clusters is usually chosen based on application requirements, since there is no single optimal number of clusters.
➩ If the number of clusters is not specified, the algorithm can output the full clustering tree, called a dendrogram: scipy.cluster.hierarchy.dendrogram: Doc.

# K Means Clustering

📗 Another clustering method is K-means clustering: Link.
(0) Start with \(K\) random centers (also called centroids) \(\mu_{1}, \mu_{2}, ..., \mu_{K}\).
(1) Assign step: find points (items) that are the closest to each center \(k\), label these points as \(k\).
(2) Center step: update center \(\mu_{k}\) to be the center of the points labeled \(k\).
(3) Repeat steps (1) and (2) until the cluster centers do not change.
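Steps (0) to (3) can be sketched in plain Python. For reproducibility, this sketch initializes the centers deterministically with the first \(K\) points; real implementations use random or k-means++ initialization.

```python
import math

def kmeans(points, k, iters=100):
    # Step (0): deterministic initialization with the first k points
    # (real implementations use random or k-means++ initialization).
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Step (1), assign: label each point with its closest center's index.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Step (2), center: move each center to the mean of its points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members)
                                         for coord in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        # Step (3): stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(pts, 2)
print(centers, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

K-means always converges to some clustering, but which one depends on the initialization, which is why library implementations rerun it from several random starts.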

# Total Distortion

📗 The objective of K-means clustering is to minimize the total distortion, also called inertia: the sum of distances (usually squared Euclidean distances) from the points to their centers, \(\displaystyle\sum_{i=1}^{n} \left\|x_{i} - \mu_{k\left(x_{i}\right)}\right\|^{2}\) = \(\displaystyle\sum_{i=1}^{n} \displaystyle\sum_{j=1}^{m} \left(x_{ij} - \mu_{k\left(x_{i}\right)j}\right)^{2}\), where \(k\left(x_{i}\right)\) is the index of the cluster closest to \(x_{i}\), or \(k\left(x_{i}\right) = \mathop{\mathrm{argmin}}_{k} \left\|x_{i} - \mu_{k}\right\|\).
➩ K-means is initialized at a random clustering, and each assign-center iteration never increases the total distortion, so it acts as a descent step over the choice of cluster centers.
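The total distortion formula above translates directly into code; a small sketch:

```python
import math

def total_distortion(points, centers):
    # Sum of squared Euclidean distances from each point to its closest center.
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = [(0, 0.5), (10, 10.5)]  # each point is 0.5 away from its center
print(total_distortion(pts, centers))  # 1.0 = 4 * 0.5 ** 2
```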

# Number of Clusters

📗 The number of clusters is usually chosen based on application requirements, since there is no single optimal number of clusters.
➩ If the number of clusters is \(n\) (each point in its own cluster), then the total distortion is 0. This means minimizing the total distortion is not a good way to select the number of clusters.
➩ The elbow method is sometimes used to choose the number of clusters based on the total distortion, but it is not a clearly defined algorithm: Link.
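Both points can be illustrated on a toy dataset: the distortion drops sharply until the natural number of clusters and reaches 0 when every point is its own center. To keep the example deterministic, the candidate centers here are hand-picked by inspection rather than fit by K-means.

```python
import math

def total_distortion(points, centers):
    # Sum of squared distances from each point to its closest center.
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

# Toy dataset: three well-separated pairs of points.
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# Hand-picked candidate centers for K = 1, 2, 3, and K = n
# (every point its own center).
candidates = {
    1: [(10, 0.5)],
    2: [(0, 0.5), (15, 0.5)],
    3: [(0, 0.5), (10, 0.5), (20, 0.5)],
    6: list(pts),
}
for k, centers in candidates.items():
    print(k, round(total_distortion(pts, centers), 2))
# 1 401.5 / 2 101.5 / 3 1.5 / 6 0.0: big drops until K = 3, little gain after.
```

The "elbow" here is at K = 3, matching the three visible groups; on real data the bend is rarely this sharp, which is why the method is more of a heuristic than an algorithm.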

Economic Data Example Again
➩ Apply 5-means clustering on the economic data for the US states.
➩ Compare K-means with different values of K: the "elbow method" seems to suggest around 4 to 6 clusters.


# Questions?



📗 Notes and code adapted from the course taught by Professors Gurmail Singh, Yiyin Shen, Tyler Caraza-Harter.
📗 If there is an issue with TopHat during the lectures, please submit your answers on paper (include your Wisc ID and answers) or through this Google form: Link, at the end of the lecture.
📗 Anonymous feedback can be submitted to: Form. Non-anonymous feedback and questions can be posted on Piazza: Link






Last Updated: April 22, 2026 at 2:16 AM