
# Lecture

📗 The lecture is in person, but you can join Zoom: 8:50-9:40 or 11:00-11:50. Zoom recordings can be viewed on Canvas -> Zoom -> Cloud Recordings. They will be moved to Kaltura over the weekends.
📗 The in-class (participation) quizzes should be submitted on TopHat (Code:741565), but you can submit your answers through Form at the end of the lectures too.
📗 The Python notebooks used during the lectures can also be found on: GitHub. They will be updated weekly.


# Lecture Notes

📗 Least Squares Regression
➩ If the label \(y\) is continuous, it can still be predicted using \(\hat{f}\left(x'\right) = w_{1} x'_{1} + w_{2} x'_{2} + ... + w_{m} x'_{m} + b\).
scipy.linalg.lstsq(x, y) can be used to find the weights \(w\) and the bias \(b\): Doc.
➩ It computes the least-squares solution to \(X w = y\), that is, the \(w\) (with the bias \(b\) stored as its last entry) such that \(\left\|y - X w\right\|^{2} = \displaystyle\sum_{i=1}^{n} \left(y_{i} - w_{1} x_{i1} - w_{2} x_{i2} - ... - w_{m} x_{im} - b\right)^{2}\) is minimized.
sklearn.linear_model.LinearRegression performs the same linear regression.
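
A minimal sketch of the lstsq call described above, on made-up data (the array `features`, the true weights 2 and 3, and the bias 1 are assumptions for illustration, not course data). A column of ones is appended so that the last fitted coefficient plays the role of \(b\).

```python
import numpy as np
from scipy.linalg import lstsq

# Hypothetical training data: 5 items, 2 features each.
features = np.array([[1.0, 2.0],
                     [2.0, 1.0],
                     [3.0, 4.0],
                     [4.0, 3.0],
                     [5.0, 5.0]])
# Labels generated exactly as y = 2 * x1 + 3 * x2 + 1 (no noise).
y = 2.0 * features[:, 0] + 3.0 * features[:, 1] + 1.0

# Append a column of ones so the bias is fitted as the last coefficient.
X = np.column_stack([features, np.ones(len(features))])

# lstsq returns the solution, the residues, the rank, and the singular values.
coef, residues, rank, singular_values = lstsq(X, y)
w, b = coef[:-1], coef[-1]

# Predict the label of a new item x' using f(x') = w . x' + b.
x_new = np.array([6.0, 1.0])
y_hat = x_new @ w + b
print(w, b, y_hat)
```

Because the labels here are noise-free, the fitted weights and bias should recover the values used to generate them up to rounding error.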

| Item | Input (Features) | Output (Labels) | Note |
| --- | --- | --- | --- |
| 1 | \(\left(x_{11}, x_{12}, ..., x_{1m}\right)\) | \(y_{1} \in \mathbb{R}\) | training data |
| 2 | \(\left(x_{21}, x_{22}, ..., x_{2m}\right)\) | \(y_{2} \in \mathbb{R}\) | - |
| 3 | \(\left(x_{31}, x_{32}, ..., x_{3m}\right)\) | \(y_{3} \in \mathbb{R}\) | - |
| ... | ... | ... | ... |
| n | \(\left(x_{n1}, x_{n2}, ..., x_{nm}\right)\) | \(y_{n} \in \mathbb{R}\) | used to figure out \(y \approx \hat{f}\left(x\right)\) |
| new | \(\left(x'_{1}, x'_{2}, ..., x'_{m}\right)\) | \(y' \in \mathbb{R}\) | guess \(y' = \hat{f}\left(x'\right)\) |


📗 Design Matrix
➩ \(X\) is a matrix with \(n\) rows and \(m + 1\) columns, called the design matrix, where each row of \(X\) is a list of features of a training item plus a \(1\) at the end, meaning row \(i\) of \(X\) is \(\left(x_{i1}, x_{i2}, x_{i3}, ..., x_{im}, 1\right)\).
➩ The transpose of \(X\), denoted by \(X^\top\), flips the matrix over its diagonal, which means each column of \(X^\top\) is a training item with a \(1\) at the bottom.
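
The shapes involved can be checked with a small sketch. The numbers in `features` are made up, with \(n = 4\) items and \(m = 2\) features, so \(X\) should be 4 by 3 and \(X^\top\) should be 3 by 4.

```python
import numpy as np

# Hypothetical features: n = 4 training items, m = 2 features each.
features = np.array([[70.0, 80.0],
                     [85.0, 90.0],
                     [60.0, 65.0],
                     [95.0, 88.0]])
n, m = features.shape

# Design matrix: each row is an item's features followed by a 1.
X = np.hstack([features, np.ones((n, 1))])

# Transpose: each column is now an item with a 1 at the bottom.
X_T = X.T

print(X.shape)           # n rows, m + 1 columns
print(X_T.shape)         # m + 1 rows, n columns
print((X_T @ X).shape)   # square: m + 1 by m + 1
```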

📗 Matrix Inversion
➩ \(X w = y\) can be solved using \(w = y / X\) (not proper notation) or \(w = X^{-1} y\) only if \(X\) is square and invertible.
➩ \(X\) has \(n\) rows and \(m + 1\) columns, so it is usually not square and thus not invertible.
➩ \(X^\top X\) has \(m + 1\) rows and \(m + 1\) columns and is invertible if \(X\) has linearly independent columns (the features are not linearly related).
➩ \(X^\top X w = X^\top y\) is used instead of \(X w = y\), which can be solved as \(w = \left(X^\top X\right)^{-1} \left(X^\top y\right)\).
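
As a hedged illustration of the normal equations, the sketch below solves \(X^\top X w = X^\top y\) on randomly generated data and compares the result with scipy.linalg.lstsq. The data, the seed, and the variable names are assumptions made for this example.

```python
import numpy as np
from scipy.linalg import lstsq, solve

# Made-up data: 20 items, 3 features, labels with a little noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 3))
y = features @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=20)

# Design matrix with a trailing column of ones (20 rows, 4 columns).
X = np.column_stack([features, np.ones(20)])

# Normal equations: (X^T X) w = X^T y, a square 4 by 4 system.
A = X.T @ X
rhs = X.T @ y
w_normal = solve(A, rhs)

# Least-squares solution computed directly from X and y, for comparison.
w_lstsq = lstsq(X, y)[0]
print(w_normal)
print(w_lstsq)
```

The two answers should agree closely here because the random features are far from linearly related. When the columns of \(X\) are nearly dependent, forming \(X^\top X\) can lose accuracy, which is one reason lstsq works on \(X\) directly.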

📗 Matrix Inverses
scipy.linalg.inv(A) can be used to compute the inverse of A: Doc.
scipy.linalg.solve(A, b) can be used to solve for \(w\) in \(A w = b\) and is faster than computing the inverse: Doc.
➩ The reason is that computing the inverse is effectively solving \(A w = e_{1}\), \(A w = e_{2}\), ... \(A w = e_{n}\) (one system per column of the inverse), where \(e_{j}\) is the vector with \(1\) at position \(j\) and \(0\) everywhere else, for example, \(e_{1} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ ... \end{bmatrix}\), \(e_{2} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ ... \end{bmatrix}\), \(e_{3} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ ... \end{bmatrix}\) ...
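
The claim that an inverse amounts to \(n\) solves can be checked with a small sketch. The 3 by 3 matrix `A` is made up for illustration; each column of the inverse is obtained by solving against one column of the identity matrix.

```python
import numpy as np
from scipy.linalg import inv, solve

# A made-up invertible 3 by 3 matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
n = A.shape[0]
identity = np.eye(n)

# Solve A w = e_j for each j; the solution is column j of the inverse.
columns = [solve(A, identity[:, j]) for j in range(n)]
A_inv_by_columns = np.column_stack(columns)

# Compare with the inverse computed in a single call.
A_inv = inv(A)
print(np.allclose(A_inv_by_columns, A_inv))
```

So computing inv(A) and then multiplying by a single vector does the work of \(n\) solves where one would do.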

Grade Regression Example
➩ Find the linear relationship between exam 1 and exam 2 grades.
➩ Code for simple linear regression: Notebook.
➩ Code for multiple regression: Notebook.

📗 LU Decomposition
➩ A square matrix \(A\) can be written as \(A = L U\), where \(L\) is a lower triangular matrix and \(U\) is an upper triangular matrix.
➩ For example, if \(A\) is 3 by 3, then \(\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}\).
➩ Sometimes, a permutation matrix is required to reorder the rows of \(A\), so \(P A = L U\) is used, where \(P\) is a permutation matrix (reordering of the rows of the identity matrix \(I\)).
scipy.linalg.lu(A) can be used to find \(P, L, U\) matrices: Doc.
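
A short sketch of scipy.linalg.lu on a made-up matrix. One detail to watch: SciPy returns the factors so that \(A = P L U\), which means the \(P\) in the equation \(P A = L U\) above corresponds to the transpose of the matrix SciPy returns.

```python
import numpy as np
from scipy.linalg import lu

# A made-up invertible 3 by 3 matrix.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

# SciPy's convention: A = P @ L @ U.
P, L, U = lu(A)

print(np.allclose(A, P @ L @ U))      # the factors reproduce A
print(np.allclose(P.T @ A, L @ U))    # equivalent to "P A = L U" with P.T as the permutation
print(np.allclose(L, np.tril(L)))     # L is lower triangular
print(np.allclose(U, np.triu(U)))     # U is upper triangular
```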

📗 LU Decomposition Solve
➩ Solving \(A w = b\) and \(A w = c\) with two separate calls to solve involves computing the same LU decomposition of \(A\) twice.
➩ It is faster to compute the LU decomposition once and then solve using the LU matrices instead of \(A\).
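scipy.linalg.lu_factor(A) can be used to find the \(L, U\) matrices, which it returns packed into a single array lu together with an array p of pivot indices: Doc.
scipy.linalg.lu_solve((lu, p), b) can be used to solve \(A w = b\) where lu is the packed LU decomposition and p is the array of pivot indices describing the row permutation.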
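
A minimal sketch of factoring once and solving twice. The matrix `A` and the right-hand sides `b` and `c` are made up for illustration, and the results are compared against separate calls to scipy.linalg.solve.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve

# A made-up invertible matrix and two right-hand sides.
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
c = np.array([0.0, 1.0, 0.0])

# Factor A once; lu holds L and U packed together, piv holds the pivot indices.
lu, piv = lu_factor(A)

# Reuse the same factorization for both systems.
w_b = lu_solve((lu, piv), b)
w_c = lu_solve((lu, piv), c)

# Same answers as solving each system from scratch.
print(np.allclose(w_b, solve(A, b)), np.allclose(w_c, solve(A, c)))
```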

📗 Comparison for Solving Multiple Systems
➩ To solve \(A w = b\), \(A w = c\) for square invertible \(A\):

| Method | Procedure | Speed comparison |
| --- | --- | --- |
| 1 | inv(A) @ b then inv(A) @ c | Slow |
| 2 | solve(A, b) then solve(A, c) | Fast |
| 3 | lu, p = lu_factor(A) then lu_solve((lu, p), b) then lu_solve((lu, p), c) | Faster |


➩ When \(A = X^\top X\) and \(b = X^\top y\), solving \(A w = b\) leads to the solution to the linear regression problem. If the same features are used to predict several different label variables, it is faster to factor \(A\) once with lu_factor and reuse the factorization with lu_solve.
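
As a sketch of this reuse, the example below fits two regressions that share the same features. The data and the label names `y_exam2` and `y_final` are hypothetical; only the factor-once pattern is the point.

```python
import numpy as np
from scipy.linalg import lstsq, lu_factor, lu_solve

# Made-up data: 30 items, 2 features, and two different labels.
rng = np.random.default_rng(1)
n = 30
features = rng.normal(size=(n, 2))
X = np.column_stack([features, np.ones(n)])
y_exam2 = features @ np.array([0.8, 0.1]) + 10.0 + rng.normal(scale=0.5, size=n)
y_final = features @ np.array([0.3, 0.6]) + 20.0 + rng.normal(scale=0.5, size=n)

# A = X^T X depends only on the features, so factor it once.
A = X.T @ X
factorization = lu_factor(A)

# Each label only changes the right-hand side X^T y.
w_exam2 = lu_solve(factorization, X.T @ y_exam2)
w_final = lu_solve(factorization, X.T @ y_final)

# Check against the least-squares solution computed from X directly.
print(np.allclose(w_exam2, lstsq(X, y_exam2)[0]))
print(np.allclose(w_final, lstsq(X, y_final)[0]))
```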

📗 Numerical Instability
➩ Division by a small number close to 0 may lead to inaccurate answers.
➩ Inverting a matrix that is close to singular (the matrix analogue of a number close to 0), or solving a system with such a matrix, could lead to inaccurate solutions too.
➩ How close a matrix is to being singular is usually measured by its condition number, not its determinant.
numpy.linalg.cond can be used to find the condition number: Doc.
➩ A larger condition number means the solution can be more inaccurate.
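
A small sketch of why the condition number, rather than the determinant, is the useful measure. All three matrices are made up for illustration: one well behaved, one nearly singular, and one with a tiny determinant that is nevertheless easy to solve with.

```python
import numpy as np

# Well-conditioned matrix.
well = np.array([[2.0, 0.0],
                 [0.0, 1.0]])

# Nearly singular: the two rows are almost identical.
near_singular = np.array([[1.0, 1.0],
                          [1.0, 1.0 + 1e-10]])

# Tiny determinant but perfectly conditioned: a scaled identity matrix.
tiny = 1e-3 * np.eye(10)

cond_well = np.linalg.cond(well)
cond_near = np.linalg.cond(near_singular)
cond_tiny = np.linalg.cond(tiny)
det_tiny = np.linalg.det(tiny)

print(cond_well, cond_near)   # small versus very large
print(det_tiny, cond_tiny)    # determinant near 0, condition number near 1
```

The scaled identity shows that a determinant close to 0 does not by itself signal trouble, while the nearly singular matrix has a very large condition number.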

TopHat Invertibility Discussion
➩ Code to invert matrices: Notebook.
➩ Discuss what should be the solution and why Python computes it incorrectly.

📗 Multicollinearity
➩ In linear regression, a large condition number of the design matrix is related to multicollinearity.
➩ Multicollinearity occurs when multiple features are highly linearly correlated.
➩ One simple rule of thumb is that the regression has multicollinearity if the condition number is larger than 30.
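
A hedged sketch of the rule of thumb on made-up data: one design matrix has two unrelated features, the other has a second feature that is almost exactly twice the first. The seed, the sample size, and the noise scale are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)

# Second feature unrelated to the first.
x2_independent = rng.normal(size=n)
# Second feature almost exactly 2 * x1 (highly linearly correlated).
x2_collinear = 2.0 * x1 + rng.normal(scale=1e-3, size=n)

# Design matrices with a trailing column of ones.
X_ok = np.column_stack([x1, x2_independent, np.ones(n)])
X_bad = np.column_stack([x1, x2_collinear, np.ones(n)])

cond_ok = np.linalg.cond(X_ok)
cond_bad = np.linalg.cond(X_bad)

# Apply the threshold of 30 from the rule of thumb.
print(cond_ok, cond_ok > 30)
print(cond_bad, cond_bad > 30)
```

The condition number also depends on how the features are scaled, so features measured in very different units may need to be standardized before applying the threshold.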


 Notes and code adapted from the course taught by Yiyin Shen Link and Tyler Caraza-Harter Link






Last Updated: November 30, 2024 at 4:34 AM