# Lecture Notes
📗 Linear Programming
➩ A linear program is an optimization problem in which the objective is linear and the constraints are linear.
➩ The feasible solutions are the ones that satisfy all the constraints; they may or may not be optimal.
➩ The feasible solutions of a linear program form a convex polytope (a line segment in 1D, a convex polygon in 2D).
📗 TopHat Discussion
➩ Example: a farmer sells wheat at 4 dollars per unit and barley at 3 dollars per unit; each unit of either crop uses one unit of land, one unit of wheat uses 3 units of fertilizer and 4 units of pesticide, and one unit of barley uses 6 units of fertilizer and 2 units of pesticide; the farmer has 10 units of land, 48 units of fertilizer, and 32 units of pesticide.
➩ With \(w\) units of wheat and \(b\) units of barley, the problem can be written as: \(\displaystyle\max_{w, b} 4 w + 3 b\) subject to \(w + b \leq 10\), and \(3 w + 6 b \leq 48\), and \(4 w + 2 b \leq 32\), with \(w, b \geq 0\).
➩ In matrix form: \(\displaystyle\max_{w, b} \begin{bmatrix} 4 \\ 3 \end{bmatrix} ^\top \begin{bmatrix} w \\ b \end{bmatrix}\) subject to \(\begin{bmatrix} 1 & 1 \\ 3 & 6 \\ 4 & 2 \end{bmatrix} \begin{bmatrix} w \\ b \end{bmatrix} \leq \begin{bmatrix} 10 \\ 48 \\ 32 \end{bmatrix}\), with \(\begin{bmatrix} w \\ b \end{bmatrix} \geq 0\).
➩ An equivalent problem (called the dual) is: \(\displaystyle\min_{l, f, p} 10 l + 48 f + 32 p\) subject to \(l + 3 f + 4 p \geq 4\), and \(l + 6 f + 2 p \geq 3\), with \(l, f, p \geq 0\), where \(l, f, p\) can be interpreted as unit prices of land, fertilizer, and pesticide.
➩ In matrix form: \(\displaystyle\min_{l, f, p} \begin{bmatrix} 10 \\ 48 \\ 32 \end{bmatrix} ^\top \begin{bmatrix} l \\ f \\ p \end{bmatrix}\) subject to \(\begin{bmatrix} 1 & 1 \\ 3 & 6 \\ 4 & 2 \end{bmatrix} ^\top \begin{bmatrix} l \\ f \\ p \end{bmatrix} \geq \begin{bmatrix} 4 \\ 3 \end{bmatrix}\), with \(\begin{bmatrix} l \\ f \\ p \end{bmatrix} \geq 0\).
➩ The dual of the revenue maximization problem is the expense minimization problem.
➩ Code for visualizing and solving the linear program: Notebook.
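The course notebook is not reproduced here; below is a minimal sketch (assuming scipy is installed) that solves the wheat-and-barley linear program with scipy.optimize.linprog. Note that linprog minimizes, so the revenue coefficients are negated.

```python
# Minimal sketch: solve the farmer's revenue maximization LP.
# linprog minimizes, so maximize 4w + 3b by minimizing -4w - 3b.
from scipy.optimize import linprog

c = [-4, -3]        # negated revenue per unit of wheat, barley
A = [[1, 1],        # land:       w  +  b <= 10
     [3, 6],        # fertilizer: 3w + 6b <= 48
     [4, 2]]        # pesticide:  4w + 2b <= 32
b = [10, 48, 32]

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print(res.x)     # optimal (w, b) = (6, 4)
print(-res.fun)  # maximal revenue = 36
```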
📗 Standard Form
➩ The standard form of a linear program is: \(\displaystyle\max_{x} c^\top x\) subject to \(A x \leq b\) and \(x \geq 0\).
➩ For example, if there are two variables and two constraints, then the standard form is: \(\displaystyle\max_{x_{1}, x_{2}} c_{1} x_{1} + c_{2} x_{2}\) subject to \(A_{11} x_{1} + A_{12} x_{2} \leq b_{1}\), \(A_{21} x_{1} + A_{22} x_{2} \leq b_{2}\) and \(x_{1}, x_{2} \geq 0\), which in matrix form is \(\displaystyle\max_{\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}} \begin{bmatrix} c_{1} & c_{2} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) subject to \(\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \leq \begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix}\) and \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \geq \begin{bmatrix} 0 \\ 0 \end{bmatrix}\).
📗 Dual Form
➩ The dual form of a linear program in standard form is: \(\displaystyle\min_{y} b^\top y\) subject to \(A^\top y \geq c\) and \(y \geq 0\).
➩ For example, the dual of the two-variable example is \(\displaystyle\min_{y_{1}, y_{2}} b_{1} y_{1} + b_{2} y_{2}\) subject to \(A_{11} y_{1} + A_{21} y_{2} \geq c_{1}\), \(A_{12} y_{1} + A_{22} y_{2} \geq c_{2}\) and \(y_{1}, y_{2} \geq 0\), which in matrix form is \(\displaystyle\min_{\begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix}} \begin{bmatrix} b_{1} & b_{2} \end{bmatrix} \begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix}\) subject to \(\begin{bmatrix} A_{11} & A_{21} \\ A_{12} & A_{22} \end{bmatrix} \begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} \geq \begin{bmatrix} c_{1} \\ c_{2} \end{bmatrix}\) and \(\begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} \geq \begin{bmatrix} 0 \\ 0 \end{bmatrix}\).
➩ The (strong) duality theorem says that if the primal problem has an optimal solution, then so does the dual, and the two optimal objective values are equal.
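As a sanity check on the duality theorem, the sketch below (assuming scipy) solves both the primal and the dual of the wheat-and-barley example and compares the two objective values.

```python
# Sketch: strong duality on the farmer example.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1, 1], [3, 6], [4, 2]])
b = np.array([10, 48, 32])
c = np.array([4, 3])

# Primal: max c^T x s.t. A x <= b, x >= 0 (negate c because linprog minimizes).
primal = linprog(-c, A_ub=A, b_ub=b, method="highs")
# Dual: min b^T y s.t. A^T y >= c, y >= 0 (flip signs to get <= constraints).
dual = linprog(b, A_ub=-A.T, b_ub=-c, method="highs")

print(-primal.fun, dual.fun)  # both objective values are 36.0
```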
📗 Simplex Method
➩ scipy.optimize.linprog provides two methods of solving a linear program (Doc):
(1) Simplex method, simplex (deprecated, replaced by highs): since at least one of the optimal solutions (when one exists) is a vertex of the feasible polytope, moving between adjacent vertices while improving the objective leads to an optimal solution.
(2) Interior point method, interior-point (deprecated, replaced by highs-ipm): the constraints can be included as a cost in the objective (so that minimizing the cost both satisfies the constraints and optimizes the objective), which turns the problem into an unconstrained optimization that can be solved using gradient-type algorithms.
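A small sketch of selecting a solver through the method argument, using the same farmer LP; highs-ds (HiGHS dual simplex) and highs-ipm (HiGHS interior point) are the currently supported variants, and the legacy names have been removed in recent scipy releases.

```python
# Sketch: compare HiGHS solver variants on the same LP.
from scipy.optimize import linprog

c, A, b = [-4, -3], [[1, 1], [3, 6], [4, 2]], [10, 48, 32]
for method in ("highs", "highs-ds", "highs-ipm"):
    res = linprog(c, A_ub=A, b_ub=b, method=method)
    print(method, res.x, -res.fun)  # each reports (w, b) = (6, 4), revenue 36
```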
📗 Application in Regression
➩ The problem of regression by minimizing the sum of absolute values of the errors (instead of the squares), that is, minimizing \(C\left(w\right) = \displaystyle\sum_{i=1}^{n} \left| y_{i} - \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right) \right|\) over \(w\) and \(b\), can be written as a linear program.
➩ This can be done by noting \(\left| a \right| = \displaystyle\max\left\{a, -a\right\}\), so the problem can be written as \(\displaystyle\min_{a, w, b} \displaystyle\sum_{i=1}^{n} a_{i}\) subject to \(a_{i} \geq y_{i} - \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right)\) and \(a_{i} \geq -\left(y_{i} - \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right)\right)\); at the optimum each \(a_{i}\) equals the absolute error, since the minimization pushes it down to the larger of its two lower bounds.
➩ The resulting regression problem is also called Least Absolute Deviations (LAD): Link.
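Below is a minimal sketch (on synthetic data, assuming numpy and scipy) of the LAD linear program above: the decision variables are the n slack variables \(a_{i}\) followed by the weights and the bias, and both inequality families are rearranged into the \(A x \leq b\) form that linprog expects.

```python
# Sketch: least absolute deviations regression as a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 50, 2
X = rng.normal(size=(n, m))
y = X @ np.array([1.0, -2.0]) + 0.5 + rng.normal(scale=0.1, size=n)

# Variables: (a_1..a_n, w_1..w_m, b); minimize the sum of the a_i.
c = np.concatenate([np.ones(n), np.zeros(m + 1)])
# a_i >= y_i - (X_i w + b)    ->  -a_i - X_i w - b <= -y_i
# a_i >= -(y_i - (X_i w + b)) ->  -a_i + X_i w + b <=  y_i
A_ub = np.block([
    [-np.eye(n), -X, -np.ones((n, 1))],
    [-np.eye(n),  X,  np.ones((n, 1))],
])
b_ub = np.concatenate([-y, y])
bounds = [(0, None)] * n + [(None, None)] * (m + 1)  # w and b are free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w, bias = res.x[n:n + m], res.x[-1]
print(w, bias)  # close to [1, -2] and 0.5
```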
📗 Application to Classification
➩ The problem of finding the optimal weights for a support vector machine can be written as minimizing the hinge loss \(C\left(w\right) = \displaystyle\sum_{i=1}^{n} \displaystyle\max\left\{1 - \left(2 y_{i} - 1\right) \cdot \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right), 0\right\}\), which can be converted into a linear program.
➩ Similar to the regression problem, the problem can be written as, \(\displaystyle\min_{a, w, b} \displaystyle\sum_{i=1}^{n} a_{i}\) subject to \(a_{i} \geq 1 - \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right)\) if \(y_{i} = 1\), and \(a_{i} \geq 1 + \left(w_{1} x_{i1} + w_{2} x_{i2} + ... + w_{m} x_{im} + b\right)\) if \(y_{i} = 0\), and \(a_{i} \geq 0\).
➩ When the kernel trick is used, the dual of the linear program with the new features (possibly infinitely many of them) is finite-dimensional and can be solved instead.
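A parallel sketch (synthetic data, assuming numpy and scipy) of the hinge-loss linear program above, with labels \(y_{i} \in \left\{0, 1\right\}\) mapped to signs \(2 y_{i} - 1\) so both constraint cases collapse into one family.

```python
# Sketch: hinge loss minimization as a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 40, 2
X = rng.normal(size=(n, m))
y = (X @ np.array([2.0, -1.0]) + 0.3 > 0).astype(float)  # labels in {0, 1}
s = 2 * y - 1                                            # signs in {-1, +1}

# Variables: (a_1..a_n, w_1..w_m, b); minimize the total hinge loss.
c = np.concatenate([np.ones(n), np.zeros(m + 1)])
# a_i >= 1 - s_i (X_i w + b)  ->  -a_i - s_i X_i w - s_i b <= -1
A_ub = np.hstack([-np.eye(n), -s[:, None] * X, -s[:, None]])
b_ub = -np.ones(n)
bounds = [(0, None)] * n + [(None, None)] * (m + 1)  # a_i >= 0; w, b free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x[n:], res.fun)  # (w, b) and the minimized total hinge loss
```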
📗 Robust Regression Example
➩ Compare the least squares and least absolute deviations solutions for the regression problem: a single large outlier can pull the least squares fit far from the remaining points, while the least absolute deviations fit is more robust, as shown in the sketch below.
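A minimal sketch of the comparison (synthetic data with one planted outlier, assuming numpy and scipy); the LAD fit reuses the linear program construction from the regression section, with the intercept folded into the design matrix.

```python
# Sketch: least squares vs least absolute deviations with one outlier.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = 3 * x + 1 + rng.normal(scale=0.05, size=20)
y[-1] += 5  # a single large outlier

X = np.column_stack([x, np.ones_like(x)])     # design matrix with intercept
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares fit

# LAD as a linear program: variables (a_1..a_n, slope, intercept).
n, k = X.shape
c = np.concatenate([np.ones(n), np.zeros(k)])
A_ub = np.block([[-np.eye(n), -X], [-np.eye(n), X]])
b_ub = np.concatenate([-y, y])
bounds = [(0, None)] * n + [(None, None)] * k
w_lad = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs").x[n:]

print(w_ls)   # pulled toward the outlier
print(w_lad)  # stays near slope 3, intercept 1
```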
📗 Additional Examples
➩ Convert the linear program \(\displaystyle\min w_{1} - w_{2}\) subject to \(w_{1} + w_{2} \leq 1\) and \(w_{1} + w_{3} \geq 2\) and \(w_{1} - w_{2} + w_{3} = 0\) with \(w_{1}, w_{2}, w_{3} \geq 0\) into the standard form.
➩ The standard form is \(\displaystyle\max c^\top w\) subject to \(A w \leq b\) and \(w \geq 0\).
➩ Since \(w = \begin{bmatrix} w_{1} \\ w_{2} \\ w_{3} \end{bmatrix}\), the linear program objective should be rewritten as \(\displaystyle\max -1 w_{1} + 1 w_{2} + 0 w_{3}\) so \(c = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\).
➩ The first constraint is in the correct form \(1 w_{1} + 1 w_{2} + 0 w_{3} \leq 1\), the second constraint should be rewritten as \(-1 w_{1} + 0 w_{2} - 1 w_{3} \leq -2\), and the last constraint should be split into two inequality constraints, \(1 w_{1} - 1 w_{2} + 1 w_{3} \leq 0\) and \(-1 w_{1} + 1 w_{2} - 1 w_{3} \leq 0\), so \(A = \begin{bmatrix} 1 & 1 & 0 \\ -1 & 0 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & -1 \end{bmatrix}\) and \(b = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \end{bmatrix}\).
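The converted matrices can be fed directly to a solver; a small sketch (assuming scipy) is below. As an aside, this particular system turns out to be infeasible: the equality forces \(w_{3} = w_{2} - w_{1}\), so \(w_{1} + w_{3} \geq 2\) becomes \(w_{2} \geq 2\), contradicting \(w_{1} + w_{2} \leq 1\); the conversion steps above are unaffected.

```python
# Sketch: pass the standard-form matrices to linprog (which minimizes,
# so the objective max c^T w is passed as -c).
import numpy as np
from scipy.optimize import linprog

c = np.array([-1, 1, 0])
A = np.array([[ 1,  1,  0],
              [-1,  0, -1],
              [ 1, -1,  1],
              [-1,  1, -1]])
b = np.array([1, -2, 0, 0])

res = linprog(-c, A_ub=A, b_ub=b, method="highs")
print(res.status, res.message)  # status 2: the problem is infeasible
```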
➩ Example: a farmer sells wheat at 4 dollars and barley at 3 dollars, one unit of wheat uses 3 units of fertilizer and 4 units of pesticide, and one unit of barley uses 6 units of fertilizer and 2 units of pesticide. The farmer has 10 units of land, 48 units of fertilizer, and 32 units of pesticide, and has to decide how much land to use for wheat and how much for barley (one unit of land per unit of either crop).
➩ This is the linear program from the TopHat discussion: the optimum is 6 units of wheat and 4 units of barley, with revenue \(4 \cdot 6 + 3 \cdot 4 = 36\); the land and pesticide constraints are binding, while only \(3 \cdot 6 + 6 \cdot 4 = 42\) of the 48 units of fertilizer are used.
Notes and code adapted from the course taught by Yiyin Shen (Link) and Tyler Caraza-Harter (Link).