📗 Enter your ID (the wisc email ID without @wisc.edu) here: and click the button (or hit the "Enter" key).
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You could print the page, solve the problems, then enter all your answers at the end.
📗 Please do not refresh the page: your answers will not be saved.
📗 [3 points] A linear SVM (Support Vector Machine) has \(w\) = and \(b\) = . Which of the following points is predicted positive (label 1)?
Hint
See Fall 2014 Midterm Q12, Fall 2013 Final Q4, Spring 2017 Final Q2. The positively labeled points are the \(x\) such that \(w^\top x + b \geq 0\).
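📗 For instance, a minimal numerical check (with made-up values for \(w\), \(b\), and the candidate points, since the generated question fills in the real ones) could look like:

```python
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias
points = np.array([[1.0, 1.0], [0.0, 2.0], [-1.0, 0.0]])  # hypothetical choices

# Predict label 1 exactly when w^T x + b >= 0.
predictions = (points @ w + b >= 0).astype(int)
print(predictions)          # [1 0 0]
```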
📗 Choices:
None of the above
📗 [2 points] Given a weight vector \(w\) = , consider the line (plane) defined by \(w^\top x = c\) = . Along this line (on the plane), there is a point that is the closest to the origin. How far is that point from the origin in Euclidean distance?
📗 Note: the distance in question (from the origin to the closest point on the plane) is the length of the red line in the diagram; the length of the blue line is \(\dfrac{c}{w_{z}}\), not that distance.
Hint
See Fall 2011 Midterm Q7. Use a similar approach to the derivation of the margin: suppose the vector from the origin to the point on the line is \(\lambda w\), then \(w^\top \lambda w = c\), which implies that \(\lambda = \dfrac{c}{w^\top w} = \dfrac{c}{\left\|w\right\|^{2}}\). The length of the vector \(\lambda w\) is the distance from the origin to the point on the line: \(\left\|\lambda w\right\| = \lambda \left\|w\right\| = \dfrac{c}{\left\|w\right\|^{2}} \left\|w\right\| = \dfrac{c}{\left\|w\right\|}\).
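📗 A quick numerical sanity check of this formula (with a hypothetical \(w\) and \(c\)):

```python
import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weight vector
c = 10.0                   # hypothetical constant

# The closest point on the plane w^T x = c to the origin is lambda * w.
lam = c / (w @ w)
closest = lam * w
print(np.linalg.norm(closest))      # 2.0
print(abs(c) / np.linalg.norm(w))   # 2.0, i.e. |c| / ||w||
```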
📗 Answer: .
📗 [2 points] Suppose an SVM (Support Vector Machine) has \(w\) = and \(b\) = . What is the actual distance between the two planes defined by \(w^\top x + b = -1\) and \(w^\top x + b = 1\)?
📗 Note: the distance between the two planes is the length of the red line in the diagram; the blue line does not represent the distance between the planes. You may have to rotate the diagram to see it.
Hint
See Fall 2014 Midterm Q14. The distance between the two planes is \(\dfrac{2}{\sqrt{w^\top w}}\).
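📗 As a quick check of the formula (with a hypothetical \(w\); the value of \(b\) does not affect the answer):

```python
import numpy as np

w = np.array([1.0, 2.0, 2.0])   # hypothetical weight vector
width = 2.0 / np.sqrt(w @ w)    # distance between w^T x + b = -1 and w^T x + b = 1
print(width)                    # 2 / 3 = 0.6667
```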
📗 Answer: .
📗 [6 points] A linear SVM (Support Vector Machine) with weights \(w_{1}, w_{2}, b\) is trained on the following data set: \(x_{1}\) = , \(y_{1}\) = and \(x_{2}\) = , \(y_{2}\) = . The attributes (i.e. features) are two dimensional \(\left(x_{i1}, x_{i2}\right)\) and the label \(y_{i}\) is binary. The classification rule is \(\hat{y}_{i} = 1_{\left\{w_{1} x_{i1} + w_{2} x_{i2} + b \geq 0\right\}}\). Assuming \(b\) = , what is \(\left(w_{1}, w_{2}\right)\)? The drawing is not graded.
Hint
See Fall 2019 Final Q11, Fall 2006 Final Q15, Fall 2005 Final Q15. Draw the line (decision boundary). One way to figure out its equation is by noting that the line passes through the midpoint between \(x_{1}\) and \(x_{2}\) and is perpendicular to the line from \(x_{1}\) to \(x_{2}\). Make sure you have the correct signs: the class-1 point should satisfy \(w_{1} x_{i1} + w_{2} x_{i2} + b \geq 0\) and the class-0 point should satisfy \(w_{1} x_{i1} + w_{2} x_{i2} + b < 0\).
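📗 A small sketch of this construction (with hypothetical training points and a bias chosen to be consistent with them; the generated question supplies the real values):

```python
import numpy as np

x_pos = np.array([4.0, 2.0])   # hypothetical point with label y = 1
x_neg = np.array([2.0, 0.0])   # hypothetical point with label y = 0
b = -2.0                       # hypothetical given bias

d = x_pos - x_neg              # w is parallel to the vector from the class-0 point to the class-1 point
mid = (x_pos + x_neg) / 2.0    # the decision boundary passes through the midpoint

# Scale w = k * d so that the boundary w^T x + b = 0 passes through the midpoint.
k = -b / (d @ mid)
w = k * d
print(w)                             # [0.5 0.5]
print(w @ x_pos + b, w @ x_neg + b)  # +1.0 and -1.0: both points sit on the margin
```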
📗 Answer (comma separated vector): .
📗 [2 points] Consider a small dataset with \(n\) points, where each point is in a dimensional space. For which values of \(n\) does there exist a dataset such that, no matter what binary label we give to each point, a linear SVM (Support Vector Machine) can perfectly classify the resulting dataset?
Hint
See Fall 2019 Final Q6, Fall 2010 Final Q14. The largest such \(n\) is called the Vapnik-Chervonenkis (VC) dimension. The VC dimension for linear classifiers (for example, SVM) is the dimension of the space plus 1. The following is an example in 2D with 3 points (not all on the same line): no matter what binary label we give to each point, a line can always separate the two classes; note that this is not the case with 4 points (remember the XOR example).
For this question, you can select all values less than or equal to the dimension of the space plus 1.
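📗 A brute-force way to convince yourself (using scikit-learn, which is assumed to be available, with a large \(C\) to approximate a hard margin) is to check that every one of the \(2^{n}\) labelings of the points can be classified perfectly:

```python
from itertools import product
import numpy as np
from sklearn.svm import SVC

def can_shatter(points):
    """Return True if every binary labeling of the points is linearly separable."""
    for labels in product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:    # all-0 or all-1 labelings are trivially separable
            continue
        clf = SVC(kernel="linear", C=1e6).fit(points, labels)
        if clf.score(points, labels) < 1.0:
            return False
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])         # 3 points, not on the same line
four = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])  # 4 points in the XOR configuration
print(can_shatter(three))  # True:  2 + 1 points can be shattered in 2D
print(can_shatter(four))   # False: the XOR labeling cannot be separated
```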
📗 Choices:
None of the above
📗 [4 points] Given the following training set, add one instance \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\) with \(y\) = so that all instances are support vectors for the Hard Margin SVM (Support Vector Machine) trained on the new training set.
| \(x_{1}\) | \(x_{2}\) | \(y\) |
| --- | --- | --- |
|  |  | 0 |
|  |  | 0 |
|  |  | 0 |
|  |  | 1 |
|  |  | 1 |
|  |  | 1 |
📗 Note: in the diagram, currently, the two support vectors are connected by the grey line and the black line represents the SVM classification boundary. After adding one point, you should be able to make all seven points support vectors with the classification boundary given by the green line.
Hint
See Fall 2019 Final Q7, Q8. One example: say the negative points are \((-1, -1), \left(-2, -1\right), \left(-3, -1\right), ...\) and the positive points are \(\left(1, 1\right), \left(2, 1\right), \left(3, 1\right), ...\). Draw the points and draw the SVM decision boundary. Note that there are 2 support vectors. Suppose you add a point \((-1, 1)\) with label 1. Draw the new point and draw the new SVM decision boundary. Note that now all points are support vectors.
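📗 The following sketch reproduces the hint's example (coordinates taken from the hint; scikit-learn is assumed to be available, with a large \(C\) to approximate a hard margin). Instead of relying on the solver's list of support vectors, it checks which points lie exactly on the margin, i.e. satisfy \(\left|w^\top x + b\right| = 1\):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [-3, -1], [1, 1], [2, 1], [3, 1]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

def on_margin(X, y):
    clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C approximates a hard margin
    scores = X @ clf.coef_.ravel() + clf.intercept_[0]
    return np.isclose(np.abs(scores), 1.0, atol=1e-2)

print(on_margin(X, y))             # only (-1, -1) and (1, 1) are on the margin

X2 = np.vstack([X, [-1.0, 1.0]])   # add the hint's point (-1, 1) with label 1
y2 = np.append(y, 1)
print(on_margin(X2, y2))           # now all seven points are on the margin
```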
📗 Answer (comma separated vector): .
📗 [4 points] Consider the linear SVM (Support Vector Machine) problem without slack variables or kernels: this is known as the hard margin SVM. If you give it a linearly separable training data set where \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \in \mathbb{R}^{2}\) and \(y \in \left\{0, 1\right\}\), it will learn a line in \(\mathbb{R}^{2}\). Tom did something to your data set, and the hard margin SVM no longer works (the data is no longer linearly separable) after the modification: \(\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} \leftarrow \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + \begin{bmatrix} b_{1} \\ b_{2} \end{bmatrix} = M \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} + b\). Suppose \(b\) = ; give an example of such an \(M\).
📗 Note: you can test your transformation: the original points are shown on the left and the new points after the transformation are shown on the right.
Hint
See Fall 2014 Final Q17. When the columns or rows of \(M\) are linearly dependent, all of the points will be mapped onto the same line, which can make the resulting data set not linearly separable.
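📗 A sketch of this idea with a made-up (linearly separable) data set, a made-up offset \(b\), and an \(M\) whose rows are linearly dependent:

```python
import numpy as np

X = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 1.0], [5.0, 1.0]])
y = np.array([0, 0, 1, 1])        # hypothetical labels; separable by the line x2 = 0.5

M = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # second row = 2 * first row, so rank(M) = 1
b = np.array([0.0, 0.0])          # hypothetical offset

X_new = X @ M.T + b               # apply x <- M x + b to every point
print(X_new)
# [[ 0.  0.]
#  [ 4.  8.]
#  [ 3.  6.]
#  [ 7. 14.]]
# Every transformed point lies on the line x2 = 2 * x1, and along that line the
# labels appear in the order 0, 1, 0, 1, so no linear boundary separates them.
```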
📗 Answer (matrix with multiple lines, each line is a comma separated vector): .
📗 [2 points] Let \(w\) = and \(b\) = . For the point \(x\) = , \(y\) = , what is the smallest slack value \(\xi\) for it to satisfy the margin constraint?
Hint
See Fall 2011 Midterm Q8, Fall 2009 Final Q1. There are two inequality constraints for the slack variable: (1) \(\left(2 y - 1\right)\left(w^\top x + b\right) \geq 1 - \xi\) and (2) \(\xi \geq 0\). Combine the two inequalities to get the smallest slack variable value.
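📗 In code (with hypothetical \(w\), \(b\), \(x\), \(y\); note the label is in \(\left\{0, 1\right\}\), hence the \(2y - 1\) factor):

```python
import numpy as np

w = np.array([1.0, -2.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias
x = np.array([1.0, 1.0])    # hypothetical point
y = 1                       # hypothetical label in {0, 1}

# Constraints (2y - 1)(w^T x + b) >= 1 - xi and xi >= 0 give the smallest slack:
xi = max(0.0, 1.0 - (2 * y - 1) * (w @ x + b))
print(xi)                   # 1 - (1)(-0.5) = 1.5
```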
📗 Answer: .
📗 [4 points] Consider a kernel \(K\left(x_{i_{1}}, x_{i_{2}}\right)\) = + + , where both \(x_{i_{1}}\) and \(x_{i_{2}}\) are 1D positive real numbers. What is the feature vector \(\varphi\left(x_{i}\right)\) induced by this kernel evaluated at \(x_{i}\) = ?
Hint
See Fall 2009 Final Q2. Write \(K\left(x, y\right)\) as the dot product of \(\varphi\left(x\right)\) and \(\varphi\left(y\right)\) (guess and check). Then substitute the value of \(x\) into the vector.
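📗 Guess-and-check can also be done numerically. Below, the kernel is a made-up stand-in \(K\left(x, y\right) = 1 + 2 x y + 3 \left(x y\right)^{2}\) (the generated question fills in the actual three terms), with the guess \(\varphi\left(x\right) = \left(1, \sqrt{2} x, \sqrt{3} x^{2}\right)\):

```python
import numpy as np

def K(x, y):                 # hypothetical kernel: 1 + 2xy + 3(xy)^2
    return 1 + 2 * x * y + 3 * (x * y) ** 2

def phi(x):                  # guessed feature map
    return np.array([1.0, np.sqrt(2) * x, np.sqrt(3) * x ** 2])

x, y = 2.0, 3.0
print(K(x, y), phi(x) @ phi(y))   # both 121.0, so the guess is consistent
print(phi(2.0))                   # the feature vector evaluated at x_i = 2
```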
📗 Answer (comma separated vector): .
📗 [4 points] If \(K\left(x, x'\right)\) is a kernel with induced feature representation \(\varphi\left(x_{0}\right)\) = , and \(G\left(x, x'\right)\) is another kernel with induced feature representation \(\theta\left(x_{0}\right)\) = , then it is known that \(H\left(x, x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\), with \(a\) = , \(b\) = , is also a kernel. What is the induced feature representation of \(H\) for this \(x_{0}\)?
Hint
See Fall 2014 Midterm Q15, Fall 2013 Final Q7, Fall 2011 Midterm Q9. This requires guess and check: suppose the feature representation is \(\begin{bmatrix} \sqrt{a} \varphi\left(x\right) \\ \sqrt{b} \theta\left(x\right) \end{bmatrix}\), then \(H\left(x, x'\right) = \begin{bmatrix} \sqrt{a} \varphi^\top\left(x\right) & \sqrt{b} \theta^\top\left(x\right) \end{bmatrix} \begin{bmatrix} \sqrt{a} \varphi\left(x'\right) \\ \sqrt{b} \theta\left(x'\right) \end{bmatrix} = \sqrt{a} \sqrt{a} \varphi^\top\left(x\right) \varphi\left(x'\right) + \sqrt{b} \sqrt{b} \theta^\top\left(x\right) \theta\left(x'\right) = a K\left(x, x'\right) + b G\left(x, x'\right)\).
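📗 A numerical version of the same check (with made-up feature vectors and coefficients):

```python
import numpy as np

phi_x0 = np.array([1.0, 2.0])   # hypothetical phi(x_0) for kernel K
theta_x0 = np.array([3.0])      # hypothetical theta(x_0) for kernel G
a, b = 4.0, 9.0                 # hypothetical nonnegative coefficients

# Feature representation of H = a K + b G: stack sqrt(a) * phi and sqrt(b) * theta.
h_x0 = np.concatenate([np.sqrt(a) * phi_x0, np.sqrt(b) * theta_x0])
print(h_x0)                     # [2. 4. 9.]
```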
📗 Answer (comma separated vector): .
📗 [1 point] Please enter any comments and suggestions, including possible mistakes and bugs with the questions and the auto-grading, and materials relevant to solving the questions that you think are not covered well during the lectures. If you have no comments, please enter "None": do not leave it blank.
📗 You could save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You could load your answers from the text (or txt file) in the text box below using the button. The first two lines should be "##m: 4" and "##id: your id", and the format of the remaining lines should be "##1: your answer to question 1" newline "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
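📗 For example, an answer file for this page might start like this (hypothetical ID and made-up answers):

```
##m: 4
##id: bbadger
##1: 1
##2: 2.0
##3: 0.6667
##4: 0.5, 0.5
```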