Due Sunday, 04/02, 11:59 PM

1. Consider the following training set with two boolean features and one continuous feature.

| | A | B | C | Class |
| --- | --- | --- | --- | --- |
| Instance 1 | F | T | 120 | Benign |
| Instance 2 | T | F | 1090 | Benign |
| Instance 3 | T | T | 245 | Malignant |
| Instance 4 | F | F | 589 | Malignant |
| Instance 5 | T | T | 877 | Malignant |

(a) How much information about the class is gained by knowing whether or not the value of feature C is less than 475?

(b) How much information about the class is gained by knowing whether or not the values of features A and B are different?
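As a cross-check, both splits can be evaluated with a short information-gain sketch (base-2 entropy assumed; the function names here are illustrative, not from the assignment):

```python
import math

# Training set from the problem: (A, B, C, Class)
data = [("F", "T", 120, "Benign"),
        ("T", "F", 1090, "Benign"),
        ("T", "T", 245, "Malignant"),
        ("F", "F", 589, "Malignant"),
        ("T", "T", 877, "Malignant")]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def info_gain(split):
    """Entropy of the class minus conditional entropy given the split."""
    groups = {}
    for row in data:
        groups.setdefault(split(row), []).append(row[3])
    classes = [row[3] for row in data]
    cond = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    return entropy(classes) - cond

gain_a = info_gain(lambda row: row[2] < 475)      # part (a): split on C < 475
gain_b = info_gain(lambda row: row[0] != row[1])  # part (b): split on A != B
```

Note that the part (b) split separates the classes perfectly here, so its gain equals the full class entropy.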

2. Suppose we want to learn a k-nearest neighbor model from the following data set, using Leave-One-Out Cross Validation (LOOCV) to select k. Which value would LOOCV pick: k = 1, k = 2, or k = 3? Use Manhattan distance for all distance calculations.

| | Feature1 | Feature2 | Class |
| --- | --- | --- | --- |
| Instance 1 | 2 | 3 | Positive |
| Instance 2 | 4 | 4 | Positive |
| Instance 3 | 4 | 5 | Negative |
| Instance 4 | 6 | 3 | Positive |
| Instance 5 | 8 | 3 | Negative |
| Instance 6 | 8 | 4 | Negative |
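The hand calculation can be verified with a small LOOCV sketch. One caveat: with an even k the vote can tie, so the k = 2 score depends on the tie-breaking convention; the version below breaks ties in favor of the nearest neighbor's label (an assumption, not something the problem specifies):

```python
from collections import Counter

# (Feature1, Feature2) -> Class, from the problem's table
points = [((2, 3), "Positive"), ((4, 4), "Positive"), ((4, 5), "Negative"),
          ((6, 3), "Positive"), ((8, 3), "Negative"), ((8, 4), "Negative")]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def loocv_accuracy(k):
    correct = 0
    for i, (x, y) in enumerate(points):
        # Hold out instance i, sort the rest by Manhattan distance to it
        neighbors = sorted((p for j, p in enumerate(points) if j != i),
                           key=lambda p: manhattan(x, p[0]))
        votes = Counter(label for _, label in neighbors[:k])
        # most_common breaks tied votes by first-seen order, i.e. by the
        # label of the nearest neighbor among the tied labels
        pred = votes.most_common(1)[0][0]
        correct += (pred == y)
    return correct / len(points)

accuracies = {k: loocv_accuracy(k) for k in (1, 2, 3)}
```

Comparing the three accuracies settles the question; just note in your answer how you broke ties for k = 2.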

3. Suppose we wish to construct a Bayes network for 3 features X, Y, and Z using the Sparse Candidate algorithm. We are given data from 100 independent experiments, where each feature is binary and takes the value T or F. The table below summarizes the observed counts:

| X | Y | Z | Count |
| --- | --- | --- | --- |
| T | T | T | 36 |
| T | T | F | 4 |
| T | F | T | 2 |
| T | F | F | 8 |
| F | T | T | 9 |
| F | T | F | 1 |
| F | F | T | 8 |
| F | F | F | 32 |

(a) Suppose we wish to compute a single candidate parent for Z. In the first round of the Sparse Candidate algorithm, we compute the mutual information between Z and each of the other random variables.

i. Compute the mutual information between Z and X i.e. I(X, Z) based on the frequencies observed in the data.

ii. Compute the mutual information between Z and Y i.e. I(Y, Z) based on the frequencies observed in the data.

(b) Based on your results in part (a), which feature should be selected as the candidate parent for Z? Why?

(c) In the first round of the algorithm, suppose we choose Y to be the parent of Z in our network and X to be the parent of Y, with X itself remaining parentless. Estimate the parameters of this Bayes net from the data.
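The mutual-information computations in part (a) reduce to tabulating marginal and pairwise frequencies from the count table; a sketch for checking your arithmetic (empirical frequencies, results in bits):

```python
import math
from collections import defaultdict

# Counts from the 100 experiments, keyed by (X, Y, Z)
counts = {("T","T","T"): 36, ("T","T","F"): 4, ("T","F","T"): 2, ("T","F","F"): 8,
          ("F","T","T"): 9,  ("F","T","F"): 1, ("F","F","T"): 8, ("F","F","F"): 32}
N = sum(counts.values())  # 100

def mutual_info(i, j):
    """MI of the variables at tuple positions i and j (0=X, 1=Y, 2=Z), in bits."""
    joint, pi, pj = defaultdict(float), defaultdict(float), defaultdict(float)
    for key, c in counts.items():
        p = c / N
        joint[(key[i], key[j])] += p
        pi[key[i]] += p
        pj[key[j]] += p
    return sum(p * math.log2(p / (pi[a] * pj[b]))
               for (a, b), p in joint.items() if p > 0)

i_xz = mutual_info(0, 2)  # part (a) i:  I(X, Z)
i_yz = mutual_info(1, 2)  # part (a) ii: I(Y, Z)
```

The larger of the two values identifies the candidate parent asked for in part (b).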

4. Suppose you are given the following instances in 2-D space.

| X coordinate | Y coordinate |
| --- | --- |
| 12 | 4 |
| 3 | 18 |
| 6 | 11 |
| 5 | 5 |

Build the Kernel Matrix for the above dataset for each of these Kernels:

(a) Polynomial Kernel of degree 2.

(b) Polynomial Kernel of degree up to 2.

(c) RBF Kernel with .
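All three matrices can be built from the plain Gram matrix of inner products. The sketch below assumes the common conventions k(x, z) = (x·z)² for "degree 2" and k(x, z) = (1 + x·z)² for "degree up to 2"; since the RBF bandwidth is missing from part (c), `sigma` here is only a placeholder:

```python
import numpy as np

# The four 2-D instances from the problem
X = np.array([[12, 4], [3, 18], [6, 11], [5, 5]], dtype=float)
G = X @ X.T  # Gram matrix of inner products x_i . x_j

# (a) Homogeneous polynomial kernel of degree 2: (x . z)^2
K_poly2 = G ** 2
# (b) Inhomogeneous polynomial kernel with terms up to degree 2: (1 + x . z)^2
K_upto2 = (1 + G) ** 2
# (c) RBF kernel exp(-||x - z||^2 / (2 * sigma^2))
sigma = 1.0  # placeholder: substitute the bandwidth given in the assignment
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-sq_dists / (2 * sigma ** 2))
```

Each result is a symmetric 4x4 matrix; the RBF matrix additionally has ones on its diagonal, which is a quick sanity check for your hand computation.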

5. Consider a concept class C in which each concept is represented by a pair of circles centered at the origin (0,0). Let r be the radius of the inner circle and r + a (for some positive number a) the radius of the outer circle. Each training instance is represented by two real-valued features x1 and x2 and a binary class label y ∈ {0,1}. A concept labels an instance y = 1 if it lies outside the inner circle and inside the outer circle (i.e., in the annulus between them), and y = 0 otherwise. Show that C is PAC learnable.
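One standard route, sketched here under the assumption that the learner outputs the tightest annulus consistent with the positive training examples (the same style of argument as the classic axis-aligned-rectangle proof):

```latex
% Learner: fit the tightest consistent annulus from the positives' distances
% to the origin, d_i = \sqrt{x_{1,i}^2 + x_{2,i}^2}:
\hat r = \min_{i : y_i = 1} d_i, \qquad \hat r + \hat a = \max_{i : y_i = 1} d_i .
% The error region against the target annulus (r, r+a) lies in two strips:
% one just outside radius r, one just inside radius r + a. Shrink each strip
% until it has probability mass \epsilon/2 under the data distribution.
% The learner errs by more than \epsilon only if some strip gets no sample:
\Pr[\text{a fixed strip is missed by all } m \text{ draws}]
    \le (1 - \epsilon/2)^m \le e^{-m\epsilon/2} .
% Union bound over the two strips and require failure probability \le \delta:
2 e^{-m\epsilon/2} \le \delta
    \quad\Longleftarrow\quad m \ge \frac{2}{\epsilon} \ln \frac{2}{\delta},
% which is polynomial in 1/\epsilon and 1/\delta, so C is PAC learnable.
```

Your write-up should also argue briefly that the fitted annulus is always contained in the target annulus (every positive example lies inside the target), which is what confines the error region to those two strips.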