CS 760: Homework 3
Bayesian Network Learning


Implement the naive Bayes, TAN (tree-augmented naive Bayes), and Sparse Candidate algorithms for learning Bayesian networks. You may assume that all variables, including the class, are binary, i.e., take values in {0,1}, and that the training and test sets have no missing values. Consequently, you do not need to implement a full Bayes net inference algorithm; you can make predictions simply by computing the joint probability of each class value with the observed feature values and normalizing the results.

Your program should read the data in .arff format, train each algorithm on the examples in the training set, and test it on the examples in the test set. It should accept the filenames (minus the .arff extension) of the training set and the test set as command-line arguments, with the training set first. It should then print (via System.out.println) each algorithm's probabilities for the test set examples, as well as its performance on the test set examples using a threshold of 0.5 (which examples are correctly predicted, and a test set contingency table).

For Sparse Candidate, use a limit of 3 on the number of parents a node can have. For all methods, use pseudocounts of 1 in every CPT entry.

Test your methods on your dataset from HW0, appropriately binarized. Use 80% of that data for training and 20% for testing. Submit both your code and the datasets to the handin directory. You may assume that our test datasets will have at most 100 variables and at most 1000 data points.
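
As an illustration of the prediction step described above (compute the joint probability of each class value with the observed feature values, then normalize), here is a minimal sketch for the naive Bayes case only. It is not a reference solution. The class name NaiveBayesSketch, the method probClassOne, and the toy data are hypothetical, and it assumes that the pseudocount of 1 is also applied to the class prior and that the data are already loaded into int arrays (no .arff parsing is shown). Logs are summed rather than multiplying probabilities, which is one way to avoid underflow when there are many features.

```java
import java.util.Locale;

// Hypothetical sketch: naive Bayes over binary features with pseudocounts of 1.
class NaiveBayesSketch {
    private final double[] logClassProb = new double[2]; // log P(Y = c)
    private final double[][][] logFeatProb;              // log P(X_i = v | Y = c), indexed [i][c][v]

    // x[r][i] is the value (0 or 1) of feature i in training example r; y[r] is its class.
    NaiveBayesSketch(int[][] x, int[] y) {
        int n = x[0].length;
        int[] classCount = new int[2];
        int[][][] featCount = new int[n][2][2];
        for (int r = 0; r < x.length; r++) {
            classCount[y[r]]++;
            for (int i = 0; i < n; i++) {
                featCount[i][y[r]][x[r][i]]++;
            }
        }
        logFeatProb = new double[n][2][2];
        for (int c = 0; c < 2; c++) {
            // Pseudocount of 1 per entry; two possible values add 2 to each denominator.
            // Smoothing the class prior this way is an assumption of this sketch.
            logClassProb[c] = Math.log((classCount[c] + 1.0) / (x.length + 2.0));
            for (int i = 0; i < n; i++) {
                for (int v = 0; v < 2; v++) {
                    logFeatProb[i][c][v] =
                        Math.log((featCount[i][c][v] + 1.0) / (classCount[c] + 2.0));
                }
            }
        }
    }

    // Returns P(Y = 1 | features) by normalizing the two joint probabilities.
    double probClassOne(int[] features) {
        double[] logJoint = new double[2];
        for (int c = 0; c < 2; c++) {
            logJoint[c] = logClassProb[c];
            for (int i = 0; i < features.length; i++) {
                logJoint[c] += logFeatProb[i][c][features[i]];
            }
        }
        // joint1 / (joint0 + joint1), rewritten in terms of the log difference.
        return 1.0 / (1.0 + Math.exp(logJoint[0] - logJoint[1]));
    }

    public static void main(String[] args) {
        // Toy data, invented for illustration only.
        int[][] x = {{1, 1}, {1, 0}, {0, 0}, {0, 1}};
        int[] y = {1, 1, 0, 0};
        NaiveBayesSketch nb = new NaiveBayesSketch(x, y);
        double p = nb.probClassOne(new int[] {1, 1});
        // Treating exactly 0.5 as class 1 is an assumption; the threshold rule may differ.
        int predicted = p >= 0.5 ? 1 : 0;
        System.out.println(String.format(Locale.ROOT, "%.3f", p) + " -> class " + predicted);
    }
}
```

On the toy data, both classes have smoothed prior 3/6, P(X0 = 1 | Y = 1) = 3/4, P(X0 = 1 | Y = 0) = 1/4, and P(X1 = 1 | Y = c) = 2/4 for either class, so the normalized probability of class 1 for the example (1, 1) is 0.1875 / (0.1875 + 0.0625) = 0.75. TAN and Sparse Candidate would use the same normalization but with each feature's CPT conditioned on its learned parents.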