📗 The regular component (out of 5) should be submitted using the "Grade" and "Submit" buttons at the bottom of the page.
➩ Submission of the text file generated by the auto-grader to Canvas Assignment A3 is optional.
➩ Due date: August 9. No submissions will be accepted after that date.
📗 For the competition component (out of 5), the text file generated using the Question 9 "Generate" button should be submitted to Canvas Assignment A3C: Link
➩ Submitting an incorrectly formatted text file, or any additional files, to A3C will result in a competition score of \(-\infty\).
➩ Due date: July 7. No submissions will be accepted after that date under any circumstances.
📗 Note: the Canvas A3 and A3C due dates are the recommended due dates. Competition submissions made before the recommended due date will participate in trial competitions, with the option to keep the score (but not the ranking).
📗 Hint: example submissions, discussion session schedules, and group recommendations (which differ considerably between assignments) can be found on Piazza: Link.
📗 Enter your ID (your wisc email ID without @wisc.edu) here: and click (or hit the "Enter" key).
📗 You can also load from your saved file and click .
📗 If the questions are not generated correctly, try refreshing the page using the button at the top left corner.
📗 The same ID should generate the same set of questions. Your answers are not saved when you close the browser. You can either copy and paste (or load) your program outputs into the text boxes for the individual questions, or print all your outputs to a single text file and load it using the button at the bottom of the page.
📗 Please do not refresh the page: your answers will not be saved.
📗 You can write the code in any programming language and use any large language models. You do not have to submit your code.
📗 (Introduction) In this project, you will build a decision tree to diagnose whether a patient has cancer based on their medical test results. In particular, we will use the Wisconsin Breast Cancer dataset Link. Your model will read in integer-valued patient data and output a diagnosis of whether the patient has cancer (2 for no, 4 for yes).
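A minimal sketch of reading the data, assuming the comma-separated layout of the UCI `breast-cancer-wisconsin.data` file: the sample code number, the nine integer test results, and the class label (2 or 4), eleven columns in total. Missing values in that file are written as `?`; skipping those rows is an assumption made here, not a course requirement, and the course's copy of the data may already be cleaned. `parse_rows` is a hypothetical helper name.

```python
from typing import List, Tuple

def parse_rows(lines: List[str]) -> List[Tuple[List[int], int]]:
    """Parse rows into (x, y): x is the 10-number vector (sample code number
    first, then the nine test results) and y is the label (2 = no, 4 = yes)."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.strip().split(",")]
        if len(fields) != 11 or "?" in fields:
            continue  # assumption: skip blank/malformed rows and rows with missing values
        values = [int(f) for f in fields]
        x, y = values[:10], values[10]
        if y in (2, 4):
            rows.append((x, y))
    return rows
```

Keeping the sample code number at index 0 matches the 10-number feature vector used by the classifier on this page; the ID should never be used as a split feature.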
📗 (Part 1) Using a subset of the features and items, train a binary decision tree that classifies all of those items correctly.
The features to use:
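One way to grow such a tree is a greedy, ID3-style search for the binary split with the highest information gain, repeated until every leaf is pure. The sketch below assumes each item is the 10-number vector (sample code number at index 0) paired with its label, that a split sends x[j] <= t to the left child, and that the allowed features are passed in as indices (with the page's numbering, feature k would be index k - 1). The dictionary layout is a hypothetical in-memory representation, not the required submission format.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[List[int], int]  # (10-number vector with sample code number first, label 2 or 4)

def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows: List[Row], features: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Pick (j, t) maximizing information gain for x[j] <= t vs. x[j] > t.
    Any split leaving both sides non-empty is accepted, so growth never stalls
    at zero gain; returns None only when no allowed feature varies."""
    base = entropy([y for _, y in rows])
    best, best_gain = None, -1.0
    for j in features:
        for t in sorted({x[j] for x, _ in rows})[:-1]:  # largest value would empty the right side
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            if base - remainder > best_gain:
                best, best_gain = (j, t), base - remainder
    return best

def grow(rows: List[Row], features: Sequence[int]) -> Dict:
    """Split recursively until each leaf is pure (or the rows cannot be separated)."""
    labels = [y for _, y in rows]
    split = best_split(rows, features) if len(set(labels)) > 1 else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}  # majority label
    j, t = split
    return {"feature": j, "threshold": t,
            "left": grow([r for r in rows if r[0][j] <= t], features),
            "right": grow([r for r in rows if r[0][j] > t], features)}

def predict(tree: Dict, x: List[int]) -> int:
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

Accepting zero-gain splits matters for Part 1: with an XOR-like pattern, no single split has positive gain, yet two levels of splits separate the items perfectly.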
📗 (Part 2) Train the tree on the complete data set using all features, then prune the tree based on a validation set.
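A common way to prune against a validation set is reduced-error pruning: working bottom-up, replace a subtree with a leaf carrying the majority training label whenever that does not increase the validation errors at that node. This is one standard technique, not necessarily the one the course intends. The sketch assumes a hypothetical dictionary tree in which internal nodes send x[feature] <= threshold to the left, and `count_nodes` counts leaves and internal nodes, like the tree size \(m\) used in the competition.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[List[int], int]  # (10-number vector, label 2 or 4)

def predict(tree: Dict, x: List[int]) -> int:
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def errors(tree: Dict, rows: List[Row]) -> int:
    return sum(predict(tree, x) != y for x, y in rows)

def count_nodes(tree: Dict) -> int:
    """Number of nodes, counting both leaves and internal nodes."""
    if "leaf" in tree:
        return 1
    return 1 + count_nodes(tree["left"]) + count_nodes(tree["right"])

def split_rows(rows: List[Row], j: int, t: int) -> Tuple[List[Row], List[Row]]:
    return [r for r in rows if r[0][j] <= t], [r for r in rows if r[0][j] > t]

def prune(tree: Dict, train: List[Row], val: List[Row]) -> Dict:
    """Reduced-error pruning, bottom-up. A subtree becomes a majority-label leaf if
    that does not increase validation errors; nodes with no validation rows get pruned."""
    if "leaf" in tree:
        return tree
    j, t = tree["feature"], tree["threshold"]
    train_l, train_r = split_rows(train, j, t)
    val_l, val_r = split_rows(val, j, t)
    node = {"feature": j, "threshold": t,
            "left": prune(tree["left"], train_l, val_l),
            "right": prune(tree["right"], train_r, val_r)}
    if not train:
        return node  # no training rows reach this node, so there is no majority label
    leaf = {"leaf": Counter(y for _, y in train).most_common(1)[0][0]}
    return leaf if errors(leaf, val) <= errors(node, val) else node
```

Using `<=` prefers the smaller tree on ties, which suits a competition that penalizes the number of nodes.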
Test set for Part 1 and Part 2 (this is a subset of the training data):
Here is a simple example of how to format the trees. Your submission needs to follow exactly the same format so that it can be parsed correctly:
You can enter your tree here to visualize it:
📗 (Competition) Submit your pruned tree and a list of patients that your tree classifies correctly but that you believe other students' trees cannot classify correctly: these items will become part of the test set. The patient list should contain at most ten integers specifying the patients' IDs (the "sample_code_number" column of this dataset: Link). Your score will be based on the size of your tree (the number of nodes; the smaller, the better) and the loss from misdiagnosis.
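One possible heuristic for choosing the patient list is to train a few alternative trees yourself (for example, unpruned or differently pruned ones) as stand-ins for other students' trees, then list patients your own tree gets right while many stand-ins get them wrong. This is only a sketch of that assumed strategy; `pick_test_items` and the proxy classifiers are hypothetical, and the result is deduplicated by sample code number because only unique items count toward \(n\).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[List[int], int]  # (10-number vector with sample code number first, label 2 or 4)
Classifier = Callable[[List[int]], int]

def pick_test_items(rows: Sequence[Row], mine: Classifier,
                    proxies: Sequence[Classifier], limit: int = 10) -> List[int]:
    """Return up to `limit` unique sample code numbers that `mine` classifies
    correctly, ordered by how many proxy classifiers misclassify them."""
    wrong_by_id: Dict[int, int] = {}
    for x, y in rows:
        if mine(x) != y:
            continue  # only list patients your own tree gets right
        wrong = sum(p(x) != y for p in proxies)
        if wrong > 0:
            sample_id = x[0]  # the sample code number
            wrong_by_id[sample_id] = max(wrong_by_id.get(sample_id, 0), wrong)
    ranked = sorted(wrong_by_id, key=lambda i: (-wrong_by_id[i], i))
    return ranked[:limit]
```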
Suppose you are in a team \(t \in \left\{0, 1, 2, 3, 4, 5\right\}\) with \(n\) unique items submitted, you use a tree with \(m\) nodes (including leaf and internal nodes), and on the test set you classify patient \(i\) with true label \(y_{i} \in \left\{2, 4\right\}\) as \(\hat{y}_{i} \in \left\{2, 4\right\}\). Then your score is given by (everyone's score will be negative):
➩ Misdiagnosing a cancer patient as cancer-free is twice as costly as the opposite error, so you should plan your item weights and pruning strategy accordingly.
➩ The cost of a larger tree depends on the number of test items, which is unknown before the competition: it depends on students' unique test item submissions.
➩ If the team wants \(n\) to be large, submit unique test items. In this case, \(m\) can be relatively larger too.
➩ If the team wants \(n\) to be small, submit repeated or no test items. In this case, \(m\) should be small too.
➩ Strategic consideration: larger teams (if members trust each other) can submit small trees and no test items; smaller teams can "defend" against outsiders and course staff's test items by submitting unique test items.
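Because the full scoring formula is not reproduced here, the sketch below only illustrates the 2:1 asymmetry stated above, assuming a missed cancer (true label 4 predicted as 2) costs 2 and a false alarm (true label 2 predicted as 4) costs 1. With those weights, the cheapest label for a leaf is 4 whenever the number of benign items reaching it is at most twice the number of malignant ones; breaking ties toward 4 is an arbitrary choice made here.

```python
from collections import Counter
from typing import Iterable, Sequence

# Assumed relative costs from "twice as costly"; not the exact competition formula.
COST_MISSED_CANCER = 2.0  # true label 4 predicted as 2
COST_FALSE_ALARM = 1.0    # true label 2 predicted as 4

def misdiagnosis_cost(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    cost = 0.0
    for y, y_hat in zip(y_true, y_pred):
        if y == 4 and y_hat == 2:
            cost += COST_MISSED_CANCER
        elif y == 2 and y_hat == 4:
            cost += COST_FALSE_ALARM
    return cost

def cost_sensitive_leaf(labels: Iterable[int]) -> int:
    """Leaf label minimizing the weighted cost of the training items reaching it."""
    counts = Counter(labels)
    cost_if_2 = COST_MISSED_CANCER * counts[4]  # every malignant item would be missed
    cost_if_4 = COST_FALSE_ALARM * counts[2]    # every benign item would be a false alarm
    return 4 if cost_if_4 <= cost_if_2 else 2   # ties go to 4 (arbitrary choice)
```

For example, a leaf reached by two benign items and one malignant item is labeled 2 by a plain majority vote but 4 under these weights.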
Your project grade is based on your submission to this assignment (out of 5) plus your ranking in the class (out of 5):
Top 20% gets 5/5.
Next 20% gets 4/5.
Next 20% gets 3/5.
Next 20% gets 2/5.
Next 20% gets 1/5.
(Students who do not participate in the competition will be given a score of negative infinity when computing the rankings.)
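The bands above can be read as follows: with \(N\) ranked students, rank \(r\) (1 = best) lies in the top \(k \times 20\%\) when \(5r \le kN\). A minimal sketch of that reading, assuming band edges are handled this way and ignoring ties (the actual tie and rounding policy is not specified here). Students who do not participate rank last because of their \(-\infty\) score.

```python
def competition_grade(rank: int, num_students: int) -> int:
    """Map a 1-based rank (1 = best score) to a 1-5 grade by 20% bands.
    Integer arithmetic avoids floating-point trouble at the band edges."""
    if not 1 <= rank <= num_students:
        raise ValueError("rank must be between 1 and num_students")
    for grade, band in ((5, 1), (4, 2), (3, 3), (2, 4)):
        if 5 * rank <= band * num_students:  # rank is within the top band * 20%
            return grade
    return 1
```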
📗 [1 point] Enter the total numbers of positive and negative instances in the training set (two integers, comma-separated, in the order: benign, malignant).
📗 [1 point] For the decision stump, enter the numbers of positive and negative instances in the training set above and below the threshold (four integers, comma-separated, in the order: below-benign, above-benign, below-malignant, above-malignant).
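Both counting questions reduce to tallies over the training rows. The sketch assumes the same (10-number vector, label) rows, that label 2 is benign and 4 is malignant, and that "below" means x[j] <= t while "above" means x[j] > t; the convention behind the page's stump threshold may differ, so check it before relying on these counts.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[List[int], int]  # (10-number vector, label: 2 = benign, 4 = malignant)

def class_counts(rows: Sequence[Row]) -> Tuple[int, int]:
    """Answer order for the first question: benign, malignant."""
    counts = Counter(y for _, y in rows)
    return counts[2], counts[4]

def stump_counts(rows: Sequence[Row], j: int, t: int) -> Tuple[int, int, int, int]:
    """Answer order: below-benign, above-benign, below-malignant, above-malignant.
    Assumption: "below" means x[j] <= t and "above" means x[j] > t."""
    c = Counter((x[j] <= t, y) for x, y in rows)
    return c[(True, 2)], c[(False, 2)], c[(True, 4)], c[(False, 4)]

def as_answer(counts: Sequence[int]) -> str:
    """Format counts as the comma-separated integers the questions ask for."""
    return ",".join(str(c) for c in counts)
```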
📗 [5 points] Input the binary decision tree in the format described previously.
Now you can use your tree to classify the following patient:
Feature vector (10 numbers, comma-separated):
OR (make sure you leave the above text field blank):
1. Sample code number:
2. Clump Thickness:
3. Uniformity of Cell Size:
4. Uniformity of Cell Shape:
5. Marginal Adhesion:
6. Single Epithelial Cell Size:
7. Bare Nuclei:
8. Bland Chromatin:
9. Normal Nucleoli:
10. Mitoses:
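Offline, the same classification can be reproduced by splitting the 10 comma-separated numbers in the order listed above and walking the tree. The sketch assumes a hypothetical dictionary tree in which internal nodes send x[feature] <= threshold to the left child, with feature indices counted from 0 so that index 0 is the sample code number; it is not the page's own parser.

```python
from typing import Dict, List

# Order of the 10 numbers as listed on the page; index 0 is an ID, not a test result.
FIELDS = ["Sample code number", "Clump Thickness", "Uniformity of Cell Size",
          "Uniformity of Cell Shape", "Marginal Adhesion", "Single Epithelial Cell Size",
          "Bare Nuclei", "Bland Chromatin", "Normal Nucleoli", "Mitoses"]

def parse_vector(text: str) -> List[int]:
    """Turn the comma-separated input into a list of 10 integers."""
    values = [int(v) for v in text.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(values)}")
    return values

def classify(tree: Dict, x: List[int]) -> int:
    """Walk a hypothetical dict tree: x[feature] <= threshold goes left, otherwise right."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

For instance, with the made-up tree `{"feature": 2, "threshold": 3, "left": {"leaf": 2}, "right": {"leaf": 4}}` (a single split on Uniformity of Cell Size), the input `123, 5, 1, 1, 1, 2, 1, 3, 1, 1` is classified as 2.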
📗 [5 points] Input the pruned binary decision tree in the format described previously.
Now you can use your pruned tree to classify the following patient:
Feature vector (10 numbers, comma-separated):
OR (make sure you leave the above text field blank):
1. Sample code number:
2. Clump Thickness:
3. Uniformity of Cell Size:
4. Uniformity of Cell Shape:
5. Marginal Adhesion:
6. Single Epithelial Cell Size:
7. Bare Nuclei:
8. Bland Chromatin:
9. Normal Nucleoli:
10. Mitoses:
📗 [1 point] Please list the AI tools and references you used, and the names of other students and course staff you discussed the assignment or competition with. Please also enter any comments and suggestions, including possible mistakes and bugs in the questions and the auto-grading. If you completed the assignment without any help (not recommended), please enter "None"; do not leave this question blank.
📗 Please do not modify the content in the above text field: use the "Grade" button to update.
📗 You can submit multiple times (but please do not submit too often): only the latest submission will be counted.
📗 Please also save the text in the above text box to a file using the button, or copy and paste it into a file yourself.
📗 You can load your answers from text (or a txt file) in the text box below using the button. The first two lines should be "##a: 3" and "##id: your id", and each remaining line should have the format "##1: your answer to question 1", "##2: your answer to question 2", etc. Please make sure that your answers are loaded correctly before submitting them.
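If you print your outputs from a program, a small helper can assemble the loadable text in the layout described above. This sketch assumes each answer fits on a single line (it rejects embedded newlines); `build_answer_file` is a hypothetical helper name.

```python
from typing import Dict

def build_answer_file(student_id: str, answers: Dict[int, str]) -> str:
    """Header lines first, then one "##k: answer" line per question in numeric order."""
    lines = ["##a: 3", f"##id: {student_id}"]
    for k in sorted(answers):
        if "\n" in answers[k]:
            raise ValueError(f"answer {k} must fit on one line")
        lines.append(f"##{k}: {answers[k]}")
    return "\n".join(lines) + "\n"
```

Write the returned string to a .txt file and load it with the button, then check each text box before submitting.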
📗 Saving and loading may take around 5 to 10 seconds. Please be patient and do not click "Load" multiple times.
📗 Presentations and interviews are optional for the competitions.
📗 If your competition grade is 2, 3, or 4, you can book an interview with the TA for 15 to 30 minutes.
📗 Interviews can only be booked during discussion sessions on Zoom (either during the current discussion session or for a future date and time): Link. Please do not email/spam the TA.
📗 A maximum of 3 interviews can be booked per person; if you need 1 point to reach the next letter grade, we will allow a 4th one after the final exam.
📗 During the interview, you will give a 5- to 10-minute presentation explaining anything you did on the project that is creative or technically challenging. Then you will answer three technical questions about your presentation or any material related to the assignment.
➩ If you answer any one of the three questions incorrectly, you will get \(-1\).
➩ If you answer all questions correctly, and if your presentation ideas are correct, interesting, consistent with your submissions, and not done by many other students (we will make the decision after all interviews are done), you will get \(+1\).