Machine Teaching
If machine learning is to discover knowledge, then machine teaching is to pass it on.
Machine teaching is an inverse problem to machine learning. Given a learning algorithm and a target model, machine teaching finds an optimal (e.g. the smallest) training set.
For example, consider a "student" who runs
the Support Vector Machine learning algorithm. Imagine a teacher who wants to teach the student a specific target hyperplane in some feature space (never mind how the
teacher got this hyperplane in the first place). The teacher constructs a training set D = {(x1, y1), ..., (xn, yn)}, where xi is a feature vector and yi a class label, to
train the student. What is the smallest training set that will make the student learn the target hyperplane? It is not hard to see that n=2 is sufficient with the two
training items straddling the target hyperplane. Machine teaching mathematically formalizes this idea and generalizes it to many kinds of learning algorithms and
teaching targets. Solving the machine teaching problem in general can be intricate and is an open mathematical question, though for a large family of learners the
resulting bilevel optimization problem can be approximated.
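The two-item teaching set for the SVM example above can be sketched in a few lines. This is a minimal illustration, not the construction from any of the papers listed below. It assumes a hard-margin linear SVM, for which the solution on one positive and one negative point has a closed form (the perpendicular bisector of the two points). It also treats "learning the target hyperplane" as matching the decision boundary up to a positive rescaling of (w, b). The function names teaching_set, hard_margin_svm_two_points, and same_hyperplane are hypothetical.

```python
import math


def teaching_set(w, b, margin=1.0):
    """Return two labeled points straddling the hyperplane w.x + b = 0."""
    norm_sq = sum(wi * wi for wi in w)
    norm = math.sqrt(norm_sq)
    # p is the point on the target hyperplane closest to the origin.
    p = [-b * wi / norm_sq for wi in w]
    # u is the unit normal of the target hyperplane.
    u = [wi / norm for wi in w]
    x_pos = [pi + margin * ui for pi, ui in zip(p, u)]
    x_neg = [pi - margin * ui for pi, ui in zip(p, u)]
    return [(x_pos, +1), (x_neg, -1)]


def hard_margin_svm_two_points(data):
    """Closed-form hard-margin SVM for one positive and one negative point.

    Assumes data is [(x_pos, +1), (x_neg, -1)] in that order.
    """
    (x_pos, _), (x_neg, _) = data
    diff = [a - c for a, c in zip(x_pos, x_neg)]
    dist_sq = sum(d * d for d in diff)
    # Scale w so that both points sit exactly on the margins +1 and -1.
    w_hat = [2.0 * d / dist_sq for d in diff]
    mid = [(a + c) / 2.0 for a, c in zip(x_pos, x_neg)]
    # The decision boundary passes through the midpoint of the two points.
    b_hat = -sum(wi * mi for wi, mi in zip(w_hat, mid))
    return w_hat, b_hat


def same_hyperplane(w1, b1, w2, b2, tol=1e-9):
    """True if (w1, b1) and (w2, b2) differ only by a positive scale factor."""
    n1 = math.sqrt(sum(v * v for v in w1))
    n2 = math.sqrt(sum(v * v for v in w2))
    same_normal = all(abs(a / n1 - c / n2) < tol for a, c in zip(w1, w2))
    return same_normal and abs(b1 / n1 - b2 / n2) < tol


if __name__ == "__main__":
    w_target, b_target = [3.0, 4.0], -5.0
    D = teaching_set(w_target, b_target)
    w_hat, b_hat = hard_margin_svm_two_points(D)
    print(D)
    print(same_hyperplane(w_target, b_target, w_hat, b_hat))
```

The teacher places the two items symmetrically about the target hyperplane along its normal, so the student's maximum-margin boundary coincides with the target. Soft-margin or otherwise regularized learners may need a different construction.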
Machine teaching can have impacts in education, where the "student" is really a human student, and the teacher certainly has a target model (i.e. the educational
goal). If we are willing to assume a cognitive learning model of the student, we can use machine teaching to reverse-engineer the optimal training data, which will
be the optimal, personalized lesson for that student. We have shown feasibility in a preliminary cognitive study to teach categorization.
Another application is in computer security where the "teacher" is an attacker and the learner is any intelligent system that adapts to inputs.
This page contains our research on the theory, algorithms, and applications of machine teaching.
Publications
Tutorials

Xiaojin Zhu, Adish Singla, Sandra Zilles, Anna N. Rafferty.
An Overview of Machine Teaching.
ArXiv 1801.05927, 2018.

Xiaojin Zhu.
Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education.
In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI "Blue Sky" Senior Member Presentation Track), 2015.
AAAI / Computing Community Consortium "Blue Sky Ideas" Track Prize.
An overview of machine teaching.
[pdf  talk slides]
Theory of Machine Teaching

Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, and Xiaojin Zhu.
Teacher improves learning by selecting a training subset.
In The 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.
[pdf  AMPL code]

Xiaojin Zhu, Ji Liu, and Manuel Lopes.
No learner left behind: On the complexity of teaching multiple learners simultaneously.
In The 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017.
Minimax teaching dimension to make the worst learner in a class learn.
Partitioning the class into sections improves teaching dimension.
[pdf]

Ji Liu and Xiaojin Zhu.
The teaching dimension of linear learners.
Journal of Machine Learning Research, 17(162):1-25, 2016.
This is the journal version of the ICML'16 paper, with a discussion on teacher-learner collusion.
[link]

Ji Liu, Xiaojin Zhu, and H. Gorune Ohannessian.
The Teaching Dimension of Linear Learners.
In The 33rd International Conference on Machine Learning (ICML), 2016.
We provide lower bounds on the training set size needed to perfectly teach a linear learner.
We also provide the corresponding upper bounds (and thus teaching dimension) by exhibiting teaching sets for SVM, logistic regression, and ridge regression.
[pdf  supplementary  arXiv preprint]

Xuezhou Zhang, Hrag Gorune Ohannessian, Ayon Sen, Scott Alfeld and Xiaojin Zhu.
Optimal Teaching for Online Perceptrons.
In NIPS 2016 workshop on Constructive Machine Learning, 2016.
[pdf]

Xiaojin Zhu.
Machine teaching for Bayesian learners in the exponential family.
In Advances in Neural Information Processing Systems (NIPS), 2013.
We study machine teaching, or optimal teaching, the inverse problem of machine learning.
[pdf  poster]
Applications in Security, Trustworthy, and Interpretable AI

Xuezhou Zhang, Xiaojin Zhu, and Stephen Wright.
Training set debugging using trusted items.
In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018.
[pdf]

Scott Alfeld, Xiaojin Zhu, and Paul Barford.
Explicit defense actions against test-set attacks.
In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017.
[pdf]

Scott Alfeld, Xiaojin Zhu, and Paul Barford.
Data Poisoning Attacks against Autoregressive Models.
In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.
[pdf]

Gabriel Cadamuro, Ran Gilad-Bachrach, and Xiaojin Zhu.
Debugging machine learning models.
In ICML Workshop on Reliable Machine Learning in the Wild, 2016.
Training data repair to ensure certain test items are correctly predicted.
An application of machine teaching.
[pdf  extended abstract for CHI 2016 workshop on human centred machine learning]

Shike Mei and Xiaojin Zhu.
The security of latent Dirichlet allocation.
In The Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
How might an attacker poison the corpus to manipulate LDA topics? We answer this question via machine teaching.
[pdf]

Shike Mei and Xiaojin Zhu.
Using Machine Teaching to Identify Optimal Training-Set Attacks on Machine Learners.
In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), 2015.
An application of machine teaching to identify the optimal training-set attacks against a learning algorithm.
[pdf  poster ad  poster  Mendota ice data  Tech Report 1813]

Shike Mei and Xiaojin Zhu.
Some Submodular Data-Poisoning Attacks on Machine Learners.
Computer Science Tech Report 1822, University of Wisconsin-Madison, 2015.
[pdf]
Applications in Human Computer Interaction

Jina Suh, Xiaojin Zhu, and Saleema Amershi.
The label complexity of mixed-initiative classifier training.
In The 33rd International Conference on Machine Learning (ICML), 2016.
Do you do interactive machine learning with a human oracle? Then don't use active learning alone: mixing it with machine teaching is far better both in theory and in practice.
[pdf  supplementary]

Christopher Meek, Patrice Y. Simard, and Xiaojin Zhu.
Analysis of a design pattern for teaching with features and labels, 2016.
arXiv
Applications in Cognitive Psychology and Education

Robert M. Nosofsky, Craig A. Sanders, Xiaojin Zhu, and Mark A. McDaniel.
Model-guided search for optimal natural-science-category training exemplars: A work in progress.
Psychonomic Bulletin & Review, 2018.

Ayon Sen, Purav Patel, Martina A. Rau, Blake Mason, Robert Nowak, Timothy T. Rogers, and Xiaojin Zhu.

For teaching perceptual fluency, machines beat human experts.
In The 40th Annual Conference of the Cognitive Science Society (CogSci), 2018.
[pdf]

Machine beats human at sequencing visuals for perceptual-fluency practice.
In Educational Data Mining, 2018.
[pdf]

Kaustubh Patil, Xiaojin Zhu, Lukasz Kopec, and Bradley Love.
Optimal Teaching for Limited-Capacity Human Learners.
In Advances in Neural Information Processing Systems (NIPS), 2014.
Using machine teaching, we construct an optimal training data set to teach human students a categorization task, assuming the students use GCM as the learning algorithm. Our optimal training data set is non-iid, has interesting "idealization" properties, and outperforms iid training data sets sampled from the underlying test distribution.
[pdf  poster  spotlight  data]

Faisal Khan, Xiaojin Zhu, and Bilge Mutlu.
How do humans teach: On curriculum learning and teaching dimension.
In Advances in Neural Information Processing Systems (NIPS) 25. 2011.
What is the optimal teaching strategy for a threshold function in 1D? Should one start teaching with items around the threshold? Or with the most unambiguous items farthest from the threshold? Two computational theories, teaching dimension and curriculum learning, disagree.
We show that humans do the latter.
We then extend teaching dimension theory to explain it.
[pdf  data  slides  UCSD teaching workshop talk]
Workshops
Talks
 Debugging the Machine Learning Pipeline at the Interpretable Machine Learning Symposium, NIPS 2017
 Introduction to Machine Teaching at the Workshop on Teaching Machines, Robots, and Humans, NIPS 2017
 Dagstuhl Seminar on Machine Learning and Formal Methods. August 2017, Germany. A Challenge in Machine Teaching.
 Talk at Simons Institute Workshop on Interactive Learning 2017, Berkeley, CA. Machine Teaching in Interactive Learning
 Talk at ICML 2016 Workshop on Reliable Machine Learning in the Wild, New York: Machine Teaching and Security
 Talk at NIPS 2015 Workshop, Montreal, Canada: Machine Teaching for Personalized Education, Security, Interactive Machine Learning
 Talk at ICML 2015 Workshop on Machine Learning for Education, Lille, France: Machine Teaching.
Code and Data
 Pool-based machine teaching search algorithms through a file-based API: Version 0.1a0.
Poolmate provides a command-line interface to algorithms for
searching for teaching sets among a candidate pool. Poolmate is
designed to work with any learner that can be communicated with
through a file-based API.

Training set debugging using trusted items (AAAI 2018)
DUTI code
In the media

Squashing the bugs in machine learning: Researchers make computer-trained models more trustworthy
Computer Sciences Department News, UW-Madison.
By Jennifer Smith. March 8, 2018

"Machine Teaching" Is Seen as Way to Develop Personalized Curricula.
The Chronicle of Higher Education.
By Mary Ellen McIntire, August 12, 2015

In the Mind of a Student.
Inside Higher Ed.
By Jacqueline Thomsen, September 25, 2015

Machine teaching holds the power to illuminate human learning.
University of Wisconsin-Madison News.
By Jennifer A. Smith, August 10, 2015

Machine Learning? No, Machine Teaching.
Science 2.0.
August 13, 2015

The Flipside of Machine Learning.
The R&D Magazine.
By Greg Watry, August 13, 2015