Non-IID Reading Group (NIRG)
Fall 2010
Day/time: TUE 4:00pm - 5:00pm, every week
Location: CS 4310
Mailing list: sign up
here
Organizer: Kwang-Sung Jun
Description:
In this reading group, we explore the problem of learning from data that are not
independently and identically distrbuted (IID). IIDness is a common assumption made in
statistical machine learning. Even though this assumption helps to study the properties of
learning procedures (e.g. generalization ability), and also guides the building of new
algorithms, there are many real world situations where it does not hold. We will discuss
the following throughout our meetings.
- Theoretical: results on generalization bounds and learnability, contributions that
mathematically formalize the types of non-IIDness encountered, results on the extent to
which non-IIDness does not harm the validity of theoretical results build on the IID
assumption, helpfulness of the online learning framework.
- Algorithmic: theoretically motivated algorithms designed to handle non-IID data,
approaches that make it possible for classical learning results to carry over, online
learning procedures.
- Practical: successful applications of non-IID learning methods to learning from
streaming data, web data, biological data, multimedia, natural language, social network
mining.
However, we will mainly focus on theoretical and algorithmic part of two topics, online
learning and active learning, because of increasing interest among machine learning and
statistics community and its importance over many problems. In online learning, we have a
stream of incoming examples, and the distribution of them may change adversarially over
time: the examples are not identically distributed. Similarly, in active learning, labels
for specific data are requested by the learner: the independence assumption is also
violated. We'd like to tackle theoretical backgrounds over these topics to gain intuitions
behind algorithms. Participants will be asked to read papers in advance and take turns
leading the discussion. It's not required, and no pressure to lead - you can just come sit
and discuss.
Prerequisites
Fundamental concept of probability and statistics and knowledge on
general machine learning framework. If you are familiar with SVM and duality, or
Statistical learning theory, then it might be easier but these are not required.
Schedule
An up-to-date schedule will be maintained as a Google
Calendar (see below), but the order of topics will follow roughly what is mapped out
below. Schedule is adjustable.
Online Learning
- Week 1 (09/27): Introduction of reading group. Introductory example of online learning.
Section 1 is covered today from here.
- Week 2 (10/04): Regret analysis, weighted average scheme. Potential. (Sec 2.1 in CBL06) Simulation code.
- Week 3 (10/11): Classification using randomization. (Read Ch 4.1 , 4.2 from CBL06)
Bregman divergence: introduction and properties (part 2 of SS07).
- Week 4 (10/18): Continue on Bregman Divergence. Bregman divergence: nonlinear case,
matching loss, duality (Read part 2 of SS07, video lecture, optional: read
part 1 of SS07)
- Week 5 (11/01): Continue on Bregman Divergence. Duality, Connection to exponential
family of distribution. (Read part 2 of SS07, video lecture, optional: read
part 1 of SS07)
- Week 12: Online SVM with dual, and its bound. General statements: Fenchel Duality,
conditions for low regret (Sufficient Dual Ascent, Strong Convexity, L-Lipschitz),
Directions of generalize
- Week 13: (Possibly postpone to after active learning if runs out of time)
Aggressive Dual Ascend Schemes, Passive-Aggressive, Entropic Regularization, Further
direction.
Active learning
- Week 6 (11/11):CAL Algorithm
(named after authors), Proof, Handout: Often credited
as being the first serious active learning algorithm and introduces the experimental
setup used by A^2, DHM, GBS, and others. The proof is a very gentle introduction into
proving bounds for active learning algorithms using the disagreement coefficient.
- "Two faces of active learning"
: A nice overview of active learning algorithms and presents the potential pitfalls
associated with active learning. Sets the stage nicely for analyzing different
algorithms.
- Week 7 (11/23):
DHM Algorithm (named after authors) Handout :Introduces a novel way to maintain hypothesis
space without ever having to actually keep track of it.
- Week 8 (12/10, Last Meeting): Noisy General
Binary Search (GBS) :Assumes same experimental setup as others but has different
assumptions on the classifiers and uses totally different techniques to prove bounds.
- A^2
Algorithm : A classic active learning algorithm that uses empirical risk PAC
bounds. Worth mentioning for historical context.
Papers / Tutorials
- CBL06: Prediction, Learning, and Games (ISBN: 0521841089 / 0-521-84108-9)
Cesa-Bianchi, Nicolo; Lugosi, Gabor
- Singer08: Theory &
Applications of Online Learning, Shalev-Shwartz and Singer, 08
- Warmuth06: Online
Learning and Bregman Divergences, Warmuth, 06 , Video
- SS07: Online
Learning: Theory, Algorithms, and Applications, Shalev-Shwartz and Singer,
2007,
- Settles08: Active
Learning Literature Survey, Settles, 10
- Langford09: A tutorial on active
learning, Dasgupta and Langford, 09
- CMU Lecture: Learning, Game Theory, and Optimization
References coming soon
Other useful references:
Learning from non-IID data:
Theory, Algorithms and Practice, ECML 2009
Acknowledgements