Non-IID Reading Group (NIRG)
Fall 2010

Day/time: TUE 4:00pm - 5:00pm, every week
Location: CS 4310
Mailing list: sign up here
Organizer: Kwang-Sung Jun


In this reading group, we explore the problem of learning from data that are not independently and identically distributed (IID). The IID assumption is common in statistical machine learning. Although it helps in studying the properties of learning procedures (e.g. generalization ability) and guides the design of new algorithms, it does not hold in many real-world situations. We will discuss the following throughout our meetings.

However, we will mainly focus on the theoretical and algorithmic aspects of two topics, online learning and active learning, because of the growing interest in them within the machine learning and statistics communities and their importance across many problems. In online learning, we have a stream of incoming examples whose distribution may change adversarially over time, so the examples are not identically distributed. Similarly, in active learning, the learner requests labels for specific data points, so the independence assumption is also violated. We would like to work through the theoretical background of these topics to gain intuition for the algorithms. Participants are asked to read papers in advance and are encouraged to take turns leading the discussion. Leading is not required, and there is no pressure to do so: you can just come, sit in, and discuss.


Expected background: fundamental concepts of probability and statistics, and general knowledge of the machine learning framework. Familiarity with SVMs and duality, or with statistical learning theory, may make the material easier, but neither is required.


An up-to-date schedule will be maintained in a Google Calendar (see below), but the order of topics will roughly follow the outline below. The schedule is adjustable.

Online Learning

Active Learning

Papers / Tutorials

References coming soon

Other useful references:

Learning from non-IID data: Theory, Algorithms and Practice, ECML 2009