Literature and Datasets for the Multiple Instance Problem

Literature and Datasets for the Multiple Instance Problem

Papers
Theses
Datasets
Other related work

Please send corrections and suggestions for additional papers and data to sray_AT_cs_dot_wisc.edu or sray_AT_biostat_dot_wisc.edu . The list of papers in particular is out of date and will be updated as I find time or get email about specific papers that should be added.

Papers

Solving the Multiple-Instance Problem with Axis-Parallel.. - Dietterich, Lathrop (1997)

Introduces the MI problem in the context of drug activity prediction
Experiments with a variety of greedy APR-building procedures to solve the problem. Each conformation is described by 160 distance features wrt some predefined origin.
www.ics.uci.edu/~rickl/publications/1997-aij.ps

A Framework for Multiple-Instance Learning - Maron, Lozano-Pérez (1998)

Introduces Diverse Density
Image classification: identify if some person appears in an image. Image=bag, instance=subimage (constant # per image)
Stock prediction: identify stocks that are "fundamentally good". Rising stocks=+ bags, Falling stocks=-bags, each stock described by 17 feats.
ftp.ai.mit.edu/pub/users/oded/papers/NIPS97.ps.Z

Multiple-Instance Learning for Natural Scene Classification - Maron, Ratan (1998)

Classify Images by content, eg sunset, mountain, waterfall
Separate image into subimages, each subimage becomes an instance
Idea: bag (image) is + iff at least one subimage is +
Estimate P(t|Bij+) with exp(-||Bij-t||^2) - norm involves scaling factor for each feature, learned from data
Also experiments with disjunctive concepts
ftp.ai.mit.edu/pub/users/oded/papers/CVPR98.ps.Z

A Note on Learning from Multiple-Instance Examples - Blum, Kalai (1998)

Assuming independence between instances, gives reduction to PAC learning with 1 and 2 sided classification noise
www.cs.cmu.edu/afs/cs.cmu.edu/user/avrim/www/Papers/multi_examples.ps.gz

Approximating Hyper-Rectangles: Learning and Pseudo-random Sets - Auer, Long, al. (1997)

www.iscs.nus.edu.sg/~plong/papers/rects.ps

On Learning from Multi-Instance Examples: Empirical Evaluation of a theoretical approach- Auer (1997)

Statistical method to get expected value of edge lengths for APRs based on observed features in positive and negative bags
Full paper above describes full algorithm to calculate a quantity, V, needed by the alg
www.cis.tu-graz.ac.at/igi/pauer/Online-pubs/icml97.ps.gz

Pharmacophore Discovery using the Inductive Logic Programming System PROGOL- Paul Finn, Muggleton, Page, Srinivasan (1998)

www.cs.wisc.edu/~dpage/mlj_page.ps

Multiple Instance Regression - Ray, Page (2001)

MI learner for regression tasks with linear hypothesis classes
www.cs.wisc.edu/~dpage/mip.reg.ps

1BC: a First-Order Bayesian Classifier - Flach, Lachiche (1999)

Describes application of naive bayes to structured terms
Mentions that structured entities like sets of tuples are similar to the MI representation
Describes probabilities over structured terms by naive decomposition into probabilites over components (as specified by structural predicates and properties)
User defined limit on such recursive decomposition
Further identifies relevant components as elementary features : features with only one property. Probability over unseen individuals (structured terms) can be calculated by collecting stats about individuals with similar elementary features
Tests on 3 ILP datasets
www.cs.bris.ac.uk/~lachiche/publications/ilp99.ps.gz

Image Database Retrieval With Multiple-Instance Learning Techniques - Yang (2000)

www.ai.mit.edu/people/cheng/thesis/image-mi.ps.gz

Solving the Multiple-Instance Problem: A Lazy Learning Approach - Wang, al. (2000)

Use two variants of kNN extending the bag's label to instances (call this "minimal Hausdorff distance")
Using kNN and c-citers of a new point gives good results
alexia.lis.uiuc.edu/~junwang4/Publications/ICML2000_MIP.ps.gz

Decomposing probability distributions on structured individuals - Flach, Lachiche (2000)

Proposes decompositions to calculate the probabilities over structured terms in terms of their components and their "features" (see 1BC paper)
Describes decompositions over tuples, lists, sets, multisets
Proposes two methods to find probabilities for sets of tuples (the MI representation)
No experiments for proposed decompositions
www.cs.bris.ac.uk/Tools/Reports/Ps/flach-lachiche-ilp2000wip.ps.gz

Learning Structurally Indeterminate Clauses - Zucker, Ganascia (1998)

Describes a language bias, "s-structural indeterminate clauses", to restrict indeterminacy while learning definite clauses
Still allows limited indeterminacy
Transforms learning task to propositional, indeterminacy causes the MI problem, uses a propositional learner, CHARADE, to learn DNFs in this representation
Results for Chinese character dataset
www-apa.lip6.fr/~ganascia/Archives_postcript/ILP98.ps

Agnostic Learning of Geometric Patterns - Goldman, Kwek, Scott (1997)

www.cs.wustl.edu/cs/techreports/1998/wucs-98-27.ps.Z

Content-Based Image Retrieval Using Multiple-Instance Learning - Qi Zhang

Experiments with various image transformations and DD and EM-DD
www.cs.wustl.edu/~sg/icml-cbir.ps

EM-DD: An Improved Multiple-Instance Learning Technique - Qi Zhang

Uses EM to find the single most likely positive point from each positive bag, and re-evaluate the DD objective with these points
Avoids a softmax, so simplifies searching the hypothesis space
Reported results on MUSK are incorrect. Corrected results are in Andrews et al's NIPS paper.
www.cs.wustl.edu/~sg/emdd.ps

Four Suggestions and a Rule Concerning the Application of ILP - Ashwin Srinivasan

ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/ilpkdd.ps.gz

Physically grounding the lexical semantics of words in a.. - Bredeche, Chevaleyre (2002)

Robot anchoring lexicon to sensory data: Trained with tagged images partitioned into subimages, anchoring done if some part of image has selected object (responsible for label)
www-poleia.lip6.fr/~bredeche/arob2002n.ps

A Framework for Learning Rules from Multiple Instance Data - Chevaleyre, Zucker (2001)

www-poleia.lip6.fr/~chevaley/pub/ecml01yc.pdf

Noise-Tolerant Rule induction from Multi-Instance data - Yann Chevaleyre, Zucker

Introduces a MI "noise model", instances can be randomly redrawn from the space, so bags may not correspond to labels
Derives formula relating probability of rule covering true instance given that it covers observed instance [rule is a disjunction H(B)=OR (h(b_i))]
Gets some improvement on musk data over non-noise-handling own ILP-based MI learner, REPART.
www-poleia.lip6.fr/~chevaley/pub/icml00workshp.pdf

Automatic Discovery of Subgoals in Reinforcement Learning.. - McGovern, Barto (2001)

Subgoal by searching for bottleneck: region in observation space visited frequently on successful paths but not on unsuccessful paths
Positive bag: successful trajectory, instances are agent observations. Negative bags= unsuccessful trajectories.
Use DD to learn a subgoal (state) every episode, create an option when running avg of DD for any state exceeds a threshold
www.lunabots.com/pubs/mcgovern_barto_icml2001.ps.gz

Multiple-Instance Learning of Real-Valued Data - Amar, Dooly, Goldman, Zhang (2001)

Learns to predict real values from MI data
uses citation-kNN, modified with (learned) scale factors for each feature and minimal Hausdorff distance
uses DD-like algorithm, where Pr(t|B_i) is estimated by a variant on a normalized exponential regression function
Effect of scale factors unclear, indicates that true scale factors may hinder the search and may not have semantic importance (performance can be comparable with different sets of features with high scale factors)
www.cs.wustl.edu/~sg/realmulti.ps

Solving multiple-instance and multiple-part learning problems with .. - Yann

Introduces a "multiple part" problem, similar to the MI problem but each instance a part of the object being described.
Discusses modifications to entropy, coverage functions to account for multiple instances per bag
www-poleia.lip6.fr/~zucker/Papers/RapportYAN.pdf

An Empirical Approach To Real-Valued Multiple-Instance.. - Amar (2000)

www.cs.wustl.edu/cs/techreports/2000/wucs-00-23.pdf.Z

Multiple-Instance Learning of Real-Valued Geometric Patterns - Goldman, Scott (2000)

www.cse.unl.edu/~sscott/research/papers/UNL-CSE-99-006.ps.gz

A Unifying View of Knowledge Representation for.. - Bowers, Giraud-Carrier, .. (2000)

cslab.anu.edu.au/~jwl/kr.ps.gz

A Case Study in Machine Learning for Combinatorial Chemistry - Page, Curtis, Graham..

www.cs.wisc.edu/~dpage/icis2.ps

Geometric Patterns: Algorithms and Applications - Scott (2000)

www.cse.unl.edu/~sscott/research/papers/patts-overview-workshop.ps.gz

Mining the Web for Object Recognition - Charles Rosenberg

www.cs.cmu.edu/~chuck/workpg/wd-19991116.ps.gz

Event Prediction: Learning from Ambiguous Examples - Gary Weiss

www.cs.rutgers.edu/~gweiss/papers/nips98.ps

An Inductive Logic Programming Framework to Learn a Concept .. - Bouthinon, Soldano (1998)

www-lipn.univ-paris13.fr/~soldano/ecml98.ps

Multi Instance Neural Networks - Ramon, De Raedt (2000)

Simple representation of instance function by an ANN, bag function defined as max over instance functions, each with identical ANN
Fair results on Musk
www.cs.kuleuven.ac.be/~janr/minn.ps

Attribute Value Learning vs ILP: the missing links (De Raedt, 2000)

Proposes several representations of intermediate complexity between AV and FO: MI, Multi-tuple, multi-join
Provides reductions between these, but argues (no proof) that these reductions are infeasible in practice, so should solve these problems as is
Argues that the MI representation is the fundamental transition between AV to FO, because reduction seems to be hardest for this.

Learning to Rank Structured Alternatives: An.. - Costa, Frasconi..

www.di.unito.it/~vincenzo/drafts/icmlWS.pdf

Top

Theses

Learning from Ambiguity - Maron (1998)

ftp.ai.mit.edu/pub/users/oded/papers/maron-thesis.ps.Z

Exploring Applications of Learning Theory to Pattern Matching and.. - Scott (1998)

www.cse.unl.edu/~sscott/research/abstracts/papers/diss.ps.gz

Learning single and multiple instance decision trees for computer security applications. Ruffo, G (2000)

www.di.unito.it/~ruffo/Papers/tesi.pdf

Top

Other Related Work (from Oded Maron's old MI Page)

Phoneme Recognition Using Time-Delay Neural Networks, by Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang, in IEEE Transactions on Acoustics, Speech, and Signal-Processing, 37:3, 1989.

TDNN receives examples (in this paper, phonemes) where the start and duration of the crucial section of the example are not specified. The TDNN architecture is specifically designed to deal with this ambiguity and learn a classifier which is time-invariant.

Integrated Segmentation and Recognition of Hand-Printed Numerals, by James D. Keeler, David E. Rumelhart, and Wee-Kheng Leow, in Neural Information Processing Systems 3, 1991.

The ISR architecture is similar to TDNN in that the network is designed to be invariant to some aspect of the examples. Specifically, the network is given unsegmented images of handwritten characters. The examples are ambiguous because the characters can appear in different places in the image and even slightly overlap each other.

Learning from Data with Bounded Inconsistency, by Haym Hirsh, in International Conference on Machine Learning (ML-90), 1990.

Bounded inconsistency is a model where an explicit bound is given for the amount of noise in the features of the examples. Each training example is then represented by a set of concepts that could cover it or any of its de-noised neighbors. Hirsh uses incremental version-space merging to find a concept that covers all the examples.

Learning to Recognize Promoter Sequences in E. Coli by Modeling Uncertainty in the Training Data, by Steve W. Norton, in Proceedings, Twelfth National Conference on Artificial Intelligence (AAAI-94), 1994.

Here the ambiguity is caused by an uncertain representation. Norton uses background knowledge to give probabilities to the various representations and pick the most likely one.

Compass: A shape-based machine learning tool for drug design, by Jain, A. N., Dietterich, T. G., Lathrop, R. H., Chapman, D., Critchlow, R. E., Bauer, B. E., Webster, T. A., Lozano-Perez, T. in Computer-Aided Molecular Design, 8 (6) 635--652, 1994.
A comparison of dynamic reposing and tangent distance for drug activity prediction, by Dietterich, T. G., Jain, A., Lathrop, R., Lozano-Perez, T. in Advances in Neural Information Processing Systems, 6. San Mateo, CA: Morgan Kaufmann. 216-223, 1994. Postscript preprint.

These papers discuss Dynamic Reposing, a technique for handling ambiguity in the drug design and discovery domains. An example here is ambiguous because it is not known which conformation is the correct one and it is also not known which pose to pick.

Top

MI Classification Datasets

MUSK

Available at UC Irvine repository
Task: predict if molecule is musk-like or not
Bag: molecule, positive bag: musky molecule
Instance: conformation of molecule
This problem is easier than "real" drug activity prediction because the orientation problem has been solved.

Image data used by Maron, Goldman, Andrews and others

One set available from Sally Goldman
One set available from Stuart Andrews
Task: classify images according to whether they contain object of interest
Bag: whole image positive bag: image with object/scene
Instance: image segment

Reinforcement Learning agent simulator

Available from Amy McGovern
Task: classify a state as a subgoal or not
Bag: path (state sequence), positive bag: path that reaches goal
Instance: state on path

Chinese characters

MI formulation arises via transformation of ILP Background knowledge and "structural predicates")

Text Categorization

One set available from Stuart Andrews
BioCreative MI data
Task: categorize a document
Bag: whole document
Instance: part of document, e.g. a paragraph in a full-text article

Protein classification

Available from Qingping Tao
Task: classify a protein as belonging to TrX family or not
Bag: Protein sequence
Instance: (Features created from) overlapping windows of subsequences

MI Regression Datasets

Synthetic data from Sally Goldman
Thermolysin inhibitors (coming soon)
Thrombin inhibitors (coming soon)

Top

Prepared by Soumya Ray. Last Update 9/26/2005.