Literature and Datasets for the
Multiple Instance Problem
Papers
Theses
Datasets
Other related work
Please send corrections and suggestions for additional papers and data
to
sray_AT_cs_dot_wisc.edu or sray_AT_biostat_dot_wisc.edu . The list of
papers in particular is out of date and will be updated as I find time
or get email about specific papers that should be added.
Papers
- Solving
the Multiple-Instance Problem with Axis-Parallel.. - Dietterich,
Lathrop (1997)
- Introduces the MI problem in the context of drug activity
prediction
- Experiments with a variety of greedy APR-building procedures
to
solve the problem. Each conformation is described by 160 distance
features
wrt some predefined origin.
- www.ics.uci.edu/~rickl/publications/1997-aij.ps
- A
Framework for Multiple-Instance Learning - Maron, Lozano-Pérez
(1998)
- Introduces Diverse Density
- Image classification: identify if some person appears in an
image. Image=bag, instance=subimage (constant # per image)
- Stock prediction: identify stocks that are "fundamentally
good". Rising stocks=+ bags, Falling stocks=-bags, each stock
described by 17 feats.
- ftp.ai.mit.edu/pub/users/oded/papers/NIPS97.ps.Z
- Multiple-Instance
Learning for Natural Scene Classification - Maron, Ratan (1998)
- Classify Images by content, eg sunset, mountain, waterfall
- Separate image into subimages, each subimage becomes an
instance
- Idea: bag (image) is + iff at least one subimage is +
- Estimate P(t|Bij+) with exp(-||Bij-t||^2) - norm involves
scaling factor for each feature, learned from data
- Also experiments with disjunctive concepts
- ftp.ai.mit.edu/pub/users/oded/papers/CVPR98.ps.Z
- A Note on
Learning from Multiple-Instance Examples - Blum, Kalai (1998)
- Assuming independence between instances, gives reduction to
PAC
learning with 1 and 2 sided classification noise
- www.cs.cmu.edu/afs/cs.cmu.edu/user/avrim/www/Papers/multi_examples.ps.gz
- Approximating
Hyper-Rectangles: Learning and Pseudo-random Sets - Auer, Long, al.
(1997)
- www.iscs.nus.edu.sg/~plong/papers/rects.ps
- On
Learning from Multi-Instance Examples: Empirical Evaluation of a
theoretical approach- Auer (1997)
- Statistical method to get expected value of edge lengths for
APRs based on observed features in positive and negative bags
- Full paper above describes full algorithm to calculate a
quantity, V, needed by the alg
- www.cis.tu-graz.ac.at/igi/pauer/Online-pubs/icml97.ps.gz
- Pharmacophore
Discovery using the Inductive Logic Programming System PROGOL- Paul
Finn, Muggleton, Page, Srinivasan (1998)
- www.cs.wisc.edu/~dpage/mlj_page.ps
- Multiple
Instance Regression - Ray, Page (2001)
- MI learner for regression tasks with linear hypothesis
classes
- www.cs.wisc.edu/~dpage/mip.reg.ps
- 1BC: a
First-Order Bayesian Classifier - Flach, Lachiche (1999)
- Describes application of naive bayes to structured terms
- Mentions that structured entities like sets of tuples are
similar to the MI representation
- Describes probabilities over structured terms by naive
decomposition into probabilites over components (as specified by structural
predicates and properties)
- User defined limit on such recursive decomposition
- Further identifies relevant components as elementary
features : features with only one property. Probability over
unseen individuals (structured terms) can be calculated by collecting
stats about individuals with similar elementary features
- Tests on 3 ILP datasets
- www.cs.bris.ac.uk/~lachiche/publications/ilp99.ps.gz
- Image
Database Retrieval With Multiple-Instance Learning Techniques - Yang
(2000)
- www.ai.mit.edu/people/cheng/thesis/image-mi.ps.gz
- Solving
the Multiple-Instance Problem: A Lazy Learning Approach - Wang, al.
(2000)
- Use two variants of kNN extending the bag's label to
instances (call this "minimal Hausdorff distance")
- Using kNN and c-citers of a new point gives good results
- alexia.lis.uiuc.edu/~junwang4/Publications/ICML2000_MIP.ps.gz
- Decomposing
probability distributions on structured individuals - Flach, Lachiche
(2000)
- Proposes decompositions to calculate the probabilities
over structured terms in terms of their components and their "features"
(see 1BC paper)
- Describes decompositions over tuples, lists, sets, multisets
- Proposes two methods to find probabilities for sets of tuples
(the MI representation)
- No experiments for proposed decompositions
- www.cs.bris.ac.uk/Tools/Reports/Ps/flach-lachiche-ilp2000wip.ps.gz
- Learning
Structurally Indeterminate Clauses - Zucker, Ganascia (1998)
- Describes a language bias, "s-structural indeterminate
clauses", to restrict indeterminacy while learning definite clauses
- Still allows limited indeterminacy
- Transforms learning task to propositional, indeterminacy
causes
the MI problem, uses a propositional learner, CHARADE, to learn DNFs in
this representation
- Results for Chinese character dataset
- www-apa.lip6.fr/~ganascia/Archives_postcript/ILP98.ps
- Agnostic
Learning of Geometric Patterns - Goldman, Kwek, Scott (1997)
- www.cs.wustl.edu/cs/techreports/1998/wucs-98-27.ps.Z
- Content-Based
Image Retrieval Using Multiple-Instance Learning - Qi Zhang
- Experiments with various image transformations and DD
and EM-DD
- www.cs.wustl.edu/~sg/icml-cbir.ps
- EM-DD: An
Improved Multiple-Instance Learning Technique - Qi Zhang
- Uses EM to find the single most likely
positive point from each positive bag, and re-evaluate the DD objective
with these points
- Avoids a softmax, so simplifies searching the hypothesis space
- Reported results on MUSK are incorrect. Corrected results are
in Andrews et al's NIPS paper.
- www.cs.wustl.edu/~sg/emdd.ps
- Four
Suggestions and a Rule Concerning the Application of ILP - Ashwin
Srinivasan
- ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/ilpkdd.ps.gz
- Physically
grounding the lexical semantics of words in a.. - Bredeche, Chevaleyre
(2002)
- Robot anchoring lexicon to sensory data: Trained with tagged
images partitioned into subimages, anchoring done if some part of image
has selected object (responsible for label)
- www-poleia.lip6.fr/~bredeche/arob2002n.ps
- A
Framework for Learning Rules from Multiple Instance Data - Chevaleyre,
Zucker (2001)
- www-poleia.lip6.fr/~chevaley/pub/ecml01yc.pdf
- Noise-Tolerant
Rule induction from Multi-Instance data - Yann Chevaleyre, Zucker
- Introduces a MI "noise model", instances can be randomly
redrawn from the space, so bags may not correspond to labels
- Derives formula relating probability of rule covering true
instance given that it covers observed instance [rule is a disjunction
H(B)=OR (h(b_i))]
- Gets some improvement on musk data over non-noise-handling own
ILP-based MI learner, REPART.
- www-poleia.lip6.fr/~chevaley/pub/icml00workshp.pdf
- Automatic
Discovery of Subgoals in Reinforcement Learning.. - McGovern, Barto
(2001)
- Subgoal by searching for bottleneck: region in
observation space visited frequently on successful paths but not on
unsuccessful paths
- Positive bag: successful trajectory, instances are agent
observations. Negative bags= unsuccessful trajectories.
- Use DD to learn a subgoal (state) every episode, create
an option when running avg of DD for any state exceeds a threshold
- www.lunabots.com/pubs/mcgovern_barto_icml2001.ps.gz
- Multiple-Instance
Learning of Real-Valued Data - Amar, Dooly, Goldman, Zhang (2001)
- Learns to predict real values from MI data
- uses citation-kNN, modified with (learned) scale factors for
each feature and minimal Hausdorff distance
- uses DD-like algorithm, where Pr(t|B_i) is estimated by a
variant on a
normalized exponential regression function
- Effect of scale factors unclear, indicates that true scale
factors may hinder the search and may not have semantic importance
(performance can be comparable with different sets of features with
high scale factors)
- www.cs.wustl.edu/~sg/realmulti.ps
- Solving
multiple-instance and multiple-part learning problems with .. - Yann
- Introduces a "multiple part" problem, similar to the MI
problem
but each instance a part of the object being described.
- Discusses modifications to entropy, coverage functions
to account for multiple instances per bag
- www-poleia.lip6.fr/~zucker/Papers/RapportYAN.pdf
- An
Empirical Approach To Real-Valued Multiple-Instance.. - Amar (2000)
- www.cs.wustl.edu/cs/techreports/2000/wucs-00-23.pdf.Z
- Multiple-Instance
Learning of Real-Valued Geometric Patterns - Goldman, Scott (2000)
- www.cse.unl.edu/~sscott/research/papers/UNL-CSE-99-006.ps.gz
- A
Unifying View of Knowledge Representation for.. - Bowers,
Giraud-Carrier, .. (2000)
- cslab.anu.edu.au/~jwl/kr.ps.gz
- A Case Study
in
Machine Learning for Combinatorial Chemistry - Page, Curtis, Graham..
- www.cs.wisc.edu/~dpage/icis2.ps
- Geometric
Patterns: Algorithms and Applications - Scott (2000)
- www.cse.unl.edu/~sscott/research/papers/patts-overview-workshop.ps.gz
- Mining the
Web
for Object Recognition - Charles Rosenberg
- www.cs.cmu.edu/~chuck/workpg/wd-19991116.ps.gz
- Event
Prediction: Learning from Ambiguous Examples - Gary Weiss
- www.cs.rutgers.edu/~gweiss/papers/nips98.ps
- An
Inductive Logic Programming Framework to Learn a Concept .. -
Bouthinon, Soldano (1998)
- www-lipn.univ-paris13.fr/~soldano/ecml98.ps
- Multi
Instance Neural Networks - Ramon, De Raedt (2000)
- Simple representation of instance function by an ANN, bag
function defined as max over instance functions, each with identical ANN
- Fair results on Musk
- www.cs.kuleuven.ac.be/~janr/minn.ps
- Attribute Value Learning vs ILP: the
missing links (De Raedt, 2000)
- Proposes several representations of
intermediate complexity between AV and FO: MI, Multi-tuple, multi-join
- Provides reductions between these, but
argues (no proof) that these reductions are infeasible in practice, so
should solve these problems as is
- Argues that the MI representation is
the
fundamental transition between AV to FO, because reduction seems to be
hardest for this.
- Learning to
Rank Structured Alternatives: An.. - Costa, Frasconi..
- www.di.unito.it/~vincenzo/drafts/icmlWS.pdf
Top
Theses
- Learning
from Ambiguity - Maron (1998)
- ftp.ai.mit.edu/pub/users/oded/papers/maron-thesis.ps.Z
- Exploring
Applications of Learning Theory to Pattern Matching and.. - Scott (1998)
- www.cse.unl.edu/~sscott/research/abstracts/papers/diss.ps.gz
- Learning single and multiple instance
decision trees for computer security applications. Ruffo, G (2000)
- www.di.unito.it/~ruffo/Papers/tesi.pdf
Top
Other Related Work
(from Oded
Maron's old MI Page)
- Phoneme Recognition Using Time-Delay Neural
Networks, by Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton,
Kiyohiro Shikano, and Kevin J. Lang, in IEEE Transactions on Acoustics,
Speech, and Signal-Processing, 37:3, 1989.
- TDNN receives examples (in this paper, phonemes) where the
start and duration of the crucial section of the example are not
specified. The TDNN architecture is specifically designed to deal with
this ambiguity and learn a classifier which is time-invariant.
- Integrated Segmentation and Recognition of Hand-Printed
Numerals,
by James D. Keeler, David E. Rumelhart, and Wee-Kheng Leow, in Neural
Information Processing Systems 3, 1991.
- The ISR architecture is similar to TDNN in that the network is
designed to be invariant to some aspect of the examples. Specifically,
the network is given unsegmented images of handwritten characters. The
examples are ambiguous because the characters can appear in different
places in the image and even slightly overlap each other.
- Learning from Data with Bounded Inconsistency, by Haym Hirsh, in
International Conference on Machine Learning (ML-90), 1990.
- Bounded inconsistency is a model where an explicit bound is
given for the amount of noise in the features of the examples. Each
training example is then represented by a set of concepts that could
cover it or any of its de-noised neighbors. Hirsh uses incremental
version-space merging to find a concept that covers all the examples.
- Learning to Recognize Promoter Sequences in E. Coli by Modeling
Uncertainty in the Training Data, by Steve W. Norton, in Proceedings,
Twelfth National Conference on Artificial Intelligence (AAAI-94), 1994.
- Here the ambiguity is caused by an uncertain representation.
Norton uses background knowledge to give probabilities to the various
representations and pick the most likely one.
- Compass: A shape-based machine learning tool for drug design, by
Jain, A. N., Dietterich, T. G., Lathrop, R. H., Chapman, D., Critchlow,
R. E., Bauer, B. E., Webster, T. A., Lozano-Perez, T. in Computer-Aided
Molecular Design, 8 (6) 635--652, 1994.
- A comparison of dynamic reposing and tangent distance for
drug activity prediction, by Dietterich, T. G., Jain, A., Lathrop, R.,
Lozano-Perez, T. in Advances in Neural Information Processing Systems,
6. San Mateo, CA: Morgan Kaufmann. 216-223, 1994. Postscript preprint.
- These papers discuss Dynamic Reposing, a technique for
handling ambiguity in the drug design and discovery domains. An example
here is ambiguous because it is not known which conformation is the
correct one and it is also not known which pose to pick.
Top
MI
Classification Datasets
- MUSK
- Available at UC Irvine
repository
- Task: predict if molecule is musk-like or not
- Bag: molecule, positive bag: musky molecule
- Instance: conformation of molecule
- This problem is easier than "real" drug activity prediction
because the orientation problem has been solved.
- Image data used by Maron, Goldman, Andrews and others
- One set available from Sally Goldman
- One set available from Stuart
Andrews
- Task: classify images according to whether they contain object
of interest
- Bag: whole image positive bag: image with object/scene
- Instance: image segment
- Reinforcement Learning agent simulator
- Available from
Amy McGovern
- Task: classify a state as a subgoal or not
- Bag: path (state sequence), positive bag: path that reaches
goal
- Instance: state on path
- Chinese characters
- MI formulation arises via transformation of
ILP Background knowledge and "structural predicates")
- Text Categorization
- One set available from Stuart
Andrews
- BioCreative MI data
- Task: categorize a document
- Bag: whole document
- Instance: part of document, e.g. a paragraph in a full-text
article
- Protein classification
- Available from Qingping
Tao
- Task: classify a protein as belonging to TrX family or not
- Bag: Protein sequence
- Instance: (Features created from) overlapping windows of
subsequences
MI Regression Datasets
- Synthetic data from Sally Goldman
- Thermolysin inhibitors (coming soon)
- Thrombin inhibitors (coming soon)
Top
Prepared by Soumya Ray. Last
Update 9/26/2005.