Software


Original Work

First-Order Logic LDA (Fold-all)

A variant of Latent Dirichlet Allocation which allows the user to specify general domain knowledge in first-order logic, combining ideas from topic modeling and Markov Logic Network (MLN) research.
(pdf, slides, poster, code)

Dirichlet Forest LDA

A variant of Latent Dirichlet Allocation which uses a novel Dirichlet Forest prior on the topic-word multinomials. This prior allows the user to express "must-link" and "cannot-link" constraints between pairs of words in order to guide topic recovery.
(pdf, slides, code)

DeltaLDA

A variant of Latent Dirichlet Allocation which allows the use of "shared" topics common to all documents as well as "exclusive" topics which appear only in special documents.
(pdf, slides, code)

z-label LDA

A variant of Latent Dirichlet Allocation which allows the use of (possibly soft) observations of specific latent topic assignments (z-labels). Set labels are possible as well (e.g., z must be in set C).
(pdf, code)
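
The effect of a z-label on sampling can be sketched as follows. This is a minimal Python illustration, not the linked code: it assumes a collapsed Gibbs sampler in which the unnormalized per-topic weights for one token have already been computed, and the names `apply_z_label`, `weights`, `allowed`, and `strength` are hypothetical. With `strength` equal to 1 the label is hard (topics outside the set C get zero probability); smaller values give a soft label.

```python
def apply_z_label(weights, allowed, strength=1.0):
    """Reweight unnormalized topic weights for one token and normalize.

    weights  -- unnormalized sampling weights, one per topic
    allowed  -- set of topic indices permitted by the z-label (the set C)
    strength -- 1.0 for a hard label, values in [0, 1) for a soft one
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Topics inside the set keep their full weight; topics outside are
    # scaled by (1 - strength), which is zero for a hard label.
    adjusted = [
        w if k in allowed else w * (1.0 - strength)
        for k, w in enumerate(weights)
    ]
    total = sum(adjusted)
    if total == 0.0:
        raise ValueError("z-label leaves no topic with positive weight")
    return [w / total for w in adjusted]


# Hypothetical weights for three topics, with the label C = {0, 1}.
hard = apply_z_label([2.0, 1.0, 1.0], allowed={0, 1})
soft = apply_z_label([2.0, 1.0, 1.0], allowed={0, 1}, strength=0.5)
```

Under the hard label, topic 2 can never be sampled for this token; under the soft label it is merely down-weighted relative to the topics in C.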

Parallel z-label LDA

Same as above, but using the parallel collapsed Gibbs sampler from "Distributed Algorithms for Topic Models" by Newman, D., Asuncion, A., Smyth, P., and Welling, M. (JMLR 2009). The parallelization offers a significant speedup on multi-core machines, and can be used to do standard LDA inference as well.
(code)
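
The count synchronization used by this style of parallel sampler can be sketched as below. This is an illustrative Python fragment, not the linked code: it assumes each worker samples its own documents against a private copy of the global topic-word counts, and that after each sweep the per-worker changes are summed back into the global counts. The function name `merge_counts` and the nested-list layout are assumptions made for the example.

```python
def merge_counts(global_counts, local_counts_per_worker):
    """Fold each worker's count changes back into the global counts.

    global_counts           -- topic-by-word counts before the sweep
    local_counts_per_worker -- one updated copy of those counts per worker
    """
    merged = [row[:] for row in global_counts]  # leave the input untouched
    for local in local_counts_per_worker:
        for k, row in enumerate(local):
            for w, count in enumerate(row):
                # Add only what this worker changed during its sweep.
                merged[k][w] += count - global_counts[k][w]
    return merged


# Hypothetical counts for two topics (rows) and two words (columns).
before = [[3, 1], [0, 2]]
worker_a = [[2, 1], [1, 2]]  # moved one token of word 0 from topic 0 to 1
worker_b = [[3, 0], [0, 3]]  # moved one token of word 1 from topic 0 to 1
after = merge_counts(before, [worker_a, worker_b])
```

Because every worker starts from the same snapshot, the merge preserves the total token count while combining reassignments made independently on different cores.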

Other Code

CVB LDA

An implementation of collapsed variational Bayesian inference for standard LDA, from "A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation" by Teh, Y.W., Newman, D., and Welling, M. (NIPS 2007). I implemented this algorithm in order to gain a better understanding of it.
(code)

Short Clojure programs

Some simple programs I made while learning a bit about Clojure.

  • k-nearest neighbors binary classifier (perhaps the "hello world" of machine learning?)
    (code, example data: train, test)
  • MaxWalkSAT weighted satisfiability solver (Kautz et al., 1997)
    (code, example data: clauses, parameters)
  • Simple lines-of-code calculator
    (code)
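
For readers unfamiliar with the first program in the list, a k-nearest neighbors binary classifier can be sketched in a few lines. The version below is Python rather than Clojure and is not the linked code; the function name `knn_predict`, the Euclidean distance, and the 0/1 label encoding are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a 0/1 label for query by majority vote of its k nearest
    training points under Euclidean distance.

    train -- list of (features, label) pairs, where label is 0 or 1
    query -- feature sequence of the same length as the training features
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    # Sort training points by distance to the query and keep the closest k.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # An odd k avoids ties between the two classes.
    return votes.most_common(1)[0][0]


# Made-up toy data: two points near the origin, two near (1, 1).
train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
near_origin = knn_predict(train, (0.2, 0.1), k=3)
near_ones = knn_predict(train, (1.0, 0.9), k=3)
```

There is no training step: the classifier simply stores the labeled points and does all of its work at prediction time.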