Delta Latent Dirichlet Allocation

[Figures: graphical model; toy example]

Overview

This software implements the DeltaLDA model [1] for discrete count data. DeltaLDA is a modification of the Latent Dirichlet Allocation (LDA) model [2] that uses two different priors on the topic mixing weights to jointly model two corpora with a shared set of topics. Inference is by collapsed Gibbs sampling [3]. The code can also be used for "standard" LDA, as in [3].
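To make the idea concrete, here is a minimal pure-NumPy sketch of collapsed Gibbs sampling in which each document picks one of several rows of alpha. It is an illustration only, not the C implementation shipped in this package: the function name gibbs_lda, the per-document label list f, and the second alpha row are assumptions introduced for this example.

```python
import numpy as np

def gibbs_lda(docs, alpha, beta, f, numsamp, randseed):
    """Collapsed Gibbs sampling with a per-document choice of alpha row.

    docs  : list of lists of word indices
    alpha : (F, T) array, one topic-weight prior per document class
    beta  : (T, W) array, topic-word prior
    f     : hypothetical list giving the alpha row used by each document
    """
    rng = np.random.default_rng(randseed)
    T, W = beta.shape
    n_dt = np.zeros((len(docs), T))  # topic counts per document
    n_tw = np.zeros((T, W))          # word counts per topic

    # Random initial topic assignment for every token
    z = [rng.integers(T, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            n_dt[d, z[d][i]] += 1
            n_tw[z[d][i], w] += 1

    for _ in range(numsamp):
        for d, doc in enumerate(docs):
            a = alpha[f[d]]  # prior row selected by this document's label
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from the counts
                n_dt[d, t] -= 1
                n_tw[t, w] -= 1
                # Conditional over topics given all other assignments,
                # up to a normalizing constant
                p = ((n_dt[d] + a) * (n_tw[:, w] + beta[:, w])
                     / (n_tw.sum(axis=1) + beta.sum(axis=1)))
                t = rng.choice(T, p=p / p.sum())
                # Record the new assignment and restore the counts
                z[d][i] = t
                n_dt[d, t] += 1
                n_tw[t, w] += 1

    # Point estimates from the final sample
    phi = (n_tw + beta) / (n_tw + beta).sum(axis=1, keepdims=True)
    alpha_d = alpha[f]  # (D, T): the prior row used by each document
    theta = (n_dt + alpha_d) / (n_dt + alpha_d).sum(axis=1, keepdims=True)
    return phi, theta, z

# Toy usage: two document classes sharing three topics.
docs = [[1, 1, 2], [1, 1, 1, 1, 2], [3, 3, 3, 4],
        [3, 3, 4, 4, 3, 3], [0, 0, 0, 0, 0], [0, 0, 0, 0]]
alpha = np.array([[0.1, 0.1, 0.1],    # prior for class 0
                  [0.1, 0.1, 1.0]])   # prior for class 1 (illustrative values)
beta = np.ones((3, 5))
f = [0, 0, 0, 0, 1, 1]
phi, theta, z = gibbs_lda(docs, alpha, beta, f, numsamp=50, randseed=1)
```

With a single row in alpha and every entry of f set to 0, the sketch reduces to ordinary LDA sampling.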

The code implements DeltaLDA as a Python C extension module, combining the speed of Python with the flexibility and ease-of-use of raw C ;)

Code

deltaLDA.tgz

Requirements

To build and install the module, see README.txt for the requirements and further details.

Topic modeling in Python


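import numpy as np
from deltaLDA import deltaLDA

# Prior on topic mixing weights: a single row over 3 topics
alpha = 0.1 * np.ones((1, 3))
# Prior on topic-word distributions: 3 topics x 5 vocabulary words
beta = np.ones((3, 5))
# Each document is a list of word indices in the range 0..4
docs = [[1,1,2],
        [1,1,1,1,2],
        [3,3,3,4],
        [3,3,4,4,3,3],
        [0,0,0,0,0],
        [0,0,0,0]]
numsamp = 50
randseed = 1
(phi, theta, sample) = deltaLDA(docs, alpha, beta, numsamp, randseed)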
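Before calling the sampler it can help to confirm that the inputs agree with each other. The sketch below encodes the conventions that the example above appears to follow (alpha has one column per topic, beta is topics by vocabulary size, and word indices are zero-based). These conventions are inferred from the example rather than taken from the module's documentation, and check_inputs is a hypothetical helper, not part of deltaLDA.

```python
import numpy as np

def check_inputs(docs, alpha, beta):
    """Hypothetical pre-flight check; not part of the deltaLDA module."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if alpha.ndim != 2 or beta.ndim != 2:
        raise ValueError("alpha and beta must be 2-D arrays")
    T, W = beta.shape
    # Assumed convention: alpha has one column per topic
    if alpha.shape[1] != T:
        raise ValueError("alpha has %d columns but beta has %d rows"
                         % (alpha.shape[1], T))
    # Assumed convention: words are zero-based indices into beta's columns
    for d, doc in enumerate(docs):
        for w in doc:
            if not 0 <= w < W:
                raise ValueError("document %d contains word index %d "
                                 "outside 0..%d" % (d, w, W - 1))
    return T, W

alpha = 0.1 * np.ones((1, 3))
beta = np.ones((3, 5))
docs = [[1, 1, 2], [1, 1, 1, 1, 2], [3, 3, 3, 4],
        [3, 3, 4, 4, 3, 3], [0, 0, 0, 0, 0], [0, 0, 0, 0]]
T, W = check_inputs(docs, alpha, beta)
```

Running a check like this in Python first gives a readable error message for mismatched shapes or out-of-range word indices.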

Questions/Comments/Bugs

Open up your Python interpreter and e-mail me at:
'@'.join(['andrzeje','.'.join(['cs','wisc','edu'])])

Reference

[1] Statistical debugging using latent topic models
Andrzejewski, D., Mulhern, A., Liblit, B., and Zhu, X.
In Proceedings of the 18th European Conference on Machine Learning (ECML 2007), 6-17.

[2] Latent Dirichlet Allocation
Blei, D. M., Ng, A. Y., and Jordan, M. I.
Journal of Machine Learning Research (JMLR), 3, Mar. 2003, 993-1022.

[3] Finding Scientific Topics
Griffiths, T., and Steyvers, M.
Proceedings of the National Academy of Sciences (PNAS), 101 (Suppl. 1), 2004, 5228-5235.

Other LDA implementations