Delta Latent Dirichlet Allocation

(Figures: graphical model; toy example)


This software implements the DeltaLDA model [1] for discrete count data. DeltaLDA is a modification of the Latent Dirichlet Allocation (LDA) model [2] that uses two different priors on the document-topic mixing weights to jointly model two corpora with a shared set of topics. Inference is done by collapsed Gibbs sampling [3]. The code can also be used for "standard" LDA, similar to [3].
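For reference, the per-token update of a collapsed Gibbs sampler for this kind of model can be written as below. This is the standard collapsed Gibbs conditional for LDA [3], with one assumption added for DeltaLDA: document d takes its prior from row f(d) of a matrix alpha that has one row per corpus, while the T x W prior beta and the topics are shared. The notation is ours, not taken from the code: token i of document d has word w_i; n_{d,t} counts tokens in d assigned to topic t; n_{t,w} counts occurrences of word w assigned to t; n_t is the total number of tokens assigned to t; n_d is the length of d; and the superscript -i means token i is left out of the counts.

```latex
P(z_i = t \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \bigl(n_{d,t}^{-i} + \alpha_{f(d),t}\bigr)\,
  \frac{n_{t,w_i}^{-i} + \beta_{t,w_i}}{n_{t}^{-i} + \sum_{w} \beta_{t,w}}

\hat{\phi}_{t,w} = \frac{n_{t,w} + \beta_{t,w}}{n_t + \sum_{w'} \beta_{t,w'}},
\qquad
\hat{\theta}_{d,t} = \frac{n_{d,t} + \alpha_{f(d),t}}{n_d + \sum_{t'} \alpha_{f(d),t'}}
```

The document-length denominator of the first factor does not depend on t, so it drops out of the proportionality. If a corpus row of alpha has a zero entry for some topic, documents in that corpus never draw that topic, which is how the two priors can keep some topics specific to one corpus.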

The code implements DeltaLDA as a Python C extension module, combining the speed of Python with the flexibility and ease-of-use of raw C ;)




For the requirements needed to build and install the module, and for further details, see README.txt.

Topic modeling in Python

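import numpy as np
from deltaLDA import deltaLDA

# Standard LDA with T = 3 topics over a W = 5 word vocabulary
alpha = .1 * np.ones((1,3))    # 1 x T document-topic prior (a single row: one corpus)
beta = np.ones((3,5))          # T x W topic-word prior
# Each document is a list of word indices in range(W); add more documents as needed
docs = [[1,1,2]]
numsamp = 50                   # number of Gibbs samples to draw
randseed = 1                   # random seed, for reproducible runs
(phi,theta,sample) = deltaLDA(docs,alpha,beta,numsamp,randseed)
# phi: T x W topic-word estimates, theta: D x T document-topic estimates,
# sample: final topic assignment of every token in docs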
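To illustrate what a call like the one above computes, here is a small pure-NumPy collapsed Gibbs sampler for LDA in which each document's prior is the row of alpha selected by its corpus label. It is a sketch under our own assumptions, not the module's C implementation: the function name delta_lda_gibbs, the per-document corpus-label list f, the sequential initialisation, and computing phi and theta from the final sample are all hypothetical choices. With a single row of alpha and f set to all zeros, it reduces to standard LDA.

```python
import numpy as np


def delta_lda_gibbs(docs, f, alpha, beta, numsamp, randseed):
    """Hypothetical collapsed Gibbs sampler with one alpha row per corpus.

    docs  : list of documents, each a list of word indices in range(W)
    f     : f[d] is the corpus label of document d (a row index into alpha)
    alpha : F x T document-topic Dirichlet prior, one row per corpus
    beta  : T x W topic-word Dirichlet prior
    Returns (phi, theta, sample) computed from the final Gibbs sample.
    """
    rng = np.random.default_rng(randseed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    T, W = beta.shape
    D = len(docs)
    nd = np.zeros((D, T))      # tokens in document d assigned to topic t
    nw = np.zeros((T, W))      # occurrences of word w assigned to topic t
    nt = np.zeros(T)           # total tokens assigned to topic t
    betasum = beta.sum(axis=1)
    sample = [[0] * len(doc) for doc in docs]

    def conditional(d, w):
        # Normalised P(z = t | all other assignments); counts exclude the current token
        p = (nd[d] + alpha[f[d]]) * (nw[:, w] + beta[:, w]) / (nt + betasum)
        return p / p.sum()

    # Sequential initialisation from the conditional: a topic whose alpha entry
    # is zero in a document's row is never assigned to that document.
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = int(rng.choice(T, p=conditional(d, w)))
            sample[d][i] = t
            nd[d, t] += 1; nw[t, w] += 1; nt[t] += 1

    for _ in range(numsamp):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = sample[d][i]
                nd[d, t] -= 1; nw[t, w] -= 1; nt[t] -= 1   # remove token i
                t = int(rng.choice(T, p=conditional(d, w)))
                sample[d][i] = t
                nd[d, t] += 1; nw[t, w] += 1; nt[t] += 1   # add it back with new topic

    phi = (nw + beta) / (nt + betasum)[:, None]
    theta = (nd + alpha[f]) / (nd.sum(axis=1) + alpha[f].sum(axis=1))[:, None]
    return phi, theta, sample


# Two corpora sharing 3 topics; corpus 0 is not allowed to use topic 2.
alpha = np.array([[0.1, 0.1, 0.0],
                  [0.1, 0.1, 0.1]])
beta = np.ones((3, 5))
docs = [[0, 0, 1], [1, 1, 0], [0, 1, 4, 4], [4, 4, 1]]
f = [0, 0, 1, 1]
phi, theta, sample = delta_lda_gibbs(docs, f, alpha, beta, numsamp=50, randseed=1)
```

Setting alpha[0, 2] = 0 shows the DeltaLDA idea in miniature: documents from corpus 0 can never use topic 2, so that topic is left to explain what is specific to corpus 1. The pure-Python loops are slow on real corpora; the module itself does its sampling in C.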


Open up your Python interpreter and try it out. Questions or comments? E-mail me at:


[1] Statistical Debugging Using Latent Topic Models
Andrzejewski, D., Mulhern, A., Liblit, B., and Zhu, X.
In Proceedings of the 18th European Conference on Machine Learning (ECML 2007), 6-17.

[2] Latent Dirichlet Allocation
Blei, D. M., Ng, A. Y., and Jordan, M. I.
Journal of Machine Learning Research (JMLR), 3, Mar. 2003, 993-1022.

[3] Finding Scientific Topics
Griffiths, T., and Steyvers, M.
Proceedings of the National Academy of Sciences (PNAS), 101, 2004, 5228-5235.

Other LDA implementations