Latent Dirichlet Allocation via Collapsed Variational Inference

Multivariate Polya


This software implements Collapsed Variational Bayesian (CVB) inference [1] for the Latent Dirichlet Allocation (LDA) model [2] of discrete count data. The code follows the paper by Teh et al. [1] and also uses some practical implementation details kindly provided by the authors on the extremely helpful topic-models mailing list.

The inference routine is implemented as a Python C extension module.
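To give a feel for what collapsed variational inference does, here is a minimal pure-NumPy sketch. It is not the code of the extension module: it uses the simpler zeroth-order variant (often called CVB0), which drops the variance correction terms of the second-order approximation in [1]. The function name cvb0_sketch, its arguments, and the choice to exclude a single token's contribution from the expected counts are assumptions made for illustration only.

```python
import numpy as np

def cvb0_sketch(docs_w, docs_c, alpha, beta, maxiter=50, seed=0):
    """Illustrative zeroth-order collapsed variational updates (CVB0) for LDA.

    docs_w[d] lists the distinct word indices of document d,
    docs_c[d] the matching counts. alpha holds T values, beta is (T, W).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float).ravel()   # accept (T,) or (1, T)
    beta = np.asarray(beta, dtype=float)
    T, W = beta.shape
    D = len(docs_w)

    # gamma[d][i] is the variational topic distribution of word docs_w[d][i]
    gamma = [rng.dirichlet(np.ones(T), size=len(ws)) for ws in docs_w]

    # Expected counts implied by the current gamma
    n_dk = np.zeros((D, T))   # document-topic
    n_kw = np.zeros((T, W))   # topic-word
    for d, (ws, cs) in enumerate(zip(docs_w, docs_c)):
        for i, (w, c) in enumerate(zip(ws, cs)):
            n_dk[d] += c * gamma[d][i]
            n_kw[:, w] += c * gamma[d][i]
    n_k = n_kw.sum(axis=1)
    beta_sum = beta.sum(axis=1)

    for _ in range(maxiter):
        for d, (ws, cs) in enumerate(zip(docs_w, docs_c)):
            for i, (w, c) in enumerate(zip(ws, cs)):
                g_old = gamma[d][i].copy()
                # Remove one token's own expected contribution (assumption:
                # tokens of the same word in a document share one gamma)
                nd = n_dk[d] - g_old
                nw = n_kw[:, w] - g_old
                nk = n_k - g_old
                g_new = (nd + alpha) * (nw + beta[:, w]) / (nk + beta_sum)
                g_new /= g_new.sum()
                gamma[d][i] = g_new
                # Propagate the change for all c tokens of this word
                delta = c * (g_new - g_old)
                n_dk[d] += delta
                n_kw[:, w] += delta
                n_k += delta

    # Point estimates from the smoothed expected counts
    theta = n_dk + alpha
    theta /= theta.sum(axis=1, keepdims=True)
    phi = n_kw + beta
    phi /= phi.sum(axis=1, keepdims=True)
    return phi, theta, gamma
```

In this sketch phi has one row per topic and theta one row per document, each row summing to one. Whether the extension module returns arrays in the same layout is not something this sketch establishes; see README.txt for that.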


Importantly, I wrote this code as an educational exercise for my own benefit only. This software has no connection with [1] or its authors whatsoever. Any errors contained herein are my own.


cvbLDA on GitHub


To build and install the module, see README.txt for the requirements and further details.

Example usage

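import numpy as np
from cvbLDA import cvbLDA

# Model params: T topics, W vocabulary words
(T, W) = (3, 5)
alpha = 0.1 * np.ones((1, T))  # Dirichlet hyperparameter over topics, shape (1, T)
beta = 0.1 * np.ones((T, W))   # Dirichlet hyperparameter over words, shape (T, W)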

# Dataset
docs_w = [[1,2],[1,2],[3,4],
docs_c = [[2,1],[4,1],[3,1],

# Stopping conditions for inference
(maxiter,convtol) = (10,.01)

# Do CVB inference for LDA
(phi,theta,gamma) = cvbLDA(docs_w,docs_c,alpha,beta,
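The dataset in the example appears to pair each document's distinct word indices (docs_w) with their counts (docs_c). That reading is an assumption based on the variable names and the "discrete count data" description above; README.txt is the authority on the exact format. Under that assumption, a small helper for converting tokenized documents might look like the sketch below. The name to_index_count_lists and the vocab argument are hypothetical and not part of cvbLDA.

```python
from collections import Counter

def to_index_count_lists(token_docs, vocab):
    """Convert tokenized documents to parallel index and count lists.

    token_docs: list of documents, each a list of word strings.
    vocab: list of known words; a word's position is its index.
    Words not found in vocab are silently skipped.
    """
    word_to_id = {word: i for i, word in enumerate(vocab)}
    docs_w, docs_c = [], []
    for tokens in token_docs:
        # Count how often each known word index occurs in this document
        counts = Counter(word_to_id[t] for t in tokens if t in word_to_id)
        ids = sorted(counts)
        docs_w.append(ids)
        docs_c.append([counts[i] for i in ids])
    return docs_w, docs_c

# Usage with a toy three-word vocabulary
vocab = ["apple", "banana", "cherry"]
token_docs = [["banana", "cherry", "banana"], ["apple"]]
docs_w, docs_c = to_index_count_lists(token_docs, vocab)
```

Sorting the indices keeps the output deterministic; nothing in the source says the module requires a particular order.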


Open up your Python interpreter and e-mail me at:


[1] A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation
Teh, Y. W., Newman, D., and Welling, M.
Advances in Neural Information Processing Systems (NIPS) 19, 2007.

[2] Latent Dirichlet Allocation
Blei, D. M., Ng, A. Y., and Jordan, M. I.
Journal of Machine Learning Research (JMLR), 3, Mar. 2003, 993-1022.