Topic Modeling Papers and Tools

Papers

Implementations

Note: The table format and the content of the first two rows are borrowed from David Blei's Topic Modeling page.

| Link | Model/Algorithm | Language | Author | Notes |
| --- | --- | --- | --- | --- |
| lda-c | Latent Dirichlet allocation | C | D. Blei | Implements variational inference for LDA. |
| class-slda | Supervised topic models for classification | C++ | C. Wang | Implements supervised topic models with a categorical response. |
| GibbsLDA++ | A C/C++ implementation of Latent Dirichlet Allocation | C/C++ | Xuan-Hieu Phan and Cam-Tu Nguyen | Uses Gibbs sampling for parameter estimation and inference. It is very fast and is designed to analyze hidden/latent topic structures of large-scale datasets, including large collections of text/Web documents. |
| MALLET | A Machine Learning for Language Toolkit | Java | Andrew Kachites McCallum | Implements Gibbs sampling for LDA in Java using fast sampling methods. MALLET also includes support for data preprocessing, classification, and sequence tagging. |
| Gensim | A Python package for topic modelling | Python | Radim Řehůřek | Includes distributed and online implementations of variational LDA. |
| Multithreaded LDA | Multithreaded extension of Blei's LDA implementation | C | Ramesh Nallapati | Speeds up the computation by orders of magnitude depending on the number of processors. |
| Stanford Topic Modeling Toolbox | Scala implementation of LDA and Labeled LDA | Scala | Daniel Ramage and Evan Rosen | Imports and manipulates text from cells in Excel and other spreadsheets. Generates rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data. |
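
Several of the tools above (GibbsLDA++ and MALLET) estimate LDA with Gibbs sampling. For readers who want a feel for that technique before picking a tool, here is a minimal sketch of a plain collapsed Gibbs sampler in pure Python. It is not the code of any listed package and does not use the fast sampling methods MALLET is noted for. The function name `gibbs_lda`, the hyperparameter defaults `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids
    in the range [0, vocab_size).
    Returns the document-topic and topic-word count tables.
    """
    rng = random.Random(seed)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token a random initial topic and record the counts.
    assign = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Full conditional p(z = k | all other assignments),
                # known only up to a normalizing constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Resample the topic and add the token back.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-2 and 3-5 tend to co-occur.
    toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 3, 4, 4]]
    doc_topic, topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=6)
    print(doc_topic)
    print(topic_word)
```

On real corpora the packages in the table are the better choice. This sketch recomputes every topic weight for every token, which is the cost that optimized samplers are designed to avoid.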

Corpora