This page is a companion to a guest lecture on text analysis for ASRC 4513 - Science Fiction and the Value of Utopia/Dystopia (instructor: Ricardo Wilson), given on 3/21/2017 in 701 Olin Library.
Topic modeling - framing
What is topic modeling?
Probabilistic model
"Unsupervised" - see tail end of https://tedunderwood.com/2015/06/04/seven-ways-humanists-are-using-computers-to-understand-text/
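To make "probabilistic model" a little more concrete, here is a minimal sketch of the generative story that LDA-style topic models assume: each document gets its own mixture of topics, and each word is drawn by first picking a topic and then picking a word from that topic. The vocabulary, the two hand-written topics, and the helper names (sample_dirichlet, sample_index, generate_document) are all invented for illustration; none of them come from the lecture or from any library.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Pick an index with probability proportional to its weight."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, vocab, alpha, length, rng):
    """Generate one toy document under the assumed generative story."""
    theta = sample_dirichlet(alpha, rng)      # this document's topic mixture
    words = []
    for _ in range(length):
        z = sample_index(theta, rng)          # choose a topic for this word slot
        w = sample_index(topics[z], rng)      # choose a word from that topic
        words.append(vocab[w])
    return theta, words

# Hypothetical vocabulary and topics (each topic is a distribution over vocab).
vocab = ["ship", "planet", "robot", "king", "crown", "court"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # a "space" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],   # a "monarchy" topic
]

rng = random.Random(0)
theta, doc = generate_document(topics, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic mixture:", theta)
print("document:", doc)
```

Topic modeling runs this story in reverse: given only the documents, it estimates the topics and the per-document mixtures, with no labels supplied (hence "unsupervised").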
Topic modeling - optional - resources for diving deeper
How does it work?
- Processing - LDA - https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
- Derive a set of secondary data from the text - typically something like the images in "Probabilistic Topic Models" by Steyvers and Griffiths (2007), see page 2
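As a rough illustration of the processing step, the sketch below fits a toy LDA model with collapsed Gibbs sampling, one common way to estimate LDA (the tools listed on this page may use different or more refined estimation methods). The "secondary data" it derives are two count tables: topics per document and words per topic. The corpus, the function names (lda_gibbs, top_words), and the hyperparameter values are assumptions made up for this example.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight each topic by how much this document uses it and
                # how much it uses this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:n]])
    return result

# Hypothetical, already-tokenized toy corpus.
docs = [
    "ship planet robot ship planet".split(),
    "robot ship planet robot".split(),
    "king crown court king".split(),
    "crown court king crown court".split(),
]

vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print("topic counts per document:", doc_topic)
print("top words per topic:", top_words(vocab, topic_word))
```

Each row of the per-document table can be read as that document's topic mixture, and the top words per topic are what a word cloud of a topic would display. On a corpus this small the result depends heavily on the random seed, so treat the output as a demonstration of the mechanics rather than a meaningful analysis.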
Sample Visualizations
- word clouds (sample topics in a word cloud from Shakespeare's Comedies)
- termite - http://vis.stanford.edu/papers/termite
- TAPoR - http://tapor.ca/tools?page=1&query=topic%20model
- Mimno’s wordsim - https://mimno.infosci.cornell.edu/wordsim/nearest.html?q=monarchy
Implementations of LDA
- MALLET - Java - http://mallet.cs.umass.edu/index.php
- Stanford’s Topic Modeling Toolbox (obsolete) - Scala - http://nlp.stanford.edu/software/tmt/tmt-0.4/
- topicmodels - R package - https://CRAN.R-project.org/package=topicmodels
- gensim - Python package - https://radimrehurek.com/gensim/