TAToM: Text Analysis with Topic Models for the Humanities and Social Sciences

A tutorial by Allen Riddell

The training materials "TAToM - Text Analysis with Topic Models for the Humanities and Social Sciences" consist of a series of tutorials that cover basic methods of quantitative text analysis.

The tutorials cover preparing a text corpus for analysis and exploring text collections with methods such as topic modeling and machine learning. They address both basic and advanced topics, and they primarily use the Python programming language to work with the text data.

Overview of the contents:

  • Preliminaries & Getting started
  • Working with text
  • Preprocessing
  • Feature selection: finding distinctive words
  • Topic modeling with MALLET
  • Topic modeling in Python
  • Visualizing topic models
  • Classification, Machine Learning, and Logistic Regression
  • Case Study: Racine's early and late tragedies

The tutorials were written by Allen Riddell for DARIAH-DE and released as version 1.0 in March 2014. The work was coordinated by Christof Schöch at the Chair of Computer Philology at the University of Würzburg.

Feedback on the tutorials, including error reports, is always welcome. Please use the issue tracker on GitHub.

 

This tutorial is licensed under a Creative Commons Attribution 4.0 International License.