Digital technologies have made vast amounts of text available to researchers, and the same technologies give us the capacity to analyze that text. The first step in that analysis is to transform texts written for human readers into a form a computer can analyze. Using Python and the Natural Language Toolkit (commonly called NLTK), this workshop introduces strategies for turning qualitative texts into quantitative objects. Along the way, we will present a variety of strategies for simple analysis of text-based data.
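To give a rough sense of what turning a text into a quantitative object can look like, here is a minimal sketch that uses only the Python standard library rather than NLTK. The sample sentence and the `tokenize` helper are made up for illustration and are not part of the workshop materials. NLTK offers more capable tools for the same steps.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens as a list.

    Hypothetical helper: a simple regular expression that keeps runs of
    letters and apostrophes and drops punctuation and digits.
    """
    return re.findall(r"[a-z']+", text.lower())


# Made-up sample text, standing in for a real document.
sample = "The whale surfaced. The crew watched the whale."

tokens = tokenize(sample)   # the text as a list of words
counts = Counter(tokens)    # the text as word frequencies

print(tokens)
print(counts.most_common(2))
```

Once a text is a list of tokens, questions such as "which words appear most often?" become simple counting operations.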
By the end of this workshop, you will be able to:
Text as Data
Cleaning and Normalizing
NLTK Methods with the NLTK Corpus
Searching For Words
Built-In Python Functions
Making Your Own Corpus: Data Cleaning
Make Your Own Corpus
Digital Research Institute (DRI) Curriculum by Graduate Center Digital Initiatives is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/DHRI-Curriculum. When sharing this material or derivative works, preserve this paragraph, changing only the title of the derivative work, or provide comparable attribution.