Preparation
This page is a companion to the Olin Workshop on 3/19 called Text Mining: Bootstrap Yourself!
Please bring a laptop, your own if you have one. If you do not have one, you can check one out at the Olin circulation desk. Having a laptop will allow you to participate in the exercises and get the most out of class.
No special software will be needed. All exercises will be done through a Web browser, without any special plugins.
Agenda
Presentation
There will be a presentation (about 30 minutes) during which your questions and comments are welcome. My aim is to discuss as much as is useful to you. Please feel free to chime in at any time.
Exercises
Voyant
There will be exercise time using Voyant, a low-barrier text analysis tool that delivers a rich, interactive interface. All exercises will be demonstrated, so no prior knowledge of the tool is required.
The room is equipped with a jack for sharing your desktop on the screen, so if you discover something that you would like to share and discuss, we can easily do so, and I encourage it. Please feel free to bring your own material to the workshop for upload and analysis, understanding that any uploaded material will be subject to the Voyant privacy policy.
Sample texts and URLs for analysis are listed below for experimentation, in case you run low on ideas.
- Sample texts for upload, courtesy of Project Gutenberg. Download the plain text version to your local machine.
- Crane, Stephen, 1871-1900. The Red Badge of Courage: An Episode of the American Civil War.
- Montessori, Maria, 1870-1952 (George, Anne E., Translator). The Montessori Method
- Upham, Charles Wentworth, 1802-1875. Salem Witchcraft, Volumes I and II
- Sample URLs: copy and paste into the Voyant upload browser window to get started.
- DSPS Press - http://blogs.cornell.edu/dsps/
- Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code - http://www.copyright.gov/title17/92preface.html
- Niagara Falls (Wikipedia) - http://en.wikipedia.org/wiki/Niagara_Falls
nGrams
...
If time permits, there will be exercise time using Google's nGram tool. nGrams depict the frequency of a word or phrase by publication year. Note that many modifications can be made to refine the analysis, so please consider the links below as starting points. Syntax for refinement is found on the About page.
- nGram tool - delete the words and supply your own.
- Names of New York City
- Spanish words in English publications
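For those curious about what sits behind a chart of phrase frequency by publication year, here is a minimal Python sketch of the idea. It is not Google's implementation and is not needed for the workshop. The toy corpus, the function names, and the choice to divide by the total number of same-length phrases in each year are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequency_by_year(documents, phrase):
    """Map each publication year to the share of its n-grams matching phrase.

    documents: iterable of (year, text) pairs (a hypothetical toy corpus).
    """
    target = tuple(tokenize(phrase))
    n = len(target)
    matches = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        grams = ngrams(tokenize(text), n)
        totals[year] += len(grams)
        matches[year] += sum(1 for gram in grams if gram == target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {
        year: matches[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Made-up sentences standing in for books published in a given year.
corpus = [
    (1850, "New Amsterdam was the old name and New Amsterdam is remembered"),
    (1900, "New York grew quickly and New York absorbed Brooklyn"),
    (1900, "Some still wrote of New Amsterdam"),
]

for name in ("New York", "New Amsterdam"):
    print(name, phrase_frequency_by_year(corpus, name))
```

Each year's value is a share rather than a raw count, so years with more published text do not automatically score higher.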
Manyeyes
Manyeyes is a site run by IBM that offers various visualizations of data. A few of the visualizations accept free-text entry and analyze the text directly. Please feel free to upload your own work to the site as primary data (upload requires an account), understanding that the IBM Online Privacy Statement will apply. Three visualizations of the same data are offered below.
- Martin Luther King, Jr's "I Have A Dream..." Phrase Net
- Martin Luther King, Jr's "I Have A Dream..." Word Tree
- Martin Luther King, Jr's "I Have A Dream..." Word Cloud Generator