Preparation

This page is a companion to the Olin Workshop on 3/19 called Text Mining: Bootstrap Yourself!
Please bring a laptop! If you do not have one, feel free to check one out at the Olin circulation desk. Having a laptop will allow you to participate in the exercises and get the most out of the class.
No special software will be needed. All exercises will be done through a Web browser, without any special plugins.

Agenda

Presentation

There will be a presentation (about 30 minutes) during which your questions and comments are welcome. My aim is to discuss as much as is useful to you. Please feel free to chime in at any time.

Exercises
Voyant

There will be exercise time using Voyant, a low-barrier text analysis tool that delivers a rich, interactive interface. All exercises will be demonstrated, so no prior knowledge of the tool is required.
The room is equipped with a jack for sharing your desktop on the screen, so if you discover something that you would like to share and discuss, we can easily do so, and I encourage it. Please feel free to bring your own material to the workshop for upload and analysis, with the understanding that any uploaded material will be subject to the Voyant privacy policy.
Sample texts and URLs for analysis are listed below for experimentation, in case you run low on ideas.

nGrams

If time permits, there will be exercise time using Google's nGram tool. An nGram chart depicts the frequency of a word or phrase by publication year. Many modifications can be made to refine the analysis, so please treat the links below as starting points. The syntax for refinement is described on the About page.
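
For the curious, the idea of frequency by publication year can be sketched in a few lines of Python. This is an optional illustration only, not something needed for the workshop and not Google's implementation. The function names and the toy corpus are hypothetical, and the sketch assumes that frequency means the share of a year's phrases of the same length that match the search phrase.

```python
from collections import Counter


def ngrams(text, n):
    """Return the n-word phrases found in text, lowercased, as strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency_by_year(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to phrase.

    corpus is a dict of publication year -> list of texts (an assumed,
    simplified layout chosen for this illustration).
    """
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        # Guard against years with no phrases of this length.
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus, invented for this example.
toy_corpus = {
    1900: ["the steam engine roared", "the engine was new"],
    2000: ["the search engine was new", "a search engine indexes the web"],
}

print(relative_frequency_by_year(toy_corpus, "search engine"))
```

Plotting the resulting values against the years gives the kind of curve the nGram tool draws. The real tool works from a far larger corpus and offers the refinements mentioned above.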

Manyeyes

Manyeyes is a site run by IBM that offers various visualizations of data. A few of the visualizations accept free-text entry and analyze the text directly. Please feel free to upload your own work to the site as primary data (upload requires an account), with the understanding that the IBM Online Privacy statement will apply. Three visualizations of the same data are offered below.
