Preparation

This page is a companion to the Olin Workshop called Text Mining: Hands-On Exploration.
Please bring a laptop. Bring your own if you have one; otherwise, feel free to check one out at the Olin circulation desk. Having a laptop will allow you to participate in the exercises and get the most out of the class.
No special software will be needed. All exercises will be done through a Web browser, without any special plugins.

Presentation

There will be a presentation (about 30 minutes) during which your questions and comments are welcome. My aim is to discuss as much as is useful to you. Please feel free to chime in at any time.

Exercises

All exercises will be demonstrated; no prior knowledge of any tools will be required. The room is equipped with video input to allow easy sharing of your desktop on the screen, so if you discover something that you would like to share and discuss, we can easily do so.

Voyant

Voyant is a low-barrier text analysis tool that delivers a rich, interactive interface and a variety of visualizations. Input can be plain text, a PDF (with OCR), an MS Word document, or a URL for HTML analysis. Please feel free to bring your own material to upload during the workshop, with the understanding that anything you upload is subject to the Voyant privacy policy. Sample texts and URLs for analysis are listed below for experimentation, in case you run low on ideas.
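For a sense of the kind of counting that underlies word-frequency views in text analysis tools, here is a minimal Python sketch. It is not Voyant's code and is not needed for the workshop; the function name `term_frequencies`, the simple tokenizer, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

def term_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens, skipping any listed in `stopwords`.

    Hypothetical helper for illustration; real tools use more careful
    tokenization and much longer stopword lists.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Made-up sample text, not one of the workshop texts.
sample = "The whale, the sea, and the ship: the whale returns."
counts = term_frequencies(sample, stopwords={"the", "and"})
print(counts.most_common(3))
```

Removing very common words (stopwords) before counting is what lets the more distinctive terms rise to the top of a frequency list or word cloud.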

Google nGram Viewer

We will also explore Google's nGram Viewer. Google nGrams depict the frequency of a word or phrase by publication year. Many modifications can be made to refine the analysis, so please treat the links below as starting points. The syntax for refinement is described on the About page.
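To make "frequency by publication year" concrete, the sketch below computes, for a toy corpus, the share of all same-length word sequences in each year that match a given phrase. This is only an illustration of the idea, not Google's implementation; the function `yearly_frequency`, the whitespace tokenization, and the toy documents are all assumptions.

```python
from collections import defaultdict

def yearly_frequency(docs, phrase):
    """Return {year: share of n-grams in that year equal to `phrase`}.

    `docs` is an iterable of (year, text) pairs. Hypothetical helper:
    tokenization here is plain whitespace splitting.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        words = text.lower().split()
        # All runs of n consecutive words in this document.
        grams = list(zip(*(words[i:] for i in range(n))))
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Made-up toy corpus for illustration only.
toy = [
    (1900, "steam engine and steam power"),
    (1950, "jet engine and jet age"),
    (1950, "steam engine museum"),
]
freq = yearly_frequency(toy, "steam engine")
print(freq)
```

Dividing by the yearly total matters: it keeps a phrase from appearing to grow in popularity simply because more text was published in later years.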

Immersion

Immersion is a tool for discovering the connections in a corpus of email. It analyzes the flow data (information found in email headers) and represents it as a network of entities. The analysis is done in real time on the flow data of the account for which you provide credentials. The display is rich and interactive.
By design, Immersion collects only header information (From, To, Cc and Timestamp). However, using the actual flow data from your account may raise privacy concerns. Be sure to read the FAQs to understand what information you are granting access to and how it will be used. If you are not comfortable with the terms of the tool, you can explore it with the demo data instead.
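As a rough illustration of how header information alone can yield a network, the following Python sketch counts sender-to-recipient pairs from the From, To, and Cc headers of a few messages and never reads the message bodies. It is not Immersion's code; the function `header_edges` and the sample messages are assumptions, and only the standard library `email` module is used.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

def header_edges(raw_messages):
    """Count (sender, recipient) pairs using only From/To/Cc headers.

    Hypothetical helper for illustration; message bodies are never inspected.
    """
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [
            addr
            for _, addr in getaddresses(
                msg.get_all("To", []) + msg.get_all("Cc", [])
            )
        ]
        for s in senders:
            for r in recipients:
                edges[(s.lower(), r.lower())] += 1
    return edges

# Made-up messages with placeholder addresses.
raw = [
    "From: Ann <ann@example.org>\nTo: bob@example.org\nCc: cy@example.org\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n\nbody ignored",
    "From: ann@example.org\nTo: Bob <bob@example.org>\n\nbody ignored",
]
edges = header_edges(raw)
for (sender, recipient), weight in edges.items():
    print(sender, "->", recipient, weight)
```

Each pair becomes a link in the network, and the count can be read as the strength of that connection.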

Viewshare

If time permits, we might look at Viewshare, a free tool provided by the Library of Congress that allows you to generate and customize interactive maps, timelines, facets, and tag cloud visualizations of digital collections. The tool presupposes that you have an account and that your data is ready, including location and/or time-related data fields in a basic form. A few helpful tutorials are available. Viewshare views can also be embedded in other web pages.

 
