Preparation
This page is a companion to the guest lecture on text mining for
Please do bring a laptop! If you have one, bring your own. If you do not have one, please feel free to
No special software will be needed. All exercises will be done through a Web browser, without any special plugins.
Agenda
Presentation
There will be a
Exercises
All exercises will be demonstrated, so no prior knowledge of the tools is required. The room is equipped with a jack that allows easy sharing of your desktop on the screen, so if you discover something that you would like to share and discuss, we can easily do so, and I encourage it.
Voyant
...
nGrams
...
Voyant privacy policy. Sample texts and URLs for analysis are listed below for experimentation, in case you run low on ideas.
- Sample texts for upload are below, courtesy of Project Gutenberg. Download the plain text version to your local machine for upload into the Voyant interface.
- Crane, Stephen, 1871-1900. The Red Badge of Courage: An Episode of the American Civil War.
- Montessori, Maria, 1870-1952. (George, Anne E., Translator). The Montessori Method
- Upham, Charles Wentworth, 1802-1875. Salem Witchcraft, Volumes I and II
- Sample URLs: copy and paste into the Voyant upload browser window to get started.
- DSPS Press - http://blogs.cornell.edu/dsps/
- Copyright Law of the United States of America and Related Laws Contained in Title 17 of the United States Code - http://www.copyright.gov/title17/92preface.html
- Niagara Falls (Wikipedia) - http://en.wikipedia.org/wiki/Niagara_Falls
- Sample Visualization
Google nGram Viewer
We will also explore Google's nGram Viewer. There will be exercise time using Google's
...
ManyEyes
...
...
- nGram tool - delete the words and supply your own.
- Names of New York City
- Spanish words in English publications
- Frequency with which Roland Barthes has been cited in various languages
- Terms related to racial integration
- "Ebola" 1975-2008 in various languages (compare with data from WHO on Ebola outbreaks and a map of official languages of African countries)
Immersion
Immersion is a tool for discovering the connections in a corpus of email. It analyzes the flow data (information found in email headers) and represents it as a network of entities. The analysis is done in real time on the flow data of the account for which you provide credentials. The display is rich and interactive.
By design, Immersion collects only header information (From, To, Cc and Timestamp). However, using the actual flow data from your account may raise privacy concerns. Be sure to read the FAQs to understand what information you are granting access to and how it will be used. If you do not like the terms of the tool, you can try it with their demo data.
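For those curious how header-only data can become a network, here is a minimal sketch in Python. It is not Immersion's actual code: the function name `header_network`, the sample messages, and the choice to weight a link by the number of shared messages are all assumptions made for illustration. It reads only the From, To and Cc headers and never looks at the message body.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations


def header_network(raw_messages):
    """Count how often each pair of addresses appears on the same message.

    Hypothetical helper for illustration; not part of Immersion.
    """
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Only header fields are read; the body is never inspected.
        fields = (
            msg.get_all("From", [])
            + msg.get_all("To", [])
            + msg.get_all("Cc", [])
        )
        # Normalise to lower-case addresses and drop duplicates.
        people = sorted(
            {addr.lower() for _name, addr in getaddresses(fields) if addr}
        )
        # Every pair of people on one message gets its link weight raised by 1.
        for pair in combinations(people, 2):
            edges[pair] += 1
    return edges


# Made-up sample messages (assumed data, example.org addresses).
sample = [
    "From: ann@example.org\nTo: bob@example.org\nCc: cy@example.org\n"
    "Date: Mon, 06 Oct 2014 09:00:00 -0400\n\nbody ignored",
    "From: bob@example.org\nTo: ann@example.org\n"
    "Date: Tue, 07 Oct 2014 10:30:00 -0400\n\nbody ignored",
]

network = header_network(sample)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this sketch, people who appear together on many messages end up with heavier links, which is the kind of structure a network display can then draw. The Date header is carried in the sample but not used; a fuller version could use it to show how connections change over time.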
...