This page is a companion to a guest lecture prepared for ARCH 3819/ARCH 5819.
Class Preparation
Please bring a laptop. If you do not own one, feel free to check one out at the Olin circulation desk. Our explorations are intended to be participatory.
No special software will be needed. All explorations will be done through a Web browser.
Discussion of terms
We will hold a short discussion to define terms (at least loosely), so that we share a common language for discussing and critiquing these tools and their strategies.
Explorations
All analysis tools will be demonstrated; no prior knowledge of any tools will be required. The room is equipped with video input to allow sharing of your desktop on the screen. If you discover something that you would like to share and discuss, we can easily do so.
Word Frequencies
Voyant is a low-barrier text analysis tool that offers an interactive interface and a variety of visualizations.
Sample texts and URLs for analysis are listed below for experimentation, but feel free to use other source data that interests you.
nGrams
nGram tools chart the frequency of a word or phrase, most often plotted by publication year. We have two nGram tools, each drawing on different source data.
...
- Data: Primary source data is the openly viewable texts from the HathiTrust Research Center (publications mostly pre-1923).
- Analysis: HTRC algorithms tokenize the text and then count the frequencies of those tokens. Secondary data includes tokens, counts, frequencies, publication dates, etc., arranged in Solr indexes.
- Visualization: Simple line graphs plot nGram frequency against publication date, drawn from the secondary data. The tool also has faceting controls that leverage the enriched bibliographic metadata of the texts comprising the primary data.
- Examples:
  - Spanish words in English publications
  - Various words for "creole"
  - Mystery (contrast fiction and nonfiction)
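The tokenize-then-count step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the HTRC implementation: the function names, the simple regular-expression tokenizer, and the three-sentence corpus are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(documents, phrase):
    """Map each publication year to the phrase's share of all n-grams of the same length."""
    n = len(phrase.split())
    hits = Counter()
    totals = Counter()
    for year, text in documents:
        grams = ngrams(tokenize(text), n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical miniature corpus: (publication year, text) pairs.
corpus = [
    (1850, "The mystery of the old house remained a mystery."),
    (1850, "No mystery here, only a plain account."),
    (1900, "A plain account of the voyage."),
]

frequencies = relative_frequency_by_year(corpus, "mystery")
for year, share in frequencies.items():
    print(year, round(share, 4))
```

Plotting the resulting year-to-frequency pairs as a line graph gives the kind of chart the nGram tools display. Dividing by the total number of n-grams per year matters because far more text survives from some years than from others.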
Network Analysis
Example: Linked Jazz is an interactive visualization of various types of connections between notable Jazz artists.
...
Gephi is a commonly used network analysis tool that is much more flexible and powerful. It requires local installation.
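To make the idea of a network of connections concrete, here is a small sketch that stores connections as an edge list and computes degree centrality, one of the simplest network measures. The artist names and connections are placeholders invented for illustration; they are not taken from Linked Jazz, and this is not how Gephi is implemented.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is one undirected connection between two people.
edges = [
    ("Artist A", "Artist B"),
    ("Artist A", "Artist C"),
    ("Artist B", "Artist C"),
    ("Artist C", "Artist D"),
]


def build_adjacency(edge_list):
    """Turn an edge list into a mapping from each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def degree_centrality(adjacency):
    """Score each node by the fraction of the other nodes it is directly connected to."""
    node_count = len(adjacency)
    if node_count < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (node_count - 1)
            for node, neighbours in adjacency.items()}


centrality = degree_centrality(build_adjacency(edges))
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

A node that scores 1.0 is connected to every other node in the network. Dedicated tools add many further measures and layouts on top of this basic structure.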
Spatial and Temporal Representation
Viewshare is a free tool provided by the Library of Congress that generates interactive maps and timelines, with facets, for digital collections. The tool requires a user account and data in columnar form that includes location and/or time-related fields. A few helpful tutorials are available.
Test visualization of digital collections from the Cornell University Hip Hop Collection, created with Viewshare
Image Analysis
Ukiyo-e.org is a database and image similarity analysis engine, created by John Resig to aid researchers in the study of Japanese woodblock prints.
- Data: Over 213,000 digital copies of prints from 24 institutions, and their cataloging metadata. Metadata is indexed and searchable. Details are noted in the about page.
- Analysis: Image search uses the TinEye matching engine to detect edges in an uploaded sample and compares them with the analyzed edges of images in the database, returning probable matches.
- Visualization: Tiled images of "hits" for easy comparison, with URL links to their metadata in the source institution's catalog.
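The general idea of comparing images by their edges can be illustrated with a toy example. This sketch is not the TinEye algorithm, whose internals are not described here; it assumes tiny grayscale images stored as lists of brightness values (0 to 255), and the function names, the threshold, and the sample images are all hypothetical.

```python
def edge_map(image, threshold=50):
    """Collect pixels whose brightness differs sharply from the right or lower neighbour."""
    height, width = len(image), len(image[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            right = abs(image[y][x] - image[y][x + 1]) if x + 1 < width else 0
            down = abs(image[y][x] - image[y + 1][x]) if y + 1 < height else 0
            if max(right, down) > threshold:
                edges.add((y, x))
    return edges


def edge_similarity(image_a, image_b):
    """Jaccard overlap of two edge maps: 1.0 means identical edges, 0.0 means none shared."""
    edges_a, edges_b = edge_map(image_a), edge_map(image_b)
    if not edges_a and not edges_b:
        return 1.0
    return len(edges_a & edges_b) / len(edges_a | edges_b)


def rank_matches(sample, database):
    """Return (score, name) pairs for every stored image, best match first."""
    scored = [(edge_similarity(sample, image), name) for name, image in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical 4x4 images: a bright square, a faded copy of it, and a blank sheet.
square = [[0, 0, 0, 0], [0, 255, 255, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
faded_square = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
blank = [[0] * 4 for _ in range(4)]

ranked = rank_matches(square, {"faded_square": faded_square, "blank": blank})
print(ranked)
```

Because the comparison looks at where the edges fall rather than at exact pixel values, the faded copy still ranks as a close match. That is the property that makes edge-based matching useful for prints that have aged or been photographed differently.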
...