This page is a companion to a guest lecture prepared for ARCH 3819/ARCH 5819.

Class Preparation

Please bring a laptop. If you do not own one, feel free to check one out at the Olin circulation desk. Our explorations are intended to be participatory.
No special software will be needed. All explorations will be done through a Web browser.

Discussion of terms

There will be a short discussion in which we define terms (at least loosely) so that we share a common vocabulary for discussing and critiquing these tools and their strategies.

Explorations

All analysis tools will be demonstrated; no prior knowledge of any tool is required. The room is equipped with video input so that you can share your desktop on the screen. If you discover something you would like to share and discuss, we can easily do so.

Word Frequencies

Voyant is a low-barrier text analysis tool that delivers an interactive interface and a variety of visualizations.

  • Data: Text of your choosing. The upload interface accepts plain text, PDFs with embedded OCR text, and MS Word .doc and .docx files, as well as URLs. Multiple documents can be uploaded together as a set. Any uploaded material is subject to the Voyant privacy policy.
  • Analysis: The Voyant environment lists every word in the document(s) with a count for each, and also calculates relative frequencies based on the total word count of the document(s) in the uploaded set.
  • Visualization: An interactive dashboard with a word cloud, a graph of frequency over document segments, tabled secondary data, and the source data as text. Navigational aids integrate source and secondary data with each other and with the visualizations, and utilities are provided for stop word control, URLs, screen capture, secondary data download, etc.
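As a rough illustration of the counting described above, the Python sketch below tallies every word across a small set of documents and divides each count by the total word count. The tokenizer (runs of lowercase letters and apostrophes) and the optional stop-word set are assumptions made for this sketch; Voyant's own tokenization and stop lists may differ.

```python
import re
from collections import Counter


def word_frequencies(documents, stop_words=frozenset()):
    """Count words across a set of documents and compute relative frequencies.

    Assumption for this sketch: a "word" is a run of lowercase letters or
    apostrophes, and stop words are left out of both the counts and the total.
    """
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    total = sum(counts.values())
    # Relative frequency = a word's count divided by the total word count.
    relative = {word: n / total for word, n in counts.items()} if total else {}
    return counts, relative


docs = ["The city and the river.", "A river runs through the city."]
counts, relative = word_frequencies(docs)
for word, n in counts.most_common(3):
    print(f"{word}: {n} ({relative[word]:.1%})")
```

With the two sample sentences, "the" appears 3 times out of 11 words (about 27.3%). Passing a stop-word set such as {"the", "a", "and"} removes those words from both the counts and the total, which is roughly what stop word control does in an interface like Voyant's.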

Sample texts and URLs for analysis are listed below for experimentation, but feel free to use other source data that interests you. 

nGrams

An nGram is a sequence of n words (a single word is a 1-gram, a two-word phrase a 2-gram, and so on). nGram tools chart the frequency of a word or phrase, most often over publication year. We have two nGram tools, each leveraging different source data.

Examples from "Quantitative Analysis of Culture Using Millions of Digitized Books." Jean-Baptiste Michel et al. Science, 14 Jan 2011, Vol. 331, Issue 6014, pp. 176-182. DOI: 10.1126/science.1199644

Google's nGram Viewer: use the links below as starting points. Queries can be modified dynamically at any point, and syntax rules can be found on the About page.
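The idea of an nGram frequency over publication year can be sketched in a few lines of Python. This is a simplified illustration, not how Google's viewer is implemented: it assumes a small hypothetical list of (year, text) pairs, splits text on whitespace, and normalizes each year's count of the phrase by the total number of nGrams of the same length in that year.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_series(books, phrase):
    """Relative frequency of `phrase` for each publication year.

    `books` is a list of (year, text) pairs. A year's value is the phrase's count
    divided by the number of n-grams of the same length in that year's texts.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in books:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years whose texts are too short to contain any n-gram of this length.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus: (publication year, text)
books = [
    (1900, "the old mill by the river"),
    (1950, "the old mill and the new mill"),
]
for year, freq in ngram_series(books, "old mill").items():
    print(year, f"{freq:.3f}")
```

Normalizing by the number of nGrams per year, rather than plotting raw counts, keeps years with many more published texts from dominating the graph.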

Network Analysis

Example: Linked Jazz is an interactive visualization of various types of connections between notable jazz artists.

Immersion is a tool for analyzing and depicting connections in email. By design, Immersion collects only header information (From, To, Cc and Timestamp).  The FAQ describes what information you grant access to, how it will be used, and how to delete your data when you are done.  You can also explore with demo data.
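To make the idea of header-only analysis concrete, the Python sketch below builds a weighted sender-to-recipient network from the From, To and Cc fields alone, never reading message bodies. It is not Immersion's code; the function name and sample addresses are hypothetical, and the standard library's email.utils parsers handle the address strings.

```python
from collections import Counter
from email.utils import getaddresses, parseaddr


def header_network(messages):
    """Count sender -> recipient edges using only From, To and Cc headers.

    `messages` can be dicts of header strings or email.message.Message objects;
    both support .get(). Message bodies are never read.
    """
    edges = Counter()
    for msg in messages:
        _name, sender = parseaddr(msg.get("From", ""))
        if not sender:
            continue  # skip messages without a parseable sender
        # Only pass headers that are present, so no empty fields are parsed.
        fields = [msg.get(h) for h in ("To", "Cc") if msg.get(h)]
        for _name, addr in getaddresses(fields):
            if addr and addr.lower() != sender.lower():
                edges[(sender.lower(), addr.lower())] += 1
    return edges


# Hypothetical header-only records
inbox = [
    {"From": "Alice <alice@example.org>", "To": "bob@example.org",
     "Cc": "Carol <carol@example.org>", "Date": "Mon, 1 Feb 2016 09:00:00 -0500"},
    {"From": "bob@example.org", "To": "Alice <alice@example.org>"},
    {"From": "alice@example.org", "To": "bob@example.org"},
]
for (src, dst), weight in header_network(inbox).most_common():
    print(src, "->", dst, weight)
```

Each edge's weight is the number of messages sent from one address to another. This sketch ignores the Date header; a tool like Immersion also uses timestamps, which would let the network be examined over time.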

Gephi is a commonly used network analysis tool that is much more flexible and powerful than the browser-based tools above, but it requires local installation.
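Gephi can import networks from spreadsheets, so one common route is to export an edge list as CSV. The sketch below assumes Gephi's spreadsheet import recognizes columns named Source, Target, Type and Weight, the edge-table convention commonly used with Gephi; check the import dialog of your Gephi version. The function name and edge data are hypothetical.

```python
import csv
import io


def edges_to_gephi_csv(edges):
    """Serialize {(source, target): weight} as an edge-table CSV.

    Assumption: the importer recognizes the column names Source, Target,
    Type and Weight (the edge-table convention commonly used with Gephi).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    # Sorted output makes the file stable from run to run.
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, "Directed", weight])
    return buffer.getvalue()


# Hypothetical weighted, directed edges
edges = {
    ("alice@example.org", "bob@example.org"): 2,
    ("bob@example.org", "alice@example.org"): 1,
}
csv_text = edges_to_gephi_csv(edges)
print(csv_text)
```

Saving `csv_text` to a file such as edges.csv (opened with `newline=""` so the csv line endings are preserved) should give a file that Gephi's spreadsheet import can load as an edges table.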

Spatial and Temporal Representation

Viewshare is a free tool provided by the Library of Congress that generates interactive maps and timelines, with facets, for digital collections. It requires a user account and data in columnar (spreadsheet) form that includes location and/or time-related fields. A few helpful tutorials are available.

  • Data: User supplied. The example below uses selected metadata from the Cornell Hip Hop Collection, including photographs by Joe Conzo of the early hip hop music scene and flyers from hip hop venues.
  • Analysis: The photos and flyers were cataloged in Shared Shelf with spatial and temporal information about their subjects, and the resulting metadata was downloaded in MS Excel format.
  • Visualization: The spreadsheet was uploaded into Viewshare with minimal setup. The map and timeline are linked to the uploaded secondary data.

Test visualization of digital collections from Cornell University Hip Hop Collection created with Viewshare
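The columnar form Viewshare expects can be illustrated with a short Python sketch that flattens catalog records into one row per item and keeps only items with a date or coordinates. The column names, the records and the filtering rule are hypothetical and chosen for illustration; they are not the Shared Shelf export format or a Viewshare requirement.

```python
import csv
import io

# Hypothetical column names for this sketch, not a Viewshare requirement.
FIELDS = ["id", "title", "date", "place", "latitude", "longitude"]


def to_columnar_csv(records):
    """Flatten records into one CSV row per item.

    Keeps only items with a date and/or both coordinates, since those are the
    fields a timeline or map could use. Returns (csv_text, rows_written).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for rec in records:
        has_time = bool(rec.get("date"))
        has_place = rec.get("latitude") is not None and rec.get("longitude") is not None
        if has_time or has_place:
            # Missing fields become empty cells.
            writer.writerow({f: rec.get(f, "") for f in FIELDS})
            kept += 1
    return buffer.getvalue(), kept


# Made-up sample records
records = [
    {"id": "item-1", "title": "Example photograph", "date": "1981",
     "place": "Example place", "latitude": 40.85, "longitude": -73.86},
    {"id": "item-2", "title": "Example flyer without date or place"},
    {"id": "item-3", "title": "Example flyer", "date": "1983-05"},
]
csv_text, kept = to_columnar_csv(records)
print(csv_text)
```

One row per item with separate date and coordinate columns is the kind of layout a spreadsheet export from a cataloging tool typically provides, and it is what lets a map and a timeline be driven from the same table.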

Image Analysis

Ukiyo-e.org is a database and image similarity analysis engine, created by John Resig to aid researchers in the study of Japanese woodblock prints.
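How Ukiyo-e.org matches prints is not described here. As a generic illustration of image similarity analysis, the sketch below uses average hashing, a simple technique in which each pixel of a downscaled grayscale image is marked as brighter or darker than the image's mean, and two images are compared by counting differing bits (the Hamming distance). It assumes the images have already been reduced to small grayscale grids by an image library; the 4x4 grids here are made up.

```python
def average_hash(pixels):
    """Return a bit string: "1" where a pixel is brighter than the image mean.

    `pixels` is a small grayscale grid (list of rows of 0-255 values), assumed
    to have already been downscaled, e.g. to 8x8, by an image library.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


original = [[10, 200, 30, 220], [15, 210, 25, 230], [12, 205, 35, 225], [11, 215, 28, 235]]
# A uniformly brightened copy, as a different scan of the same print might be.
brighter = [[min(255, p + 20) for p in row] for row in original]
# A grid with the light and dark columns swapped.
different = [[200, 10, 220, 30], [210, 15, 230, 25], [205, 12, 225, 35], [215, 11, 235, 28]]
print(hamming(average_hash(original), average_hash(brighter)))   # 0
print(hamming(average_hash(original), average_hash(different)))  # 16
```

A small distance suggests two images are likely copies of the same picture despite changes in brightness, while a large distance suggests different pictures. Production engines generally use more robust features than this.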