This page is a companion to a guest lecture on text analysis for ASRC 4513 - Science Fiction and the Value of Utopia/Dystopia (instructor: Ricardo Wilson), held 3/16/2017 in 701 Olin Library.
Preparation
- Please bring a laptop! Bring your own or feel free to check one out at the Olin circulation desk. No special software will be needed. All exercises will be done through a Web browser, without any special plugins.
- Read through the first three sections of the Text mining article in Wikipedia. This will give us a common orientation to text analytics.
- Read Ted Underwood's blog post "Seven ways humanists are using computers to understand text". Make note of items that interest you or that you want to learn more about.
- Optional - Bring samples of text to load into Voyant. Input format can be plain text, a PDF (with OCR), or an MS Word document.
Discussion from the reading (fairly open)
- What thoughts seemed most interesting?
- For what thoughts might you need clarification?
- What thoughts merit further exploration?
- In terms of the continua mentioned in the article, where do you see your interests as possible areas for exploration? Some areas of possible tension are noted below:
- automating familiar tasks vs. making new discoveries
- individual text vs. large corpora
- modeling to explain vs. modeling to predict
- supervised vs. unsupervised learning
Exercises
All exercises will be demonstrated; we will learn together through "making". Let's approach this informally.
Google nGrams
Google's nGram Viewer charts the frequency of a word or phrase by publication year. Many modifications can be made to refine the analysis, so please consider the links below as starting points. Syntax for refinement is found on the About page.
Building a progressive example (American English corpus, 1980-2008). This example assumes that when writing about a movement, authors also give examples of that movement. We can track how the frequency of specific examples changes over time.
- Afrofuturism
- Afrofuturism, Sun Ra - adding term
- Afrofuturism,(Sun Ra * .01) - scaling one term to align the curves; absolute frequencies can no longer be compared, but trends are easier to compare
- Afrofuturism,(Sun Ra * .01),Herman Blount - adding term
- Afrofuturism,(Sun Ra * .01),Herman Blount,Le Sony'r Ra - adding term
- Afrofuturism,((Sun Ra+Herman Blount+Le Sony'r Ra)*.01) - coalescing terms
- Afrofuturism,((Sun Ra+Herman Blount+Le Sony'r Ra)*.01),Parliament Funkadelic - adding term (note the expansion of years included)
- Your turn - what terms can you think of to explore?
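The arithmetic in the queries above (multiplying a term by .01, or adding several names together) can be sketched outside the viewer. The following is a minimal illustration, not Google's implementation: the yearly values are made-up placeholders rather than real nGram data, and the helper names scale and add are hypothetical.

```python
# Made-up yearly relative frequencies (placeholders, NOT real Google data).
years = [1990, 1995, 2000, 2005]
series = {
    "Afrofuturism": [0.0, 1.0e-9, 4.0e-9, 8.0e-9],
    "Sun Ra": [2.0e-7, 3.0e-7, 4.0e-7, 5.0e-7],
    "Herman Blount": [1.0e-9, 1.0e-9, 2.0e-9, 2.0e-9],
}


def scale(values, factor):
    """Multiply every yearly value by a constant, like (Sun Ra * .01)."""
    return [v * factor for v in values]


def add(*value_lists):
    """Sum several series year by year, like (Sun Ra+Herman Blount)."""
    return [sum(vals) for vals in zip(*value_lists)]


# Equivalent in spirit to the query ((Sun Ra+Herman Blount)*.01).
combined = scale(add(series["Sun Ra"], series["Herman Blount"]), 0.01)

for year, movement, names in zip(years, series["Afrofuturism"], combined):
    print(year, movement, names)
```

Scaling changes only the height of a curve, not its shape, which is why trends stay comparable after scaling while absolute frequencies do not.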
Questions for thought -
- What kind of thought does each stage elicit?
- Are there conclusions we can make at any stage?
- Are there provocative thoughts that we can further test by refining the model?
- What are the boundaries of feasible or logical hypotheses?
- Is this a tool for exploring a corpus, proving a theory, or both?
Voyant
Voyant is a low-barrier text analysis tool that delivers a rich, interactive interface and a variety of visualizations. (A guide is available with extensive documentation.) Input format can be plain text, a PDF (with OCR), an MS Word document, or a URL for HTML analysis. Please feel free to bring your own material for upload, understanding that any uploaded material will be subject to the Voyant privacy policy. Sample texts and URLs for analysis are listed below for experimentation, in case you run low on ideas.
Sample texts for uploading from Project Gutenberg (choose the Plain Text UTF-8 version for download)
The Souls of Black Folk by W. E. B. Du Bois - http://www.gutenberg.org/ebooks/408 (also uploaded into Voyant)
The Quest of the Silver Fleece: A Novel by W. E. B. Du Bois - http://www.gutenberg.org/ebooks/15265
The House Behind the Cedars by Charles W. Chesnutt - http://www.gutenberg.org/ebooks/472
- Your turn - upload and explore texts of your choice.
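For a sense of what a word-frequency view rests on, here is a rough sketch of counting terms in plain text. It is not Voyant's code, and Voyant does considerably more; the tiny stopword list and the function name top_terms are assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it"}


def top_terms(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text, ignoring case and stopwords."""
    # Lowercase the text, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


# A short sample sentence standing in for a full downloaded text.
sample = "The problem of the twentieth century is the problem of the color line."
print(top_terms(sample, n=3))
```

To try this on one of the Project Gutenberg downloads, read the saved Plain Text UTF-8 file into a string (for example with open(path, encoding="utf-8").read(), where path is wherever you saved the file) and pass it to top_terms.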
Questions for thought -
- What types of tools or visualizations in this toolset do you find most helpful in providing insight into a text?
- What types of applicability does this toolset have?
- Are there provocative thoughts that we can further test by adjusting the analysis settings?
- Are there aspects of this toolset that seem misleading? Can these be mitigated/corrected?
Immersion
Questions for thought -
- Are there relationships depicted in this visualization that confirm your intuition?
- Are there relationships that seem surprising?
- Are there aspects of the tool and its visualization that seem misleading?
Wrap up
- Returning to the blog post, have new questions come up for you?
- Has anything changed with regard to your interests or the areas you might want to explore?
Other resources
- Duke University's LibGuide "Introduction to Text Analysis" - http://guides.library.duke.edu/c.php?g=289707&p=1930854
- Text Analysis Portal for Research (TAPoR) - http://tapor.ca/home