...

  • a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
    • "research" - algorithmic analysis, computational analysis
    • "cyberinfrastructure" - Data to Insight Center at Indiana University, supercomputers, data warehouse, Solr indices
    • "large amounts" - "at scale", the bigger the better.
    • "cutting edge" - experimental by nature; although steps are being taken to move into production, things can break, and some things are unfinished or in development
  • intended to serve and build community for scholars interested in text analysis; join the user group mailing list (send an email to htrc-usergroup-l-subscribe@list.indiana.edu)

What specific services does the HTRC offer scholars?

...

  • allows researchers to create a set of texts to analyze algorithmically; see Portal & Workset Builder Tutorial for v3.0, "Create Workset" for a tutorial
  • Linked from the portal, but also at https://sharc.hathitrust.org/blacklight
  • it really helps to keep this open in a second window while operating the portal in the first
  • worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)

...

Bookworm

Extracted Features datasets

...

  • Many algorithms are provided; others can be added at scholars' request as development time permits
  • Workshop dedicated to these alone (ask and I can give you a tour)
  • Handout available

Data Capsule

  • allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see Portal & Workset Builder Tutorial for v3.0, "Use the HTRC Data Capsule" for details
  • not yet tied to worksets
  • currently restricted to "open-open" (non-restricted) corpus

Bookworm

  • same functionality as Google nGrams
  • base data is currently "open-open" data (liberated from Google stipulations); working on legal aspects required for base data to shift to entire HT corpus, regardless of viewability.
  • plans and allocated grant to develop tie-in to worksets
  • See wiki for tutorial
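
For readers unfamiliar with n-gram viewers, the kind of measure such a tool plots can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `yearly_relative_frequency`, the toy corpus, and the whitespace tokenization are all hypothetical, and none of it reflects how Bookworm or the HTRC is actually implemented.

```python
from collections import Counter


def yearly_relative_frequency(corpus, term):
    """Return {year: share of tokens equal to `term`} for a toy corpus.

    `corpus` is an iterable of (year, text) pairs. Tokenization is a
    naive lowercase whitespace split, chosen only for illustration.
    """
    totals = Counter()  # total tokens seen per year
    hits = Counter()    # occurrences of the term per year
    needle = term.lower()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(needle)
    # Skip years with no tokens to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical three-"volume" corpus, invented for this example.
toy_corpus = [
    (1850, "the whale and the sea"),
    (1850, "the ship"),
    (1900, "a whale a whale"),
]
freqs = yearly_relative_frequency(toy_corpus, "whale")
for year, share in freqs.items():
    print(year, round(share, 3))
```

Plotting these per-year shares as a line gives the familiar n-gram chart; normalizing by the total token count for each year keeps years with more digitized text from dominating the curve.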

Extracted Features datasets