
...

a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
- "research" - algorithmic analysis, computational analysis
- "cyberinfrastructure" - Data to Insight Center at Indiana University, supercomputers, data warehouse, Solr indices
- "large amounts" - "at scale", the bigger the better.
- "cutting edge" - experimental by nature; although steps are being taken to move into production, things can break and may be unfinished or in development
intended to serve and build community for scholars interested in text analysis; join the usergroup mailing list (send an email to htrc-usergroup-l-subscribe@list.indiana.edu)

What specific services does the HTRC offer scholars?

...

allows researchers to create a set of texts to analyze algorithmically; see Portal & Workset Builder Tutorial for v3.0, "Create Workset", for a tutorial
Linked from the portal, but also at https://sharc.hathitrust.org/blacklight
it really helps to open this in a second window while operating the portal in the first
worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
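
The private/public rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not HTRC's actual implementation; the names `Workset`, `can_view`, and `can_manage` are hypothetical, and a user who is not logged in is represented as `None`.

```python
from dataclasses import dataclass, field


@dataclass
class Workset:
    """Hypothetical model of a workset: an owner, a set of volumes, a visibility flag."""
    owner: str
    volume_ids: list = field(default_factory=list)
    public: bool = False  # private unless the owner makes it public


def can_view(workset, user):
    """Private: only the owner. Public: any logged-in user (user is None when logged out)."""
    if user is None:
        return False
    return workset.public or user == workset.owner


def can_manage(workset, user):
    """Management is restricted to the owner whether the workset is public or private."""
    return user is not None and user == workset.owner


private_ws = Workset(owner="alice", volume_ids=["vol-1", "vol-2"])
public_ws = Workset(owner="alice", volume_ids=["vol-3"], public=True)

print(can_view(private_ws, "bob"))   # bob cannot see alice's private workset
print(can_view(public_ws, "bob"))    # bob can see the public one
print(can_manage(public_ws, "bob"))  # but cannot manage it
```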

...

allows researchers to run ready-to-use algorithms against specific collections; see Portal & Workset Builder Tutorial for v3.0, "Create Workset", for a tutorial

...

Many algorithms are provided; others can be added at scholars' request as development time permits
Workshop dedicated to these alone (ask and I can give you a tour)
Handout available

allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see Portal & Workset Builder Tutorial for v3.0, "Use the HTRC Data Capsule", for details
not yet tied to worksets
currently restricted to "open-open" (non-restricted) corpus

same functionality as Google nGrams
base data is currently "open-open" data (liberated from Google stipulations); working on legal aspects required for base data to shift to entire HT corpus, regardless of viewability.
plans, and an allocated grant, to develop a tie-in to worksets
See wiki for tutorial
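
For anyone unfamiliar with what an n-gram viewer computes, here is a minimal Python sketch of the general idea: count every n-word phrase per publication year, then report how often a chosen phrase occurs relative to all phrases in that year. It is not the HTRC or Google implementation; the function names, the whitespace tokenization, and the tiny sample corpus are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts_by_year(texts_by_year, n=2):
    """Count n-grams per year. texts_by_year maps a year to a list of text strings."""
    counts = {}
    for year, texts in texts_by_year.items():
        counter = Counter()
        for text in texts:
            # Naive tokenization (lowercase, split on whitespace); an assumption for this sketch.
            tokens = text.lower().split()
            counter.update(
                " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
            )
        counts[year] = counter
    return counts


def relative_frequency(counts, phrase):
    """Share of each year's n-grams that match the phrase (0.0 for a year with no n-grams)."""
    result = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        result[year] = counter[phrase] / total if total else 0.0
    return result


# Made-up sample corpus, purely for demonstration.
sample = {
    1850: ["the whale the whale swims"],
    1900: ["the ship sails"],
}
bigrams = ngram_counts_by_year(sample, n=2)
print(relative_frequency(bigrams, "the whale"))
```

Plotting the returned per-year values against the year gives the familiar trend line of an n-gram viewer.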