...
What is HathiTrust (HT)?
- a consortium: an international partnership of over 100 institutions
- a digital library containing about 13.5 million books, ~5 million (38%) of which are viewable in full online. All items are fully indexed, allowing for full-text search within all volumes. You can log in with your Cornell NetID to:
- create Collections (public or private)
- download PDFs of any item available in full text
- a trustworthy preservation repository providing long-term stewardship, redundant and robust backup, continuous monitoring, and persistent identifiers for all content.
...
- a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
- "computational access" - computational analysis, algorithmic analysis, distant reading, text-mining
- "cyberinfrastructure" - Data to Insight Center at Indiana University: supercomputers, data warehouse, Solr
- "large amounts" - "at scale", the bigger the better (more signal, less noise)
- "cutting-edge" - experimental by nature: things can break, and things are unfinished or in development; see the DSPS blog post on HTRC UnCamp 2015 for the most recent developments
- intended to serve and build a community of scholars interested in text analysis; join the user group mailing list (send an email to htrc-usergroup-l-subscribe@list.indiana.edu)
...
- allows researchers to create a set of texts to analyze algorithmically; see Portal & Workset Builder Tutorial for v3.0, "C " for tutorial
- linked from the portal, but also available at https://sharc.hathitrust.org/blacklight
- it really helps to open the Workset Builder in a second window and operate the portal in the first
- worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, with management restricted to the owner)
...
- allows researchers to run ready-to-use algorithms against specific collections; see Portal & Workset Builder Tutorial for v3.0, " " for tutorial
- many algorithms are provided; others can be added at scholars' request as time permits development
- a workshop is dedicated to these alone (ask and I can give you a tour)
- handout available
Data Capsule
- allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see Portal & Workset Builder Tutorial for v3.0, "U " for details
- requires a VNC application for your browser, such as VNC Viewer for Google Chrome
- designed to be a secure analytical environment that respects access restrictions on texts while allowing computational analysis; capsules alternate between a maintenance mode (for configuring tools) and a secure mode (for analysis)
- not yet tied to worksets
- currently restricted to the "open-open" (non-restricted) corpus; the eventual objective is to allow access to the full HT corpus
...
- an open-source project with the same basic functionality as the Google Ngram Viewer, although graphically faceted
- the base data is currently the "open-open" data; work is under way on the legal aspects required to shift the base data to the entire HT corpus, regardless of viewability
- plans and an allocated grant exist to develop a tie-in to worksets
- see the wiki for a tutorial