Note that hands-on use of the HTRC portal and its tools requires a login. Please see the information linked from the section titled "The portal", below.
What is HathiTrust (HT)?
- A consortium: an international partnership of over 100 institutions.
- A digital library containing about 13.5 million books, ~5 million (38%) of which are in the public domain. All items are fully indexed, allowing full-text search within all volumes. You can log in with your Cornell NetID to:
  - create Collections (public or private)
  - download PDFs of any item available in full text
- A trustworthy preservation repository providing long-term stewardship, robust redundant backup, continuous monitoring, and persistent identifiers for all content.
...
The "portal"
- "SHARC" branding appears in the URL; eventually, all services will be accessed through the portal.
- available functionality depends on login; see the Portal & Workset Builder Tutorial for v3.0 for details.
Workset builder
- allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial.
- worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
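The private/public rules for worksets amount to a small access-control check: viewing and managing are separate permissions. The sketch below is a hypothetical Python model of those rules only, not HTRC's API or implementation; the `Workset` class, the `can_view`/`can_manage` functions, and the example names are illustrative, and "logged in" is represented simply as a non-`None` user name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workset:
    # Hypothetical model of a workset: a named set of texts with an owner.
    name: str
    owner: str
    public: bool = False  # private by default


def can_view(ws: Workset, user: Optional[str]) -> bool:
    """Public worksets are viewable by any logged-in user; private ones only by the owner."""
    if user is None:  # not logged in: no access at all
        return False
    return ws.public or user == ws.owner


def can_manage(ws: Workset, user: Optional[str]) -> bool:
    """Management is restricted to the owner, whether the workset is public or private."""
    return user is not None and user == ws.owner


if __name__ == "__main__":
    shared = Workset("poetry-1850s", owner="alice", public=True)
    print(can_view(shared, "bob"), can_manage(shared, "bob"))
```

Keeping "view" and "manage" as separate checks mirrors the distinction above: making a workset public widens who can see it, but never who can change it.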
Algorithms
- allows researchers to run ready-to-use algorithms against specific collections; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial.
Bookworm
Extracted Features datasets
Data Capsule
- allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0 for details.
- not yet tied to worksets
- currently restricted to "open-open" (non-restricted) corpus