Note that hands-on use of the HTRC portal and its tools requires a login. Please see the information linked from the section titled "The portal", below.

What is HathiTrust (HT)?

  • a consortium: an international partnership of over 100 institutions.
  • a digital library containing about 13.5 million books, ~5 million (38%) of which are in the public domain. All items are fully indexed, allowing full-text search within all volumes. You can log in with your Cornell NetID to:

    • create collections (public or private)
    • download PDFs of any item available in full text
  • a trustworthy preservation repository providing long-term stewardship, redundant robust backup, continuous monitoring, and persistent identifiers for all content.

...

The "portal"

  • "SHARC" branding in URL - eventually all services will be accessed through the portal.
  • functionality depends on login; see the Portal & Workset Builder Tutorial for v3.0, "Sign up for an account, and sign in" for details

Workset builder

  • allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0, "Create Workset", for a tutorial
  • worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)

Algorithms

Bookworm

 

Extracted Features datasets

 

Data Capsule

  • allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0, "Use the HTRC Data Capsule", for details
  • not yet tied to worksets
  • currently restricted to "open-open" (non-restricted) corpus