HathiTrust Research Center - Introduction

What is HathiTrust (HT)?

A consortium - an international partnership of more than 100 institutions.
A digital library containing about 13.5 million books, roughly 5 million (38%) of which are in the public domain. All items are fully indexed, so full-text search works across every volume. You can log in with your Cornell NetID to:
- Create Collections (public or private)
- Download PDFs of any item available in full text
A trustworthy preservation repository providing long-term stewardship, redundant and robust backup, continuous monitoring, and persistent identifiers for all content.

What is the HathiTrust Research Center (HTRC)?

A collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
- "research" - algorithmic analysis, computational analysis
- "cyberinfrastructure" - the Data to Insight Center at Indiana University, supercomputers, data warehouse - Solr indices
- "large amounts" - "at scale", the bigger the better.
- "cutting-edge" - experimental by nature, although steps are being taken to move into production
Intended to serve, and build a community for, scholars interested in text analysis.

What specific services does the HTRC offer scholars?

The HTRC User Community Wiki documents these offerings. It provides links to services, user support documentation, meeting notes, mailing list addresses and sign-up information, and FAQs.

The "portal"

Workset builder

Algorithms

Bookworm

Extracted Features datasets

Data Capsule