What is HathiTrust (HT)?

  • a consortium: an international partnership of over 100 institutions.
  • a digital library containing about 13.5 million books, ~5 million (38%) of which are in the public domain. All items are fully indexed, allowing full-text search within every volume. You can log in with your Cornell NetID to:

    • Create Collections (public or private)
    • Download PDFs of any item available in full text
  • a trustworthy preservation repository providing long-term stewardship, robust and redundant backup, continuous monitoring, and persistent identifiers for all content.

What is the HathiTrust Research Center (HTRC)?

  • a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
    • "research" - algorithmic analysis, computational analysis
    • "cyberinfrastructure" - the Data to Insight Center at Indiana University, supercomputers, and a data warehouse with Solr indices
    • "large amounts" - "at scale"; the bigger, the better
    • "cutting edge" - experimental by nature, although steps are being taken to move into production
  • intended to serve and build a community of scholars interested in text analysis

What specific services does the HTRC offer scholars?

HTRC offerings are documented on the HTRC User Community Wiki, which includes links to services, user support documentation, meeting notes, email list addresses and sign-up information, and FAQs.

The "portal"

  • "SHARC" branding in URL

Workset builder

 

Algorithms

 

Bookworm

 

Extracted Features datasets

 

Data Capsule

 
