...

  • a consortium - international partnership of over 130 institutions.
  • a digital library containing about 17 million books, ~6 million (37%) of which are viewable in full online. All items are fully indexed, allowing for full-text search within all volumes. You can log in with your Cornell NetID to:

    • create collections (public or private)
    • download PDFs of any item available in full text
  • a trustworthy preservation repository providing long-term stewardship, robust backup, continuous monitoring, and persistent identifiers for all content.

...

Computational analysis must address the very real challenges of what can and cannot be legally shared digitally, so it helps to understand the realities that affect full-text viewability. Not all books in HathiTrust are viewable in full, although all are indexed in full. Viewability is determined by many factors, including copyright law (both US and international) and the stipulations of rights-holders (authors and/or publishers) and digitizing agents (like Google). Two attributes assigned to each volume affect viewability. The first describes a complex set of factors relating to copyright, digitizing agents and rights-holders, and is referred to as "rights" metadata. The second is a binary value ("allow/deny"), often referred to as "access" metadata. When a volume has no factors attached to it that would limit sharing, both attributes express this. Colloquially, the set of these volumes is referred to as the "open-open" set. What a researcher can do with text is governed by these factors, and the least restricted uses are possible with the open-open set.
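As a rough illustration of how the two attributes combine, the sketch below filters a small list of volumes down to the open-open set. It is only a sketch under stated assumptions: the `Volume` record, its field names, the volume identifiers, and the rights values `"pd"` and `"ic"` are hypothetical placeholders chosen for the example, not HathiTrust's actual metadata schema. Only the "allow/deny" access values come from the description above.

```python
from dataclasses import dataclass

# Assumption: rights values that place no limits on sharing.
# "pd" is an illustrative placeholder, not an authoritative list.
UNRESTRICTED_RIGHTS = {"pd"}


@dataclass
class Volume:
    """Hypothetical record holding the two viewability attributes."""
    volume_id: str
    rights: str   # "rights" metadata (copyright, digitizer, rights-holder factors)
    access: str   # "access" metadata: "allow" or "deny"


def is_open_open(volume):
    """True only when BOTH attributes indicate no limits on sharing."""
    return volume.rights in UNRESTRICTED_RIGHTS and volume.access == "allow"


def open_open_set(volumes):
    """Return the volumes that belong to the open-open set."""
    return [v for v in volumes if is_open_open(v)]


# Made-up sample data for the example.
volumes = [
    Volume("vol-1", "pd", "allow"),
    Volume("vol-2", "ic", "deny"),
    Volume("vol-3", "pd", "deny"),
]

print([v.volume_id for v in open_open_set(volumes)])
```

The point of the sketch is that a volume must pass both checks: in the sample data, `vol-3` has an unrestricted rights value but is still excluded because its access value is "deny".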

What is the HathiTrust Research Center (HTRC)?

  • a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text. Let's unpack this:
    • "computational access" - computational analysis, algorithmic analysis, distant reading, text-mining
    • "cyberinfrastructure" - for the most part, the Data to Insight Center at Indiana University: supercomputers, data warehouse, Solr indexing
    • "large amounts" - "at scale", the bigger the better (better signal, less noise)
    • "cutting edge" - experimental by nature, things can break, things are unfinished/in-development; see the DSPS blog post on HTRC Uncamp 2015 for the most recent developments
  • intended to serve and build community for scholars interested in text analysis; join the user group mailing list (send an email to htrc-usergroup-l-subscribe@list.indiana.edu)

...