This page is a companion to the workshop on the HathiTrust Research Center (HTRC) Portal for select graduate students (4/8/2014).
Please bring a laptop! Bring your own if you have one; if you do not, feel free to check one out at the Olin circulation desk. Having a laptop will allow you to participate in the exercises and get the most out of class.
You will need to obtain credentials to log in. Please follow the steps on the HTRC wiki (see "How do I obtain an account?") for getting started with the production portal.
The class is a guided exploration of the HTRC portal. We will run the algorithms to discover their capabilities and limitations, and discuss strategies for addressing the challenges they raise. This lets us explore the HTRC portal specifically and grapple firsthand with basic issues encountered in computational analysis of text.
Log on to the HTRC Production Portal with your personal credentials.
Worksets can be managed through the Production Portal's Blacklight instance, which you can also reach by using the "Create Workset" link. Whichever route you take, you will have to log in a second time with the same credentials you use in the Production Portal.
Worksets can also be created through "Uploading Worksets": upload a CSV file containing the volume IDs.
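If your volume IDs are sitting in a list or spreadsheet, a short script can write the CSV for you. The following is a minimal sketch in Python, not the portal's documented format: the header name `volume_id` and the sample IDs are hypothetical placeholders, so check the HTRC wiki for the column layout the upload actually expects.

```python
import csv
import io


def build_workset_csv(volume_ids, header="volume_id"):
    """Return CSV text with one volume ID per row under a single header.

    The header name is an assumption made for illustration only.
    Blank entries and repeated IDs are skipped.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for vid in volume_ids:
        vid = vid.strip()
        if not vid or vid in seen:
            continue
        seen.add(vid)
        writer.writerow([vid])
    return buffer.getvalue()


# Hypothetical placeholder IDs, not real volumes.
example_ids = [
    "mdp.39015000000001",
    "mdp.39015000000002",
    "mdp.39015000000001",  # duplicate, dropped
    "  ",                  # blank, dropped
]
csv_text = build_workset_csv(example_ids)
print(csv_text)
```

Save the resulting text to a `.csv` file before uploading it.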
Use this page when referencing custom stop word lists. I can post lists relevant to your collections if you have them.
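To see what a custom stop word list does before running an algorithm with one, here is a small Python sketch. It assumes a plain-text list with one word per line, which is an assumption for illustration; the format the portal algorithms expect is not stated on this page. The function names and sample words are hypothetical.

```python
def load_stop_words(text):
    """Parse a stop word list with one word per line (assumed format).

    Blank lines are ignored and words are lowercased.
    """
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]


# A tiny custom list: common English words plus a collection-specific term.
custom_list = "the\nof\nand\nchapter\n"
stop_words = load_stop_words(custom_list)

tokens = "The history of the chapter and verse".split()
filtered = remove_stop_words(tokens, stop_words)
print(filtered)  # ['history', 'verse']
```

Adding collection-specific words such as "chapter" is the usual reason to maintain a custom list rather than rely on a generic one.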
Open with a text editor and change this line:
<script src="https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js" type="text/javascript"></script>
to this line:
<script src="http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js" type="text/javascript"></script>
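If you have several files to fix, the same change can be scripted. This is a minimal Python sketch that swaps only the script URL and leaves the rest of the file untouched. The function names are hypothetical, and the path you pass to `patch_file` is whatever file you would otherwise open in the text editor.

```python
from pathlib import Path

OLD_SRC = "https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/timeline-api.js"
NEW_SRC = "http://api.simile-widgets.org/timeline/2.3.1/timeline-api.js"


def swap_timeline_script(html_text):
    """Return (updated_text, count) after replacing the old script URL."""
    count = html_text.count(OLD_SRC)
    return html_text.replace(OLD_SRC, NEW_SRC), count


def patch_file(path):
    """Rewrite the file at `path` in place; return the number of replacements."""
    file_path = Path(path)
    updated, count = swap_timeline_script(file_path.read_text(encoding="utf-8"))
    if count:
        file_path.write_text(updated, encoding="utf-8")
    return count


# Demonstration on the line quoted above.
sample = (
    '<script src="https://htrc2.pti.indiana.edu/HTRC-UI-Portal2/js/'
    'timeline-api.js" type="text/javascript"></script>'
)
updated, n = swap_timeline_script(sample)
print(n)
print(updated)
```

Keep a copy of the original file before patching, in case you need to compare the two.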
HathiTrust is:
an international partnership of over 100 institutions.
a digital library containing over 13 million books, 38% of which are in the public domain. All items are fully indexed, allowing full-text search within all volumes. You can log in with your Cornell NetID to
a trustworthy preservation repository providing long-term stewardship, redundant robust backup, continuous monitoring, and persistent identifiers for all content.
HathiTrust Research Center (HTRC) - a collaborative research center (jointly managed by Indiana University and the University of Illinois) dedicated to developing cutting-edge software tools and cyberinfrastructure that enable advanced computational access to large amounts of digital text.
HTRC Production Portal - a web-based user experience of the HTRC. The production portal makes available all the full-text indexes of the Google-digitized deposits to HathiTrust that are in the public domain.
HTRC User Community Wiki - home of the user support documentation, meeting notes, elist addresses and sign-up information, and FAQs.