...
Documentation of offerings on the HTRC User Community Wiki - links to services, user support documentation, meeting notes, elist addresses and sign-up information, and FAQs.
The "portal"
- "SHARC" branding in the URL (sometimes expanded as Secure HathiTrust Analytical Research Commons)
- access to tools depends on login; see the HTRC Analytics step-by-step tutorial (the Portal & Workset Builder Tutorial for v3.0) for details
Workset
...
management
- allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0, "C", for a tutorial
- linked from the portal, but also at https://sharc.hathitrust.org/blacklight
- you can create a workset from a file; specifications for the file are given
- it is a good idea to validate your workset before loading; the validator will let you know if there are issues with your file. It really helps to open the validator in a second window and operate the portal in the first
- worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
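The create-from-file and validate steps above can be sketched in a few lines. The real file specification and validator are in the portal documentation; the format assumed here (one HathiTrust-style volume ID per line, e.g. mdp.39015012345678) and the ID pattern are illustrative assumptions only.

```python
import re

# Hypothetical check: one HathiTrust-style volume ID per line, in the
# form "<namespace>.<id>" (e.g. "mdp.39015012345678"). The real workset
# file spec and validator live in the portal documentation.
VOL_ID = re.compile(r"^[a-z0-9]{2,10}\.[\w:/$.-]+$")

def validate_workset(lines):
    """Return (line_number, text) entries that fail the ID check."""
    problems = []
    for n, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or not VOL_ID.match(text):
            problems.append((n, text))
    return problems

sample = ["mdp.39015012345678", "uc1.b1234567", "not a volume id"]
print(validate_workset(sample))  # only the third line fails
```

Running a check like this before uploading plays the same role as using the validator in a second browser window.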
Algorithms
- allows researchers to run ready-to-use algorithms against specific collections; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial
- many algorithms provided (see the full list and descriptions of each); others can be added at scholars' request as time permits development
- workshop dedicated to these alone (ask and I can give you a tour)
- handout available
...
- allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0, "U", for details. Documentation available.
- requires a VNC application for your browser, like VNC Viewer for Google Chrome
- designed as a secure analytical environment that respects access restrictions on text while allowing computational analysis (maintenance mode / secure mode)
- not yet tied to worksets, but there is a workaround
- currently restricted to "open-open" (non-restricted) corpus; eventual objective is to allow for access to full HT corpus
Datasets
- all data sets, including back versions, are non-beta offerings
- Extracted Features
...
- Data Set
- a brief single-page attachment describing the motivations and potential of the data set
- page-level attributes (volume-level and page-level data) for all books in HT; rationale and features explained
- can download full dataset via rsync (Watch out! BIG! 4TB!)
- details on leveraging the dataset to select data using a workset and the EF_Rsync_Script_Generator algorithm to download data for just that set.
- David Mimno's "word similarity tool" is built from the full Extracted Features data set
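The EF_Rsync_Script_Generator bullet above boils down to: take the volume IDs in a workset and emit one rsync command per volume, so you fetch only those files instead of the full 4 TB. A minimal sketch, assuming a placeholder rsync host and a flat file layout (the real dataset uses a pairtree directory structure, and the real filename escaping rules are in the HTRC documentation):

```python
# Illustrative sketch of what the EF_Rsync_Script_Generator does.
# HOST and the path layout are placeholders, not the real endpoint.
HOST = "data.example.org::features"  # hypothetical rsync module

def clean_id(vol_id):
    # Escape characters that cannot appear in filenames; the "+" and
    # "=" substitutions here are an assumption, not the official rule.
    return vol_id.replace(":", "+").replace("/", "=")

def rsync_script(vol_ids, dest="ef-data"):
    """Build a shell script that fetches one EF file per volume ID."""
    lines = ["#!/bin/bash", f"mkdir -p {dest}"]
    for vid in vol_ids:
        lines.append(f"rsync -av {HOST}/{clean_id(vid)}.json.bz2 {dest}/")
    return "\n".join(lines)

print(rsync_script(["mdp.39015012345678", "uc1.b1234567"]))
```

Saving the output and running it downloads just the workset's slice of the dataset.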
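To make the "page-level attributes" bullet concrete: each EF file pairs volume-level metadata with per-page feature counts, something like the toy record below. The field names are a loose approximation for illustration, not the official schema.

```python
import json

# A toy record shaped loosely like an Extracted Features file:
# volume-level metadata plus per-page token counts. Field names are
# illustrative only; see the official EF documentation for the schema.
record = json.loads("""
{
  "metadata": {"title": "Example Volume", "pubDate": "1899"},
  "pages": [
    {"seq": 1, "tokenCount": 120, "tokens": {"whale": 3, "sea": 5}},
    {"seq": 2, "tokenCount": 98,  "tokens": {"ship": 2, "sea": 1}}
  ]
}
""")

# Aggregate page-level counts up to the volume level.
totals = {}
for page in record["pages"]:
    for tok, n in page["tokens"].items():
        totals[tok] = totals.get(tok, 0) + n

print(totals)  # {'whale': 3, 'sea': 6, 'ship': 2}
```

This page-to-volume aggregation is the kind of analysis the data set enables without access to the page images themselves.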
Bookworm
- open source project, same basic functionality as Google nGram Viewer, although graphically faceted
- base data is currently "open-open" data; working on the legal aspects required for the base data to shift to the entire HT corpus, regardless of viewability
- linked to a back version of the EF data set that includes 13.5 M volumes, both full view and in copyright
- plans and allocated grant to develop tie-in to worksets
- see wiki for tutorial
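The nGram-style functionality above reduces to counting a word's occurrences per publication year and normalizing by the total tokens published that year. A toy sketch of that computation (invented data, not the Bookworm API):

```python
from collections import defaultdict

# Toy per-volume data: (publication year, token counts). A Bookworm-
# style viewer plots a word's frequency per year, normalized by the
# total number of tokens published that year.
volumes = [
    (1900, {"whale": 4, "sea": 10, "the": 50}),
    (1900, {"whale": 1, "ship": 3, "the": 40}),
    (1901, {"sea": 2, "the": 30}),
]

def frequency_by_year(volumes, word):
    """Relative frequency of `word` per year across all volumes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, counts in volumes:
        hits[year] += counts.get(word, 0)
        totals[year] += sum(counts.values())
    return {year: hits[year] / totals[year] for year in totals}

print(frequency_by_year(volumes, "whale"))
```

Faceting, as in the graphical interface, just means grouping by an extra metadata field before normalizing.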
...