...
- "SHARC" branding appears in the URL; eventually all services will be accessed through the portal.
- functionality depends on login; see the Portal & Workset Builder Tutorial for v3.0 for details
Workset builder
- allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial
- Linked from the portal, but also at https://sharc.hathitrust.org/blacklight
- it really helps to keep the workset builder open in a second window and operate the portal in the first
- worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
Algorithms
- allows researchers to run ready-to-use algorithms against specific collections; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial
- many algorithms provided; others can be added at scholars' request as development time permits
- Workshop dedicated to these alone (ask and I can give you a tour)
- Handout available
Data Capsule
- allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0 for details
- designed to be a secure analytical environment that respects access restrictions to text while allowing for computational analysis
- not yet tied to worksets
- currently restricted to the "open-open" (non-restricted) corpus; eventual objective is to allow access to the full HT corpus
Extracted Features datasets
- page-level attributes (volume-level and page-level data) for 4M+ open-open volumes; rationale and features explained
- can download full datasets via rsync (Watch out! BIG!)
- details available on using a workset and the EF_Rsync_Script_Generator algorithm to download data for just that set
- David Mimno's "word similarity tool" is built from the full Extracted Features dataset
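
To make the "page-level data" idea above a bit more concrete, here is a minimal Python sketch of how per-page token counts might be rolled up into volume-level counts once a file has been downloaded. The sample record is hand-made, and the field names (features, pages, body, tokenPosCount) are assumptions about the file layout, not a statement of the official schema; check the Extracted Features documentation before relying on them.

```python
import json
from collections import Counter

# Hypothetical, hand-made sample imitating one Extracted Features volume file.
# The field names used here are assumptions for illustration only.
SAMPLE_VOLUME = json.loads("""
{
  "id": "example.0001",
  "features": {
    "pages": [
      {"seq": "00000001",
       "body": {"tokenPosCount": {"whale": {"NN": 2}, "the": {"DT": 5}}}},
      {"seq": "00000002",
       "body": {"tokenPosCount": {"whale": {"NN": 1}, "sea": {"NN": 3}}}}
    ]
  }
}
""")


def page_token_counts(page):
    """Return token counts for one page, summed over part-of-speech tags."""
    counts = Counter()
    body = page.get("body") or {}
    for token, pos_counts in (body.get("tokenPosCount") or {}).items():
        counts[token.lower()] += sum(pos_counts.values())
    return counts


def volume_token_counts(volume):
    """Aggregate page-level counts into a single volume-level Counter."""
    total = Counter()
    for page in volume.get("features", {}).get("pages", []):
        total.update(page_token_counts(page))
    return total


if __name__ == "__main__":
    totals = volume_token_counts(SAMPLE_VOLUME)
    for token, n in totals.most_common(3):
        print(token, n)
```

With real data, the sample record would be replaced by a file fetched via rsync and loaded with json (after decompression, if the files are compressed); the aggregation step would stay the same under the layout assumed here.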
...