...
Documentation of offerings on the HTRC User Community Wiki - links to services, user support documentation, meeting notes, elist addresses and sign-up information, and FAQs.
The "portal"
- "SHARC" branding in the URL (sometimes expanded as Secure HathiTrust Analytical Research Commons)
- access to tools depends on login; see the HTRC Analytics step-by-step tutorial (the Portal & Workset Builder Tutorial for v3.0) for details
Workset
...
management
- allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0, "C", for a tutorial
- linked from the portal, but also at https://sharc.hathitrust.org/blacklight
- you can create a workset from a file; specifications for the file are given
- it is a good idea to validate your workset before loading; the validator will let you know if there are issues with your file. It really helps to open the validator in a second window and operate the portal in the first
- worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
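The create-from-file and validate steps above can be sketched in a few lines. The real file specification and validator are in the portal documentation; the format assumed here (one HathiTrust-style volume ID per line, e.g. mdp.39015012345678) and the ID pattern are illustrative assumptions only.

```python
import re

# Hypothetical check: one HathiTrust-style volume ID per line, in the
# form "<namespace>.<id>" (e.g. "mdp.39015012345678"). The real workset
# file spec and validator live in the portal documentation.
VOL_ID = re.compile(r"^[a-z0-9]{2,10}\.[\w:/$.-]+$")

def validate_workset(lines):
    """Return (line_number, text) entries that fail the ID check."""
    problems = []
    for n, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or not VOL_ID.match(text):
            problems.append((n, text))
    return problems

sample = ["mdp.39015012345678", "uc1.b1234567", "not a volume id"]
print(validate_workset(sample))  # only the third line fails
```

Running a check like this before uploading plays the same role as using the validator in a second browser window.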
Algorithms
- allows researchers to run ready-to-use algorithms against specific collections; see the Portal & Workset Builder Tutorial for v3.0 for a tutorial
- many algorithms provided (see the full list and descriptions of each); others can be added at scholars' request as time permits development
- workshop dedicated to these alone (ask and I can give you a tour)
- handout available
...
- allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0, "U", for details. Documentation available.
- requires a VNC application for your browser, like VNC Viewer for Google Chrome
- designed as a secure analytical environment that respects access restrictions on text while allowing computational analysis (maintenance mode / secure mode)
- not yet tied to worksets, but there is a workaround
- currently restricted to "open-open" (non-restricted) corpus; eventual objective is to allow for access to full HT corpus
Datasets
- all data sets, including back versions, are non-beta offerings
- Extracted Features
...
- Data Set
- a brief single-page attachment describing the motivations and potential of the data set
- page-level attributes (volume-level and page-level data) for all books in HT; rationale and features explained
- can download full dataset via rsync (Watch out! BIG! 4TB!)
- details on leveraging the dataset to select data using a workset and the EF_Rsync_Script_Generator algorithm to download data for just that set.
- David Mimno's "word similarity tool" is built from the full Extracted Features data set
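The EF_Rsync_Script_Generator bullet above boils down to: take the volume IDs in a workset and emit one rsync command per volume, so you fetch only those files instead of the full 4 TB. A minimal sketch, assuming a placeholder rsync host and a flat file layout (the real dataset uses a pairtree directory structure, and the real filename escaping rules are in the HTRC documentation):

```python
# Illustrative sketch of what the EF_Rsync_Script_Generator does.
# HOST and the path layout are placeholders, not the real endpoint.
HOST = "data.example.org::features"  # hypothetical rsync module

def clean_id(vol_id):
    # Escape characters that cannot appear in filenames; the "+" and
    # "=" substitutions here are an assumption, not the official rule.
    return vol_id.replace(":", "+").replace("/", "=")

def rsync_script(vol_ids, dest="ef-data"):
    """Build a shell script that fetches one EF file per volume ID."""
    lines = ["#!/bin/bash", f"mkdir -p {dest}"]
    for vid in vol_ids:
        lines.append(f"rsync -av {HOST}/{clean_id(vid)}.json.bz2 {dest}/")
    return "\n".join(lines)

print(rsync_script(["mdp.39015012345678", "uc1.b1234567"]))
```

Saving the output and running it downloads just the workset's slice of the dataset.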
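To make the "page-level attributes" bullet concrete: each EF file pairs volume-level metadata with per-page feature counts, something like the toy record below. The field names are a loose approximation for illustration, not the official schema.

```python
import json

# A toy record shaped loosely like an Extracted Features file:
# volume-level metadata plus per-page token counts. Field names are
# illustrative only; see the official EF documentation for the schema.
record = json.loads("""
{
  "metadata": {"title": "Example Volume", "pubDate": "1899"},
  "pages": [
    {"seq": 1, "tokenCount": 120, "tokens": {"whale": 3, "sea": 5}},
    {"seq": 2, "tokenCount": 98,  "tokens": {"ship": 2, "sea": 1}}
  ]
}
""")

# Aggregate page-level counts up to the volume level.
totals = {}
for page in record["pages"]:
    for tok, n in page["tokens"].items():
        totals[tok] = totals.get(tok, 0) + n

print(totals)  # {'whale': 3, 'sea': 6, 'ship': 2}
```

This page-to-volume aggregation is the kind of analysis the data set enables without access to the page images themselves.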
Bookworm
- open source project, same basic functionality as Google nGram Viewer, although graphically faceted
- base data is currently "open-open" data; working on the legal aspects required for the base data to shift to the entire HT corpus, regardless of viewability
- linked to a back version of the EF data set that includes 13.5 M volumes, both full view and in copyright
- plans and allocated grant to develop tie-in to worksets
- see wiki for tutorial
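The nGram-style functionality above reduces to counting a word's occurrences per publication year and normalizing by the total tokens published that year. A toy sketch of that computation (invented data, not the Bookworm API):

```python
from collections import defaultdict

# Toy per-volume data: (publication year, token counts). A Bookworm-
# style viewer plots a word's frequency per year, normalized by the
# total number of tokens published that year.
volumes = [
    (1900, {"whale": 4, "sea": 10, "the": 50}),
    (1900, {"whale": 1, "ship": 3, "the": 40}),
    (1901, {"sea": 2, "the": 30}),
]

def frequency_by_year(volumes, word):
    """Relative frequency of `word` per year across all volumes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, counts in volumes:
        hits[year] += counts.get(word, 0)
        totals[year] += sum(counts.values())
    return {year: hits[year] / totals[year] for year in totals}

print(frequency_by_year(volumes, "whale"))
```

Faceting, as in the graphical interface, just means grouping by an extra metadata field before normalizing.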
...