...

  • "SHARC" branding in URL - eventually all services will be accessed through the portal.
  • functionality depends on login; see the Portal & Workset Builder Tutorial for v3.0, "Sign up for an account, and sign in" for details

Workset builder

  • allows researchers to create a set of texts to analyze algorithmically; see Portal & Workset Builder Tutorial for v3.0, "Create Workset" for a tutorial
  • Linked from the portal, but also at https://sharc.hathitrust.org/blacklight
  • it helps a great deal to keep the workset builder open in a second window while operating the portal in the first
  • worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, management restricted to owner)
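
The private/public distinction above can be read as a pair of access checks. The following is only a minimal sketch of that rule as stated in these notes, not the portal's actual implementation; the `Workset` class and the `can_view` / `can_manage` helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workset:
    """Hypothetical stand-in for a workset record (not an HTRC API type)."""
    name: str
    owner: str
    public: bool = False  # private unless the owner marks it public


def can_view(ws: Workset, user: str, logged_in: bool = True) -> bool:
    """Public worksets are viewable by any logged-in user; private ones by the owner only."""
    if not logged_in:
        return False
    return ws.public or ws.owner == user


def can_manage(ws: Workset, user: str, logged_in: bool = True) -> bool:
    """Management is restricted to the owner, whether the workset is public or private."""
    return logged_in and ws.owner == user
```

Under this reading, marking a workset public widens who can view it but never who can manage it.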

Algorithms

  • allows researchers to run ready-to-use algorithms against specific collections; see Portal & Workset Builder Tutorial for v3.0, "Create Workset" for a tutorial
  • Many algorithms provided; others can be added at scholars' request as development time permits
  • Workshop dedicated to these alone (ask and I can give you a tour)
  • Handout available

Data Capsule

  • allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see Portal & Workset Builder Tutorial for v3.0, "Use the HTRC Data Capsule" for details
  • designed to be a secure analytical environment that respects access restrictions to text while allowing for computational analysis
  • not yet tied to worksets
  • currently restricted to "open-open" (non-restricted) corpus; eventual objective is to allow for access to full HT corpus

Extracted Features datasets

  • page-level attributes (volume-level and page-level data) for 4M+ open-open volumes; rationale and features explained
  • can download full datasets via rsync (Watch out! BIG!)
  • details on using a workset to select data and the EF_Rsync_Script_Generator algorithm to download data for just that set
  • David Mimno's "word similarity tool" is built from the full Extracted Features dataset
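
To make the "download just that set" idea concrete, here is a rough sketch of how a subset download with rsync can be assembled. It does not reproduce what EF_Rsync_Script_Generator actually emits. The host, module name, and file paths are placeholders assumed for illustration; only `rsync -av` and `--files-from` are standard rsync options.

```python
from typing import Iterable, List


def file_list_text(paths: Iterable[str]) -> str:
    """Return the contents of a file list: one path per line, de-duplicated and sorted."""
    unique = sorted(set(paths))
    return "".join(p + "\n" for p in unique)


def rsync_command(list_file: str, source: str, dest: str) -> List[str]:
    """Build an rsync argv that fetches only the paths named in list_file.

    Paths in list_file are interpreted relative to `source`.
    """
    return ["rsync", "-av", f"--files-from={list_file}", source, dest]


if __name__ == "__main__":
    # Placeholder paths; real ones would come from the volumes in a workset.
    wanted = ["vol_b.json.bz2", "vol_a.json.bz2", "vol_b.json.bz2"]
    print(file_list_text(wanted), end="")
    # "rsync.example.org::features/" is a hypothetical source, not the real endpoint.
    print(" ".join(rsync_command("workset_files.txt", "rsync.example.org::features/", "./ef_subset/")))
```

Writing the text from `file_list_text` to `workset_files.txt` and then running the printed command would transfer only the listed files rather than the full (very large) dataset.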

...