...

Documentation of offerings on the HTRC User Community Wiki - links to services, user support documentation, meeting notes, elist addresses and sign-up information, and FAQs.

The "portalportal"

Workset management

  • allows researchers to create a set of texts to analyze algorithmically; see the Portal & Workset Builder Tutorial for v3.0, "Create Workset", for the tutorial
  • linked from the portal, but also available at https://sharc.hathitrust.org/blacklight
  • you can create a workset from a file - a specification for the file is given; a minimal sketch follows this list
  • it is a good idea to validate your workset before loading - the validator will let you know if there are issues with your file; it really helps to open the validator in a second window and operate the portal in the first
  • worksets can be private (open to your own use and management) or public (viewable by all logged-in HTRC users, with management restricted to the owner)
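A minimal sketch of building such a file, assuming the upload format is a single-column CSV with a volume_id header listing HathiTrust volume IDs (the header name and the ID check below are illustrative assumptions; the specification and validator linked above are authoritative):

    # Hypothetical sketch: build a workset upload file from a plain list of
    # HathiTrust volume IDs. The real file specification and validator on the
    # portal are authoritative; the "volume_id" header and the ID pattern here
    # are assumptions for illustration only.
    import csv
    import re
    import sys

    # HathiTrust volume IDs look like "<namespace>.<identifier>",
    # e.g. "mdp.39015012345678"; this is a loose sanity check, not the official rule.
    ID_PATTERN = re.compile(r"^[a-z0-9]+\.\S+$")

    def build_workset_csv(id_list_path, out_path):
        """Read one volume ID per line and write a single-column workset CSV."""
        with open(id_list_path) as src:
            ids = [line.strip() for line in src if line.strip()]

        bad = [i for i in ids if not ID_PATTERN.match(i)]
        if bad:
            print("Warning: %d IDs look malformed, e.g. %r" % (len(bad), bad[0]),
                  file=sys.stderr)

        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["volume_id"])  # assumed header name
            for volume_id in ids:
                writer.writerow([volume_id])

    if __name__ == "__main__":
        build_workset_csv("volume_ids.txt", "my_workset.csv")

Running the portal's validator on the resulting file before uploading, as noted above, is still the safest check.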

Algorithms

...

  • allows researchers to create a virtual machine environment, configure it with tools, and analyze texts; see the Portal & Workset Builder Tutorial for v3.0, "Use the HTRC Data Capsule", for details; documentation available
  • requires a VNC application for your browser, like VNC Viewer for Google Chrome
  • designed to be a secure analytical environment that respects access restrictions on text while allowing computational analysis; capsules operate in either maintenance mode or secure mode
  • not yet tied to worksets, but there is a workaround
  • currently restricted to "open-open" (non-restricted) corpus; eventual objective is to allow for access to full HT corpus

Datasets

...

  • Extracted Features Data Set
    • a brief single-page attachment describing the motivations and potential of the data set
    • page-level attributes (volume-level and page-level data) for all books in HT; rationale and features explained
    • can download the full dataset via rsync (watch out! BIG! 4 TB!)
    • details on leveraging the dataset to select data using a workset and the EF_Rsync_Script_Generator algorithm to download data for just that set; see the sketch after this list
    • David Mimno's "word similarity tool" is built from the full Extracted Features data set
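As a rough sketch of that workflow, assuming the EF_Rsync_Script_Generator output has been saved as a plain list of file paths, that the rsync module is data.analytics.hathitrust.org::features, and that EF volumes are bzip2-compressed JSON with a features.pages[].body.tokenPosCount structure (the endpoint and field names are assumptions to check against the current HTRC documentation):

    # Rough sketch: fetch only the EF files for a workset, then read
    # page-level token counts from one downloaded volume. The endpoint, file
    # list name, and JSON field names are assumptions, not official values.
    import bz2
    import json
    import subprocess
    from collections import Counter

    RSYNC_MODULE = "data.analytics.hathitrust.org::features"  # assumed endpoint

    def download_ef_files(path_list="ef_paths.txt", dest="ef_data/"):
        """Rsync only the listed EF files instead of the full 4 TB dataset."""
        subprocess.run(
            ["rsync", "-av", "--files-from=" + path_list, RSYNC_MODULE, dest],
            check=True,
        )

    def page_token_counts(ef_file):
        """Sum body-token counts per page from one EF volume file."""
        with bz2.open(ef_file, "rt", encoding="utf-8") as fh:
            volume = json.load(fh)
        per_page = {}
        for page in volume["features"]["pages"]:
            counts = Counter()
            for token, pos_counts in page["body"]["tokenPosCount"].items():
                counts[token] += sum(pos_counts.values())
            per_page[page["seq"]] = counts
        return per_page

    if __name__ == "__main__":
        download_ef_files()
        # e.g. counts = page_token_counts("ef_data/<some-volume>.json.bz2")

Pulling only the workset's files this way avoids syncing the entire 4 TB dataset.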

Bookworm

  • open source project, same basic functionality as the Google Ngram Viewer, although graphically faceted
  • base data is currently "open-open" data; working on the legal aspects required for the base data to shift to the entire HT corpus, regardless of viewability
  • linked to a back version of the EF data set that includes 13.5 M volumes, both full view and in copyright
  • plans and an allocated grant to develop a tie-in to worksets
  • see wiki for tutorial

...