...

Extracted Features dataset

  • A brief single-page attachment describing the motivations and potential of the dataset.
  • Page-level attributes (volume-level and page-level data) for all books in HT; rationale and features explained.
  • The full dataset can be downloaded via rsync (watch out: it is big, 4 TB!).
  • Details on using a workset to select data, and on using the EF_Rsync_Script_Generator algorithm to download data for just that set.
  • David Mimno's "word similarity tool" is built from the full Extracted Features dataset.
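
As a rough illustration of downloading only a selected set rather than the full 4 TB, the sketch below builds an rsync command that reads its file list from a text file. It is not the EF_Rsync_Script_Generator algorithm itself. The host `rsync.example.org::features/`, the file name `workset_files.txt`, and the sample paths are hypothetical placeholders; the real server address and per-volume paths must be taken from the dataset documentation. The only rsync features relied on are the standard `-a`, `-v`, and `--files-from` options.

```python
import shlex


def file_list_text(relative_paths):
    """Return the contents of an rsync --files-from list: one path per line."""
    cleaned = [p.strip() for p in relative_paths if p.strip()]
    return "\n".join(cleaned) + "\n" if cleaned else ""


def rsync_command(files_from="workset_files.txt",
                  source="rsync.example.org::features/",  # hypothetical host
                  dest="ef_data/"):
    """Build an rsync command line that fetches only the listed files."""
    args = ["rsync", "-av", f"--files-from={files_from}", source, dest]
    # Quote each argument so the command is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Hypothetical relative paths for two volumes in a workset.
sample_paths = ["example/vol1.json.bz2", "example/vol2.json.bz2"]
list_text = file_list_text(sample_paths)
command = rsync_command()
```

Saving `list_text` to `workset_files.txt` and then running the string in `command` would transfer just those files, assuming the placeholder host and paths are replaced with real ones.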

...