...

page-level attributes (volume-level and page-level data) for 4M+ "open-open" volumes; rationale and features explained
full datasets can be downloaded via rsync (watch out: they are BIG!)
details on using a workset together with the EF_Rsync_Script_Generator algorithm to download data for just the volumes in that workset
David Mimno's "word similarity tool" is built from the full Extracted Features dataset
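
The workset-based download described above can be sketched roughly as follows. This is only an illustration of the idea (turn a list of volume IDs into a file list and pass it to rsync), not the actual EF_Rsync_Script_Generator. The rsync source `example.org::features/`, the ID-to-path rule `naive_relative_path`, and the function name `build_rsync_script` are all hypothetical placeholders; the real endpoint and directory layout should be taken from the dataset documentation. The only rsync options used are `-a`, `-v`, and `--files-from`.

```python
import shlex

# Hypothetical placeholder, NOT the real rsync endpoint for the dataset.
RSYNC_SOURCE = "example.org::features/"


def naive_relative_path(volume_id: str) -> str:
    """Hypothetical ID-to-path rule; the real dataset layout may differ."""
    return f"{volume_id}.json.bz2"


def build_rsync_script(volume_ids, source=RSYNC_SOURCE, dest=".",
                       list_file="workset_files.txt",
                       to_path=naive_relative_path):
    """Return (file-list text, rsync command) covering only the given volumes."""
    seen = set()
    paths = []
    for vid in volume_ids:
        vid = vid.strip()
        # Skip blank entries and duplicates so each file is requested once.
        if vid and vid not in seen:
            seen.add(vid)
            paths.append(to_path(vid))
    # One relative path per line, as expected by rsync's --files-from option.
    file_list = "".join(p + "\n" for p in paths)
    command = " ".join([
        "rsync", "-av",
        "--files-from=" + shlex.quote(list_file),
        shlex.quote(source),
        shlex.quote(dest),
    ])
    return file_list, command


if __name__ == "__main__":
    # The volume IDs below are made up for illustration.
    files, cmd = build_rsync_script(["mdp.123", "uc1.456"])
    print(files, end="")
    print(cmd)
```

The file-list text would be saved as `workset_files.txt` before running the printed command, so that rsync transfers only the files for the workset rather than the full dataset.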

Bookworm

same functionality as the Google Ngram Viewer
base data is currently "open-open" data; legal work is under way so that the base data can shift to the entire HT corpus, regardless of viewability
plans, and grant funding allocated, to develop a tie-in to worksets
See wiki for tutorial

...