...
- page-level attributes (volume-level and page-level data) for 4M+ open-open; rationale and features explained
- can download full datasets via rsync (Watch out! BIG!)
- details on using a workset to select a subset of the dataset, then using the EF_Rsync_Script_Generator algorithm to download data for just that workset
- David Mimno's "word similarity tool" is built from the full Extracted Feature dataset
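
The workset-based download noted above can be pictured with a minimal sketch. This is not the EF_Rsync_Script_Generator itself, whose actual behavior is not described here. It only assumes that a workset boils down to a list of volume IDs and that rsync's `--files-from` option can limit a transfer to the listed files. The rsync source `rsync.example.org::features/`, the `volume_path` mapping, and the names `build_rsync_command`, `workset_files.txt`, and `ef_data/` are all hypothetical placeholders.

```python
import shlex

# Hypothetical placeholder: replace with the real rsync source for the dataset.
RSYNC_SOURCE = "rsync.example.org::features/"


def volume_path(volume_id: str) -> str:
    """Map a volume ID to a relative file path.

    Hypothetical layout for illustration only; the real dataset
    may organize its files differently.
    """
    safe = volume_id.replace("/", "_").replace(":", "+")
    return f"{safe}.json.bz2"


def build_rsync_command(volume_ids, list_file="workset_files.txt", dest="ef_data/"):
    """Write one path per workset volume to list_file and return an
    rsync command line that transfers only those files."""
    # De-duplicate and sort so the file list is stable between runs.
    paths = sorted({volume_path(v) for v in volume_ids})
    with open(list_file, "w", encoding="utf-8") as fh:
        fh.write("\n".join(paths) + "\n")
    args = ["rsync", "-av", f"--files-from={list_file}", RSYNC_SOURCE, dest]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Made-up volume IDs standing in for the contents of a workset.
    print(build_rsync_command(["vol.0001", "vol.0002", "vol.0001"]))
```

Generating a file list and handing it to rsync keeps the transfer limited to the workset, which matters given the size of the full dataset.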
Bookworm
- same functionality as the Google Books Ngram Viewer
- base data is currently "open-open" data; working on legal aspects required for base data to shift to entire HT corpus, regardless of viewability.
- plans, and grant funding allocated, to develop a tie-in to worksets
- See wiki for tutorial
...