...
Extracted Features dataset
- A brief single-page attachment describing the motivations for and potential of the dataset.
- page-level attributes (volume-level and page-level data) for all books in HT; rationale and features explained
- the full dataset can be downloaded via rsync (watch out: it is big, 4 TB)
- details on using a workset to select data and the EF_Rsync_Script_Generator algorithm to download data for just that set.
- David Mimno's "word similarity tool" is built from the full Extracted Feature dataset
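
Because the full dataset is 4 TB, most users will want to fetch only the files for a workset. The following is a minimal sketch of that idea, not the output of EF_Rsync_Script_Generator: it writes a list of file paths and builds an rsync command that uses rsync's `--files-from` option. The source address `rsync.example.org::features/`, the function name `build_rsync_command`, and the sample file names are hypothetical placeholders; take the real rsync source and file paths from the dataset documentation or from the generated script.

```python
import shlex
from pathlib import Path

# Hypothetical placeholder: substitute the rsync source given in the
# Extracted Features documentation.
RSYNC_SOURCE = "rsync.example.org::features/"


def build_rsync_command(relative_paths, list_file, dest_dir, source=RSYNC_SOURCE):
    """Write a de-duplicated file list and return the rsync argument list.

    relative_paths: paths of the wanted files, relative to the rsync source.
    list_file: where to write the list that rsync reads via --files-from.
    dest_dir: local directory that receives the files.
    """
    # Drop blank entries and duplicates so each file is requested once.
    unique_paths = sorted({p.strip() for p in relative_paths if p.strip()})
    Path(list_file).write_text("\n".join(unique_paths) + "\n", encoding="utf-8")
    # -a preserves file attributes, -v prints each transferred file.
    return ["rsync", "-av", f"--files-from={list_file}", source, str(dest_dir)]


if __name__ == "__main__":
    # Hypothetical file names standing in for the files of one workset.
    paths = ["vol_a.json.bz2", "vol_b.json.bz2"]
    cmd = build_rsync_command(paths, "workset_files.txt", "ef_data/")
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

The sketch only prints the command so it can be checked before any transfer starts; pass the returned list to `subprocess.run` once the source and paths are confirmed.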
...