
  1. BERT vs. hand-engineered features; the controversy paper
  2. Word embeddings
    (BERT uses word pieces! See the tokenization sketch after the references below.)

    Overview references

    Smith, Noah A. 2019. Contextual word representations: A contextual introduction. arXiv:1902.06006, version 2, dated Feb 19, 2019.
    Twitter commentary on the history as recounted above (Naftali Tishby and yours truly are among the "& co." that Robert Munro refers to): [1] [2] [3]

    Goldberg, Yoav. 2017. Neural network methods for natural language processing. Morgan & Claypool. Earlier, shorter, open-access journal version: A primer on neural network models for natural language processing. JAIR 57:345--420, 2016.
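
    Since "word pieces" can be opaque in the abstract, here is a minimal tokenization sketch. It assumes Hugging Face's transformers library and its "bert-base-uncased" vocabulary (both assumptions on my part; any WordPiece tokenizer with BERT's vocabulary behaves the same way):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # Frequent words stay whole; rarer words are split into subword units
    # ("word pieces"), with "##" marking a continuation piece.
    print(tokenizer.tokenize("The embeddings were contextualized"))
    # e.g. ['the', 'em', '##bed', '##ding', '##s', 'were',
    #       'context', '##ual', '##ized'] -- exact splits depend on the vocabulary

    The point of the example: BERT never sees an out-of-vocabulary token, because anything unfamiliar is decomposed into pieces that are in the vocabulary.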

  3. Language modeling = the bridge?

    Recommendations from Jack Hessel and Yoav Artzi, Cornell

    Thanks to Jack Hessel and Yoav Artzi for the below. Paraphrasing errors are my own.

    The best off-the-shelf language model right now (caveat: this is a very fast-moving field) is GPT-2, where GPT stands for Generative Pre-Training. It seems to transfer well to small new datasets via fine-tuning, as sketched below. [code] [https://openai.com/blog/better-language-models/]
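
    A minimal sketch of one such fine-tuning step, assuming Hugging Face's transformers library plus PyTorch rather than the OpenAI release code linked above (the model name "gpt2" and the toy one-sentence "dataset" are stand-ins):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # One gradient step on a tiny in-domain batch. The objective is the
    # usual next-token cross-entropy, so the labels are just the inputs.
    batch = tokenizer("A small in-domain corpus goes here.", return_tensors="pt")
    loss = model(batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()

    Repeated over a small corpus, this is all "fine-tuning" means here: the pre-trained weights are updated in place on the new data.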

    References

    Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Manuscript.