
  1. BERT vs. hand-engineered features; the controversy paper
  2. Word embeddings
    (BERT uses word pieces! See the tokenization sketch after the references below.)

    Overview references

    Smith, Noah A. 2019. Contextual word representations: A contextual introduction. arXiv:1902.06006, version 2, dated Feb 19, 2019.
    Twitter commentary on the history as recounted above (Naftali Tishby and yours truly are among the "& co." that Robert Munro refers to): [1] [2] [3]

    Goldberg, Yoav. 2017. Neural network methods for natural language processing. Morgan & Claypool. Earlier, shorter, open-access journal version: A primer on neural network models for natural language processing. JAIR 57:345--420, 2016.
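
    Since "word pieces" can be opaque in the abstract, here is a minimal tokenization sketch. It assumes Hugging Face's transformers library and its "bert-base-uncased" vocabulary (both assumptions on my part; any WordPiece tokenizer with BERT's vocabulary behaves the same way):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # Frequent words stay whole; rarer words are split into subword units
    # ("word pieces"), with "##" marking a continuation piece.
    print(tokenizer.tokenize("The embeddings were contextualized"))
    # e.g. ['the', 'em', '##bed', '##ding', '##s', 'were',
    #       'context', '##ual', '##ized'] -- exact splits depend on the vocabulary

    The point of the example: BERT never sees an out-of-vocabulary token, because anything unfamiliar is decomposed into pieces that are in the vocabulary.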

  3. Language modeling = the bridge?

    Recommendations from Jack Hessel and Yoav Artzi, Cornell

    Thanks to Jack Hessel and Yoav Artzi for the below. Paraphrasing errors are my own.

    The best off-the-shelf language model right now (caveat: this is a very fast-moving field) is GPT-2, where GPT stands for Generative Pre-Training. It seems to transfer well to small new datasets via fine-tuning, as sketched below. [code] [https://openai.com/blog/better-language-models/]
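
    A minimal sketch of one such fine-tuning step, assuming Hugging Face's transformers library plus PyTorch rather than the OpenAI release code linked above (the model name "gpt2" and the toy one-sentence "dataset" are stand-ins):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # One gradient step on a tiny in-domain batch. The objective is the
    # usual next-token cross-entropy, so the labels are just the inputs.
    batch = tokenizer("A small in-domain corpus goes here.", return_tensors="pt")
    loss = model(batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()

    Repeated over a small corpus, this is all "fine-tuning" means here: the pre-trained weights are updated in place on the new data.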

    References

    Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. Manuscript.