
Some features I like

 

  1. Chenhao Tan's list of hedging phrases, one in a long line of LIWC-like lexicons (some entries are regular expressions; see the README and the list itself)

  2. Part-of-speech n-grams
  3. Language models on the most frequent words only
    1. Distinctiveness
  4. Language models on the content words
  5. Distributional similarity
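
The lexicon, part-of-speech, and restricted-vocabulary language-model features above can be sketched roughly as follows. This is a minimal illustration under several assumptions: the hedge patterns are made-up placeholders (not entries from Chenhao Tan's list), POS tags are assumed to come from some external tagger, the stopword set is a tiny hypothetical one, and the language model is a plain add-one-smoothed bigram model chosen for brevity. "Most frequent words only" and "content words only" are both implemented the same way: every token outside the kept set is replaced by a placeholder before the model is trained.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical hedge patterns for illustration; NOT taken from Chenhao Tan's list.
HEDGE_RES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bi (?:think|believe|guess)\b", r"\bmay(?:be)?\b",
              r"\bpossibl[ey]\b", r"\bsort of\b")
]


def hedge_count(text: str) -> int:
    """Lexicon feature: total matches of any hedge pattern in the text."""
    return sum(len(rx.findall(text)) for rx in HEDGE_RES)


def pos_ngrams(tags: list[str], n: int = 2) -> Counter:
    """Count n-grams over POS tags (tags assumed to come from an external tagger)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def top_k_vocab(corpus: list[list[str]], k: int) -> set[str]:
    """The k most frequent tokens in a tokenized corpus."""
    counts = Counter(tok for doc in corpus for tok in doc)
    return {w for w, _ in counts.most_common(k)}


def mask_tokens(doc: list[str], keep: Callable[[str], bool],
                other: str = "<OTHER>") -> list[str]:
    """Keep tokens for which keep(tok) is true; replace the rest with a placeholder."""
    return [tok if keep(tok) else other for tok in doc]


def bigram_lm(corpus: list[list[str]]) -> Callable[[str, str], float]:
    """Add-one-smoothed bigram model; returns prob(prev, word)."""
    contexts, bigrams = Counter(), Counter()
    for doc in corpus:
        padded = ["<s>"] + doc
        contexts.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab_size = len({tok for doc in corpus for tok in doc})

    def prob(prev: str, word: str) -> float:
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

    return prob


def avg_log_prob(prob: Callable[[str, str], float], doc: list[str]) -> float:
    """Per-token average log-probability of a non-empty document (an LM feature)."""
    padded = ["<s>"] + doc
    return sum(math.log(prob(p, w)) for p, w in zip(padded[:-1], padded[1:])) / len(doc)


# Usage: frequent-word and content-word views of the same toy corpus.
STOPWORDS = {"the", "a", "of", "is"}  # tiny hypothetical stopword list
corpus = [["the", "cat", "is", "here"], ["the", "dog", "is", "there"]]
frequent = top_k_vocab(corpus, k=2)
frequent_view = [mask_tokens(d, lambda t: t in frequent) for d in corpus]
content_view = [mask_tokens(d, lambda t: t not in STOPWORDS) for d in corpus]
lm = bigram_lm(frequent_view)
```

On this toy corpus the frequent-word view of the first document is `["the", "<OTHER>", "is", "<OTHER>"]`, while the content-word view is `["<OTHER>", "cat", "<OTHER>", "here"]`: the two models see complementary halves of the same text.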

... and one feature that I both like and that drives me crazy: token length
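
Token length is simple to compute, but its value depends on how the text was tokenized in the first place. A minimal sketch, assuming tokens are already available as strings; the two tokenizations below (a whitespace split and a simple regex split) are illustrative choices, not the ones behind any particular result.

```python
import re


def mean_token_length(tokens: list[str]) -> float:
    """Average characters per token; 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    return sum(len(tok) for tok in tokens) / len(tokens)


text = "Don't panic."
whitespace_tokens = text.split()                 # ["Don't", "panic."]
regex_tokens = re.findall(r"\w+|[^\w\s]", text)  # ["Don", "'", "t", "panic", "."]

print(mean_token_length(whitespace_tokens))  # 5.5
print(mean_token_length(regex_tokens))       # 2.2
```

The same sentence yields 5.5 or 2.2 characters per token depending only on the tokenizer, so values of this feature are only comparable under a fixed tokenization.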

What does this mean in the age of deep learning, where we don't need to worry about features anymore?

  1. BERT vs. hand-crafted features: the controversy paper
  2. Word embeddings
    1. BERT - word pieces!
  3. Language modeling
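
BERT splits each word into word pieces with a greedy longest-match-first search over a fixed subword vocabulary, marking non-initial pieces with a `##` prefix. The sketch below covers only that per-word step; the toy vocabulary is hypothetical, and it omits the lowercasing, punctuation splitting, and maximum-word-length handling that a real BERT tokenizer also performs.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split of one word into subword pieces.

    Non-initial pieces carry the '##' continuation prefix. If some position
    has no matching piece, the whole word maps to the unknown token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"un", "##aff", "##able", "play", "##ing"}

print(wordpiece("unaffable", VOCAB))  # ['un', '##aff', '##able']
print(wordpiece("playing", VOCAB))    # ['play', '##ing']
print(wordpiece("xyz", VOCAB))        # ['[UNK]']
```

Because pieces such as `##aff` are shorter than whole words, surface features like token length come out differently when computed over word pieces rather than words.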

 

 

