...

(1) Let's look again at the Arapaho Gospel of St. Luke. Switch to text view.

  • Does the OCR text accurately reflect the scanned page image?
  • What is a word? How would you define "word" to a computer?
  • What isn't a word? How would you tell a computer to exclude these?
  • Consider languages with which you are familiar.  Can you think of cases where tokens might contain more than one word?
  • What sets of rules would we need in order to tokenize effectively?  Would these be ordered in any specific way?
  • Is there a "right way" to tokenize?
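
As one possible starting point for the questions about rules and their order, here is a minimal Python sketch of a rule-based tokenizer. The rule names (NUMBER, WORD, PUNCT) and their patterns are illustrative assumptions, not a standard and not tuned for Arapaho; the point is only that the rules are tried in order, so moving a rule up or down the list can change the result.

```python
import re

# Ordered rules: when several could match at the same position, the earliest wins.
# Names and patterns are assumptions made for illustration only.
RULES = [
    ("NUMBER", r"\d+(?:[.,:]\d+)*"),               # digits, e.g. a reference like 3:16
    ("WORD", r"[^\W\d_]+(?:['’\-][^\W\d_]+)*"),    # letters, allowing internal ' ’ or -
    ("PUNCT", r"[^\w\s]|_"),                       # any other single non-space symbol
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Return (rule_name, token) pairs in the order they appear in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def words(text):
    """Keep only the tokens accepted by the WORD rule."""
    return [token for kind, token in tokenize(text) if kind == "WORD"]
```

With these assumed rules, words("Luke 3:16 don't stop.") returns ['Luke', "don't", 'stop']: the verse reference and the full stop are tokens, but not words. Whether "don't" should count as one word or two is exactly the kind of decision the questions above ask you to make explicit.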

...