...
For each article (a set of revisions), a list of all unique sentences in the article was generated. Each sentence was given an id, and each revision of the article was then represented as the sequence of SENTENCE_IDs belonging to it. This gave a reasonably compact representation of the revision stream, since a sentence that recurs across revisions is stored only once.
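The encoding described above can be illustrated with a minimal sketch. It is not the original implementation: the function names `split_sentences` and `encode_article`, the regex-based sentence splitter, and the choice of consecutive integers as ids are all assumptions made for illustration, and the sample revisions are invented.

```python
import re


def split_sentences(text):
    # Hypothetical, naive splitter: breaks after '.', '!' or '?' followed by
    # whitespace. The original work may have used a different segmenter.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def encode_article(revisions):
    """Map each unique sentence to an id and encode each revision as id sequences.

    `revisions` is a list of revision texts for one article, oldest first.
    Returns (sentence_ids, encoded), where sentence_ids maps sentence -> id
    and encoded holds one list of ids per revision.
    """
    sentence_ids = {}  # sentence text -> integer id (assumed id scheme)
    encoded = []
    for text in revisions:
        ids = []
        for sentence in split_sentences(text):
            if sentence not in sentence_ids:
                # First time this sentence appears anywhere in the article.
                sentence_ids[sentence] = len(sentence_ids)
            ids.append(sentence_ids[sentence])
        encoded.append(ids)
    return sentence_ids, encoded


# Invented example data: three revisions of one short article.
revisions = [
    "Cats are mammals. They purr.",
    "Cats are small mammals. They purr.",
    "Cats are small mammals. They purr. They also hunt.",
]
sentence_ids, encoded = encode_article(revisions)
print(encoded)
```

In this sketch the three revisions contain seven sentences in total but only four distinct ones, so the sentence table holds four entries and each revision reduces to a short list of integers.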
...