This variant of the lemmamatch demo shows how input text can be tagged using a simple word list. We use entries and categories from The Helsinki Term Bank for the Arts and Sciences to tag each word form whose lemma matches an entry.
The list is in the form of a precompiled finite-state transducer. Any finite-state ruleset can in principle be used for tagging.
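The lookup step can be illustrated with a minimal sketch. This is not the demo's implementation: plain dictionaries stand in for the precompiled transducer, and the names `LEMMAS`, `TERMS`, and `tag_text`, as well as the example entries and category labels, are hypothetical.

```python
# Hypothetical lemma table: word form -> lemma.
# A real system would use a morphological analyser or transducer here.
LEMMAS = {"transducers": "transducer", "lemmas": "lemma", "corpora": "corpus"}

# Hypothetical term list: lemma -> category (not taken from the Term Bank).
TERMS = {"transducer": "Linguistics", "corpus": "Linguistics"}


def tag_text(text, lemmas=LEMMAS, terms=TERMS):
    """Return (token, category) pairs; category is None when no entry matches."""
    tagged = []
    for token in text.split():
        # Normalise the surface form before looking up its lemma.
        form = token.strip(".,;:!?()\"'").lower()
        # Fall back to the form itself when no lemma is known.
        lemma = lemmas.get(form, form)
        tagged.append((token, terms.get(lemma)))
    return tagged
```

For example, `tag_text("Transducers tag corpora.")` tags the first and last tokens with the hypothetical category and leaves `tag` untagged.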
An additional table estimates how likely the input text is to have been drawn from the corpus of master's dissertations of each major faculty of the University of Helsinki.
Each score in the table represents the Kullback-Leibler divergence between the distribution of lemmas in the input text and the distribution of lemmas in the corresponding dissertation corpus.
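A score of this kind could be computed roughly as follows. This is a sketch under stated assumptions, not the demo's actual code: the direction of the divergence (input relative to corpus), the additive smoothing of the corpus distribution (`alpha`), and the function name `kl_divergence` are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(input_lemmas, corpus_lemmas, alpha=1.0):
    """Estimate D(P || Q), where P is the input's lemma distribution and
    Q is the corpus lemma distribution with additive smoothing (assumed)."""
    p_counts = Counter(input_lemmas)
    q_counts = Counter(corpus_lemmas)
    vocab = set(p_counts) | set(q_counts)

    p_total = sum(p_counts.values())
    # Smoothing keeps Q non-zero for lemmas that occur only in the input.
    q_total = sum(q_counts.values()) + alpha * len(vocab)

    divergence = 0.0
    for lemma, count in p_counts.items():
        p = count / p_total
        q = (q_counts[lemma] + alpha) / q_total
        divergence += p * math.log(p / q)
    return divergence
```

Under these assumptions, a lower value means the input's lemma distribution is closer to that of the corpus, so computing the score once per faculty corpus and sorting in ascending order would give a ranking.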