
The Veps morphology and tools

The GitHub repository contains finite-state source files for the Veps language, used for building morphological analysers, proofing tools and dictionaries.

The Veps language

Veps belongs to the Balto-Finnic branch of the Uralic language family and has traditionally been spoken in parts of Karelia and in the Leningrad and Vologda oblasts. Its closest ties are with Ludic and Olonets-Karelian, with which it shares features such as case distinctions between location and departure, a passive conjugation, and an extensive infusion of Slavic vocabulary. While Veps has vowel harmony like some other Uralic languages, it does not show indications of vowel quantity or consonant gradation, both so common in other Balto-Finnic and even Saamic languages. Thus, the two-letter word «so» ‘swamp’ is a cognate of Finnish ‹suo› and Estonian ‹soo›, which have the same meaning. The Veps-language newspaper «Kodima» is published regularly and can be found at «https://omamedia.ru/fi/publication/kodima/». There is also a Veps Wikipedia, along with occasional other publications, including the Gospels of Mark (1992), John (1993) and Matthew (1998).

Although originally earmarked for development in a project financed through the Kone Foundation “Language Programme” (2013–2017), Veps was not included in the project in 2013–2014. The language was seen as relatively simple morphologically and was deemed an ideal object for teaching beginners in language technology. Presently, a Veps-Finnish dictionary is being developed part-time by Heidi Niva (see Researcher of the Month and blogs.helsinki.fi/vajehnik-projekt/) on the dictionary editing platform Veʹrdd. The name is Skolt Saami for ‘flow’, and the same platform was used to edit the Finnish-Skolt Saami dictionary for aligned development with linguistic tools in the GiellaLT infrastructure. Bringing Veps-Finnish dictionary development together with open-source language-technological documentation and development is proving beneficial for language facilitation.

Veps materials

Veps (vep)
NT 2013
total words: 136,519
total characters (from words): 731,678
unique words: 14,610
Beginning: (2024-09-06)
unique misses: 8,439
number of lines before hapax: 3,919
Lacking unambiguous PoS: 13,181
Lacking unambiguous dependency: 18,531
Size of lexicon.lexc: 2,463
Number of LEXICONs: 377
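
The figures above can be turned into a few derived ratios that are easier to compare across corpora. The following is a minimal Python sketch, not part of the project's tooling. It assumes that "unique misses" counts the unique word forms the analyser did not recognise at the given date; the function name `corpus_ratios` and the ratio names are chosen here for illustration only.

```python
# Figures copied from the list above (Veps New Testament, 2013).
TOTAL_WORDS = 136_519
TOTAL_CHARS = 731_678
UNIQUE_WORDS = 14_610
# Assumption: unique word forms not recognised by the analyser on 2024-09-06.
UNIQUE_MISSES = 8_439


def corpus_ratios(total_words, total_chars, unique_words, unique_misses):
    """Return simple derived ratios for a corpus summary (illustrative helper)."""
    miss_rate = unique_misses / unique_words
    return {
        # Unique word forms per running word.
        "type_token_ratio": unique_words / total_words,
        # Average number of characters per running word.
        "mean_word_length": total_chars / total_words,
        # Share of unique word forms counted as misses.
        "type_miss_rate": miss_rate,
        # Share of unique word forms not counted as misses.
        "type_coverage": 1 - miss_rate,
    }


ratios = corpus_ratios(TOTAL_WORDS, TOTAL_CHARS, UNIQUE_WORDS, UNIQUE_MISSES)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under that reading of the figures, roughly one running word in nine is a new word form, the average word is a little over five characters long, and somewhat more than half of the unique word forms were still unrecognised at the starting point.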

In September 2024, the model was quite small: it initially comprised 3,919 lexical items, of which 958 were nouns, 1,232 were verbs and 137 were adjectives. Language tools for the Veps language are now under development, and Parallel Biblical Verses for Uralic Studies (PaBiVUS, version 2), available through the Language Bank of Finland, will feature an annotated version of the Veps New Testament from 2013.

Follow our progress on GiellaLT and in the UralicNLP Python, Java and .NET libraries.
