Martti Rapola’s 19th century vocabulary

Martti Rapola (1891–1972), a distinguished researcher of Old Literary Finnish and Finnish dialects, compiled extensive material on 19th-century Literary Finnish, which he organized according to its significance. Rapola’s 19th-century vocabulary, comprising a total of 44,000 headwords, was created from these excerpts collected in the 1930s and 1950s. Rapola made use of this material in many articles published in the 1940s and 1950s and in a selection published in 1960, ’Sanojemme ensiesiintymiä Agricolasta Yrjö-Koskiseen’, which, as the name implies, contains vocabulary established in Literary Finnish.

The material published online is based on the original headwords, a selection of which has been entered into a database. It contains information about a total of 5,600 words, divided into 1,070 concepts. This is about a quarter of the original data.

Latest versions/subcorpora:  
Martti Rapola’s 19th century vocabulary, Sanat version
Metadata and license
Attribution instructions
Open the resource in Sanat
Search for all versions in META-SHARE

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Modern Finnish Word List

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The entries of the word list are simple XML elements that indicate the lemma and inflection type for basic words. Rare inflection types and other restrictions are marked with attributes. Compounds are usually listed as just the lemma. Examples of the 78 inflection types and 17 consonant gradation types are available on the web site.

A copy of the word list is also available in Kielipankki – the Language Bank of Finland (/appl/data/kielipankki/words).
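As a rough illustration of how entries of this kind could be read, the following Python sketch parses a small inline sample. The element names used here (`st` for an entry, `s` for the lemma, `tn` for the inflection type, `av` for the consonant gradation type) and the sample entries themselves are assumptions made for the example, not a specification of the actual file; the real structure should be checked against the word list's own documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The element names (st, s, t, tn, av) are
# assumptions for illustration and may differ from the real word list.
SAMPLE = """<kotus-sanalista>
<st><s>aakkonen</s><t><tn>38</tn></t></st>
<st><s>aalto</s><t><tn>1</tn><av>I</av></t></st>
<st><s>aaltoliike</s></st>
</kotus-sanalista>"""


def parse_entries(xml_text):
    """Return one dict per entry: lemma, inflection type, gradation type."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("st"):
        type_number = entry.findtext("t/tn")  # None if no inflection info
        entries.append({
            "lemma": entry.findtext("s"),
            "inflection_type": int(type_number) if type_number else None,
            "gradation": entry.findtext("t/av"),  # None if not marked
        })
    return entries


if __name__ == "__main__":
    for item in parse_entries(SAMPLE):
        print(item)
```

The third sample entry mimics a compound that is listed as just the lemma, so its inflection type and gradation come back as `None`.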

Latest versions/subcorpora:  
Modern Finnish Word List
Metadata and license
Attribution instructions
Open the website
A copy of this version is available in the computing environment.
Search for all versions in META-SHARE

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Frequency List of Written Finnish Word Forms

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The resource contains a ranked frequency list of Finnish word forms as they appear in the Finnish Parole text corpus of 17 million written tokens. The list is available for download in three different sizes: all tokens, tokens that occur more than once, and tokens that occur more than twice. All three files are encoded in ISO-8859-1 (Latin-1) with one entry per line. The five thousand most frequent forms are also available for browsing on the web site.
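Because the files are in ISO-8859-1 rather than UTF-8, they need to be decoded explicitly before use. The Python sketch below shows one way to do that and to apply a minimum-count cutoff similar to the three download sizes. The column layout is an assumption: the example treats the first field of each line as the count and the last field as the word form, which may not match the actual files.

```python
# Hypothetical sample in Latin-1. The "count form" layout is an assumption
# for illustration; check the downloaded file for its real columns.
SAMPLE_BYTES = "120 ja\n45 että\n2 sähkö\n1 yö\n".encode("iso-8859-1")


def read_frequency_list(data, min_count=1):
    """Decode Latin-1 bytes and return (form, count) pairs in file order."""
    rows = []
    for line in data.decode("iso-8859-1").splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        count, form = int(fields[0]), fields[-1]
        if count >= min_count:
            rows.append((form, count))
    return rows


if __name__ == "__main__":
    # Keep only forms that occur more than once.
    print(read_frequency_list(SAMPLE_BYTES, min_count=2))
```

Decoding with the wrong encoding would typically garble letters such as ä and ö, so the encoding is stated explicitly instead of relying on the platform default.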

Latest versions/subcorpora:  
Frequency List of Written Finnish Word Forms
Metadata and license
Attribution instructions
Open the website
Search for all versions in META-SHARE

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Frequencies of Early Modern Finnish Words

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The list contains the word forms of the Corpus of Early Modern Finnish of the Institute for the Languages of Finland together with their frequency information.

Latest versions/subcorpora:  
Frequencies of Early Modern Finnish Words
Metadata and license
Attribution instructions
Open the website
Search for all versions in META-SHARE

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Frequencies of Old Literary Finnish Words

This resource is offered by Kotus, Kotimaisten kielten keskus, the Institute for the Languages of Finland.

The resource contains a list of frequencies of old literary Finnish words. The list includes the words from the Corpus of Old Literary Finnish together with information about their frequency.

Latest versions/subcorpora:  
Frequencies of Old Literary Finnish Words
Metadata and license
Attribution instructions
Open the website
Search for all versions in META-SHARE

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th Edition (TSK 49)

The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th edition (TSK 49) contains information on more than 500 concepts in term records and concept diagrams. The concepts have been given definitions and term recommendations in Finnish and Swedish. The relations between the concepts are illustrated with concept diagrams. The vocabulary is fully bilingual: the foreword, instructions, concept descriptions and concept diagrams have all been translated into Swedish. The subjects covered in the vocabulary are the benefits provided by Kela (the Social Insurance Institution of Finland), e.g. sickness allowances and reimbursements for medical expenses under the Health Insurance Act, international medical care, occupational health care, disability benefits and interpreting services, rehabilitation organized and reimbursed by Kela, pensions paid by Kela, housing benefits, financial aid for students, conscript’s allowance, benefits for families with children and unemployment allowances.

More information (in Finnish)

Latest versions/subcorpora:
The Terminological Vocabulary of Kela – Benefit-related Concepts, 4th Edition (TSK 49)
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

The Vocabulary of Safety and Health at Work (TSK 35)

The Vocabulary of Safety and Health at Work (TSK 35) contains 465 concepts with Finnish term recommendations, definitions and notes. The equivalents are given in Swedish, English, German and French. The definitions and notes have been translated into Swedish. The chapters of the vocabulary include occupational health, safety at work, work environment, risk management, administration of working life and organizing of safety and health at work as well as important registers, methods and cooperation organizations.

More information (in Finnish)

Latest versions/subcorpora:
The Vocabulary of Safety and Health at Work (TSK 35)
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Frequency Lexicon of the Finnish Newspaper Language

The Frequency Lexicon of the Finnish Newspaper Language contains the 9,996 most common lemmas of Finnish newspaper language. The lexicon was compiled in 2004 from source material containing 43,999,826 words.

Latest versions/subcorpora:  
Frequency Lexicon of the Finnish Newspaper Language
Metadata and license
Attribution instructions
Download the resource
Search for all versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

The N-grams of the Newspaper and Periodical Corpus of the National Library of Finland

The National Library of Finland has digitized a large proportion of Finland’s Finnish and Swedish newspapers, magazines, and periodicals published between 1820 and 2000 (Finnish) and between 1770 and 1940 (Swedish). This resource contains sets of unigrams, bigrams and trigrams extracted from a corpus that has been compiled from the digitized newspapers by the University of Helsinki.

The resource consists of plain UTF-8 encoded text files, each containing a list of n-grams ordered by frequency from highest to lowest. Each line in a file consists of two or more fields separated by a whitespace character. The first field indicates the absolute frequency of a unique n-gram, and the remaining fields contain the tokens (strings of non-whitespace characters) of the n-gram itself. Uppercase letters have been retained as such and have not been converted into lowercase letters. Punctuation characters are treated as separate tokens except when they are part of an abbreviation (”etc.”, ”mm.”) or when they separate a case ending or an enclitic from an abbreviation or a sign (”EU:ssa”, ”%:iin”), as per the typographic principles of standard Finnish. The n-grams have been computed across sentence boundaries for each decade (from the 1820s to the 2000s for Finnish and from the 1770s to the 1940s for Swedish) as well as for the entire corpus, with unigrams, bigrams and trigrams in separate files.
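The line format described above (frequency first, then the tokens of the n-gram) can be read with a few lines of Python. The sketch below is illustrative only: the sample lines and the function names `parse_ngram_line` and `fold_case` are made up for this example and are not part of the resource. Because uppercase letters are retained in the files, the second function shows one possible way to merge counts of n-grams that differ only in letter case.

```python
from collections import Counter

# Hypothetical bigram lines in the described format:
# absolute frequency, then the tokens, separated by whitespace.
SAMPLE_LINES = [
    "40 Helsingin Sanomat",
    "25 on ollut",
    "5 On ollut",
]


def parse_ngram_line(line):
    """Split one line into (frequency, tuple of tokens)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a frequency and at least one token: {line!r}")
    return int(fields[0]), tuple(fields[1:])


def fold_case(lines):
    """Sum frequencies of n-grams that differ only in letter case."""
    totals = Counter()
    for line in lines:
        frequency, tokens = parse_ngram_line(line)
        totals[tuple(token.lower() for token in tokens)] += frequency
    return totals


if __name__ == "__main__":
    for ngram, frequency in fold_case(SAMPLE_LINES).most_common():
        print(frequency, " ".join(ngram))
```

Case folding is a design choice rather than a requirement: it discards the distinction between sentence-initial and mid-sentence forms, and it also merges proper nouns with common nouns of the same spelling.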

Since the source material has been digitized by means of optical character recognition (OCR), the resource also contains erroneous word forms and non-word strings of characters. Furthermore, due to the large time span of the original corpus, the resource contains several lexical items and spelling variants that have since become obsolete in standard Finnish and standard Swedish.

The resource will be updated in the future as improvements are made to the source material.

The data is derived from The Newspaper and Periodical Corpus of the National Library of Finland.

Latest versions/subcorpora:  
The Finnish N-grams 1820-2000 of the Newspaper and Periodical Corpus of the National Library of Finland
Metadata and license
Attribution instructions
Download the resource
A copy of this version is available in the computing environment.
The Swedish N-grams 1770-1940 of the Newspaper and Periodical Corpus of the National Library of Finland
Metadata and license
Attribution instructions
Download the resource
A copy of this version is available in the computing environment.
Search for these versions in META-SHARE

Different versions/subcorpora of this language corpus are published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose

The corpus contains data from Matias Tamminen’s MA thesis ”Then shall I know fully: Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose” (University of Helsinki, 2018).

The source data are the corpus Classics of English and American Literature translated by Kersti Juva, English-Finnish parallel corpus and the corpus of Translated Finnish.

Latest versions/subcorpora:  
Relative frequencies of part-of-speech n-grams in native and translated Finnish literary prose
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Karelian Dictionary

The six volumes of the Karelian dictionary were published in 1968-2005 by the Institute for the Languages of Finland and the Finno-Ugrian Society.

The online dictionary is a project of the Institute for the Languages of Finland. It is updated as needed and as resources allow.

More information on the dictionary from the Karjalan kielen sanakirja website:

Downloadable in XML format:

Latest versions/subcorpora:  
Karelian Dictionary
Metadata and license
Attribution instructions
Download the resource
A copy of this version is available in the computing environment.
Headword List of the Karelian Dictionary
Metadata and license
Attribution instructions
Open web page
Search for all versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:

Finnish Verbal Colorative Constructions

The resource contains Finnish verbal colorative constructions from the database of the word notes used when creating the dictionaries Nykysuomen sanakirja and Kielitoimiston sanakirja, from various literary works, from a query test made by Maria-Magdalena Jürvetson, as well as from different Internet sources.

Latest versions/subcorpora:
Finnish Verbal Colorative Constructions
Metadata and license
Attribution instructions
Download the resource
Search for these versions in META-SHARE

Different versions of this language corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or through the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. The links to the different versions can be found in the list above.

Detailed information on the content of each version, user rights and licenses can be found in its specific metadata record in META-SHARE.

This resource group page has a Persistent Identifier:
