The tools and services maintained by the Language Bank are either accessible via a web interface or available for download and installation from, e.g., GitHub or Korp. You can also find other tools developed by member organizations of FIN-CLARIN / CLARIN ERIC.
Our language resources have three different levels of support.
A: The resource is under active development. The Language Bank of Finland fixes any issues as soon as possible.
B: The resource is developed only upon user request. The Language Bank of Finland aims to fix issues concerning the resource, but external contributions may be required.
C: The resource is available "as is". The Language Bank of Finland neither fixes nor develops the resource.
If you are looking for a tool not listed here, please look in COMEDI or the CLARIN Virtual Language Observatory (VLO).
An overview of all our resources, sorted by resource family, is available on the Resource families Fin-Clarin page.
| Start | Name (and metadata) | Description | Instructions | Install | Service level |
|---|---|---|---|---|---|
| | Korp | A web-based concordance tool that can be used for corpus queries based on morphosyntactic analysis and various other features. | Instructions | | A |
| Download | Download service | Download certain corpora. | | | A |
| Aalto-ASR | Aalto University Automatic Speech Recognition System | An automatic speech recognition toolkit that can be used in the CSC computing environment. | Instructions | Install (GitHub) | |
| ANEE Lexical Networks | ANEE Lexical Networks | A graphic semantic dictionary represented as a network. You can use the portal for exploring the meanings of singular Akkadian words in a visual way. | | | |
| Annif | Annif | Annif is a tool for automated subject indexing and classification, developed at the National Library of Finland. | | | |
| | CLARIN Federated Content Search | Run a centralized query across all the resources provided by CLARIN centers. | | | |
| Demo | Demo tools at the Language Bank of Finland | Demos of tools that are in development at the Language Bank of Finland: FinTag and FiNER, FinParse, FinSentiment, FinnWordNet, HFST POS taggers, HFST morphological analyzers, Lemmamatch, etc. (In Finnish) | | | C |
| Dictionary of Contemporary Finnish | Dictionary of Contemporary Finnish | Dictionary of standard Finnish made by the Institute for the Languages of Finland. | | | |
| digi.kansalliskirjasto.fi | Digi – Digital collections of the National Library of Finland | A search and download service for digital collections from the National Library of Finland. In addition to newspapers and magazines, the collections include, e.g., books, pictures and maps. Note that a large proportion of the newspapers and magazines can also be used via the Korp service in the Language Bank (see KLK). | | | |
| | ELAN | ELAN is a program for transcribing and annotating audio and video files. It can also be used for searching locally stored collections of annotated material. | Instructions | Install | |
| FinBERT | FinBERT | BERT model trained from scratch on Finnish. | | Install (GitHub) | |
| Finland Swedish Online | Finland Swedish Online | A platform offering online courses for learners of Finland Swedish. | | | |
| FinMeter | FinMeter – Tools for analyzing poetry in Finnish | FinMeter is a library for analyzing poetry in Finnish. It handles typical rhyming such as alliteration, assonance and consonance, Japanese meters and Kalevala meter. It can also be used to hyphenate Finnish and analyse meter. In addition, it can do semantic clustering, metaphor interpretation, concreteness scoring and sentiment analysis. | | | |
| TDPP | Finnish dependency parser developed by TurkuNLP (TDPP) | An open source dependency parsing pipeline developed by the TurkuNLP group for analyzing Finnish text. | | Install (GitHub) | |
| FinTag | Finnish Tagtools | A part-of-speech and morphology tagger and a named entity recogniser for Finnish. | | Install, Use via Docker | A |
| FinnONTO | FinnONTO | Finnish and international ontologies, vocabularies and thesauri needed for publishing content cost-efficiently on the Semantic Web. | | | |
| finnsurveytext | finnsurveytext | A tool set that helps social science researchers analyse and understand responses to open-ended questions in their surveys. | Instructions | Install (GitHub) | |
| Gephi | Gephi | A program for network analysis and visualization. | | Install | |
| GiellaLT | GiellaLT | GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages. | | | |
| Giellatekno | Giellatekno - Dictionaries and tools | Dictionaries and tools for the analysis of Saami and other morphologically rich languages. | | | |
| HeLI-OTS | HeLI-OTS 2.0 | HeLI off-the-shelf language identifier with language models for 220 languages. | | Install (Zenodo) | |
| | INCEpTION | A text annotation tool (newer version of WebAnno). | User Guide | Standalone installation | A |
| Kotus digital collections | Kotus digital collections | The web page offers links to the Institute’s corpora and material available online free of charge. | | | |
| | Lääketutka | Lääketutka, "the Medicine Radar", provides analytics about health, medicine and symptom-related discussions in the Suomi24 discussion forum. | | | C |
| Murre | Murre | The | | | |
| nimiarkisto.fi | Nimiarkisto | Nimiarkisto.fi is a portal with the most important digital resources of names and named entities collected from and archived in Finland. | | | |
| Nordic Tweet Stream (NTS) | Nordic Tweet Stream (NTS) search & visualization interface | A multilingual monitor corpus of geolocated tweets and associated metadata from the Nordic region. | | | |
| | OPUS | An interface for open source parallel corpora. | | | |
| | Praat | Praat is a comprehensive toolkit for annotating, processing, analyzing and visualizing speech. Praat includes a scripting language. | Instructions | Install | |
| | Proto-Indo-European Lexicon | A generative etymological dictionary of Indo-European languages. | | | |
| Sanat | Sanat | A platform for publishing lexica and word lists. | | | B |
| | Signbank | Lexical database of Finnish Sign Language. | | | A |
| | Sparv | A multilingual toolkit provided by the Swedish Språkbanken for parsing and annotating text in various languages. | User manual (GUI) | Installation and setup | |
| Finnish Internet Parsebank: SETS | Syntax-based search (SETS) from the Finnish Internet Parsebank | Syntax-based search (SETS) from parts of the Finnish Internet Parsebank. | Documentation | | |
| tekstiks.ee | tekstiks.ee – Speech recognition: speech to text | Automated speech transcription service for Estonian and Finnish speech and a user interface for transcription editing. | | | |
| Terminology Forum | Terminology Forum | A collection of links to special field glossaries (University of Vaasa). | | | |
| textreuse.sls.fi | Text reuse in the Swedish-language press, 1645-1918 | A search engine for finding and analyzing clusters of text reuse in the Swedish-language press from 1645 to 1918. | | | |
| Texthammer | Texthammer | A search and analysis toolkit for parallel corpora provided by the University of Tampere. | Documentation (PDF) | | |
| | The Helsinki Term Bank for the Arts and Sciences | A multidisciplinary project that aims to gather a permanent terminological database for all fields of research in Finland. | | | A |
| | Transkribus | A toolkit for transcribing and managing historical documents (e.g., images and scanned text). | Instructions (PDF) | Install | |
| TDPP-LBF | Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) | Finnish Dependency Parsing Pipeline, adapted by the Language Bank of Finland. | | Install (GitHub) | |
| Turku Neural Parser Pipeline | Turku Neural Parser Pipeline | A tool developed by the Turku NLP group for parsing Finnish text. | | Install (GitHub), Demo | |
| TNPP-LBF | Turku Neural Parser Pipeline, Kielipankki version (TNPP-LBF) | Turku Neural Parsing Pipeline, adapted by the Language Bank of Finland. | | Access via Puhti, Install (Docker) | |
| TurkuNLP word embedding | TurkuNLP word embedding demo (word2vec) | A tool developed for analyzing the semantic similarity of words. | | | |
| UDPipe | UDPipe | UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. | | Install (GitHub) | |
| UDPipe-LBF | UDPipe Kielipankki version | UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files, installed at Kielipankki. | | Access via Puhti | |
| UralicNLP | UralicNLP – Natural language processing for many languages | UralicNLP can produce morphological analyses, generate morphological forms, lemmatize words and give lexical information about words in Uralic and other languages. The functionality originates mainly in FST tools and dictionaries developed in the GiellaLT infrastructure and Apertium. | | | |
| VRT Tools | VRT Tools | Command-line tools for manipulating segmented and annotated text by using VRT (verticalized text) as an interchange format. VRT is related to Corpus WorkBench (used in the backend of the Korp concordancer tool). | | GitHub | |
| Wanca | Wanca | Wanca is a portal for websites in Uralic languages. | | | A |
| WebMAUS | WebMAUS | A set of tools for automatic segmentation and labelling of speech. | Instructions | | |
| Whisper | Whisper | Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. Whisper can perform multilingual speech recognition, speech translation, and language identification. Whisper can be used in the CSC computing environment, including SD Desktop. | Tutorial (CSC) | GitHub: Whisper (OpenAI) and WhisperDO for calling Whisper (by Nicholas G. Cotton) | A |