The tools and services maintained by the Language Bank are either accessible via a web interface or available for download and installation from, e.g., GitHub or Korp. You can also find other tools developed by member organizations of FIN-CLARIN / CLARIN ERIC.
Our language resources have three different levels of support.
A: The resource is under active development. The Language Bank of Finland fixes any issues as soon as possible.
B: The resource is developed only upon user request. The Language Bank of Finland aims to fix issues concerning the resource, but external contributions may be required.
C: The resource is available "as is". The Language Bank of Finland neither fixes nor develops the resource.
If you are looking for a tool not listed here, please have a look in COMEDI or the CLARIN Virtual Language Observatory (VLO).
An overview of all our resources, sorted by resource family, is available on the FIN-CLARIN Resource families page.
| Start | Name | Description | Instructions | Install | Info | Administrator | Service level |
|---|---|---|---|---|---|---|---|
| | Korp | A web-based concordance tool that can be used for corpus queries based on morphosyntactic analysis and various other features. | Instructions | | | | A |
| Download | Download service | Download certain corpora. | | | | | A |
| Sanat | Sanat | A platform for publishing lexica and word lists. | | | | | B |
| FinTag | Finnish Tagtools | A part-of-speech and morphology tagger and a named entity recogniser for Finnish. | | Install, Use via Docker | | | A |
| Demo | Demo tools at the Language Bank of Finland | Demos of tools that are in development at the Language Bank of Finland: FinTag and FiNER, FinParse, FinSentiment, FinnWordNet, HFST POS taggers, HFST morphological analyzers, Lemmamatch, etc. (In Finnish) | | | | | C |
| | INCEpTION | Text annotation tool (a newer version of WebAnno). | User Guide | Standalone installation | | | A |
| | Signbank | Lexical database of Finnish Sign Language. | | | | | A |
| | OPUS | An interface for open source parallel corpora. | | | | | |
| | The Helsinki Term Bank for the Arts and Sciences | A multidisciplinary project that aims to gather a permanent terminological database for all fields of research in Finland. | | | | | A |
| | Lääketutka | Lääketutka, "the Medicine Radar", provides analytics about health, medicine and symptom-related discussions in the Suomi24 discussion forum. | | | | | C |
| ANEE Lexical Networks | ANEE Lexical Networks | A graphic semantic dictionary represented as a network. You can use the portal for exploring the meanings of singular Akkadian words in a visual way. | | | | | |
| | Proto-Indo-European Lexicon | A generative etymological dictionary of Indo-European languages. | | | | | |
| Wanca | Wanca | Wanca is a portal for websites in Uralic languages. | | | | | A |
| TNPP-LBF | Turku Neural Parser Pipeline, Kielipankki version (TNPP-LBF) | Turku Neural Parser Pipeline, adapted by the Language Bank of Finland. | | Access via Puhti, Install (Docker) | | | |
| Turku Neural Parser Pipeline | Turku Neural Parser Pipeline | A tool developed by the Turku NLP group for parsing Finnish text. | | Install (GitHub), Demo | | | |
| TDPP-LBF | Turku Dependency Parser Pipeline, Kielipankki version (TDPP-LBF) | Finnish Dependency Parsing Pipeline, adapted by the Language Bank of Finland. | | Install (GitHub) | | | |
| TDPP | Finnish dependency parser developed by TurkuNLP (TDPP) | An open source dependency parsing pipeline developed by the TurkuNLP group for analyzing Finnish text. | | Install (GitHub) | | | |
| UDPipe-LBF | UDPipe Kielipankki version | UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files, installed at Kielipankki. | | Access via Puhti | | | |
| UDPipe | UDPipe | UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. | | Install (GitHub) | | | |
| TurkuNLP word embedding | TurkuNLP word embedding demo (word2vec) | A tool developed for analyzing the semantic similarity of words. | | | | | |
| Finnish Internet Parsebank: SETS | Syntax-based search (SETS) from the Finnish Internet Parsebank | Syntax-based search (SETS) from parts of the Finnish Internet Parsebank. | Documentation | | | | |
| FinBERT | FinBERT | BERT model trained from scratch on Finnish. | | Install (GitHub) | | | |
| Texthammer | Texthammer | A search and analysis toolkit for parallel corpora provided by the University of Tampere. | Documentation (PDF) | | | | |
| nimiarkisto.fi | Nimiarkisto | Nimiarkisto.fi is a portal with the most important digital resources of names and named entities collected from and archived in Finland. | | | | | |
| Terminology Forum | Terminology Forum | Terminology Forum – A collection of links to special field glossaries, University of Vaasa | | | | | |
| | Sparv | A multilingual toolkit provided by the Swedish Språkbanken for parsing and annotating text in various languages. | User manual (GUI) | Installation and setup | | | |
| WebMAUS | WebMAUS | A set of tools for automatic segmentation and labelling of speech. | Instructions | | | | |
| | Transkribus | A toolkit for transcribing and managing historical documents (e.g., images and scanned text). | Instructions (PDF) | Install | | | |
| Aalto-ASR | Aalto University Automatic Speech Recognition System | An automatic speech recognition toolkit that can be used in the CSC computing environment. Some features are available via the Mylly service. | Instructions | Install (GitHub) | | | |
| | ELAN | ELAN is a program for transcribing and annotating audio and video files. It can also be used for searching locally stored collections of annotated material. | Instructions | Install | | | |
| | Praat | Praat is a comprehensive toolkit for annotating, processing, analyzing and visualizing speech. Praat includes a scripting language. | Instructions | Install | | | |
| | CLARIN Federated Content Search | Run a centralized query across all the resources provided by CLARIN centers. | | | | | |
| Gephi | Gephi | A program for network analysis and visualization. | | Install | | | |
| | LAT (Language Archive Tools) | A toolkit for browsing and querying annotated speech and video corpora. | Instructions | | | | C |
| digi.kansalliskirjasto.fi | Digital collections | A search and download service for digital collections from the National Library of Finland. In addition to newspapers and magazines, the collections include, e.g., books, pictures and maps. Note that a large proportion of the newspapers and magazines can also be used via the Korp service in the Language Bank (see KLK). | | | | | |
| textreuse.sls.fi | Text reuse in the Swedish-language press, 1645-1918 | A search engine for searching and analyzing clusters of text reuse in the Swedish-language press from 1645 to 1918. | | | | | |
| FinnONTO | FinnONTO | Finnish and international ontologies, vocabularies and thesauri needed for publishing content cost-efficiently on the Semantic Web. | | | | | |
| Dictionary of Contemporary Finnish | Dictionary of Contemporary Finnish | Dictionary of standard Finnish made by the Institute for the Languages of Finland. | | | | | |
| HeLI-OTS | HeLI-OTS 2.0 | HeLI off-the-shelf language identifier with language models for 220 languages. | | | | | |
| Kotus digital collections | Kotus digital collections | The web page offers links to the Institute’s corpora and material available online free of charge. | | | | | |
| Giellatekno | Giellatekno - Dictionaries and tools | Dictionaries and tools for the analysis of Saami and other morphologically rich languages. | | | | | |
| GiellaLT | GiellaLT | GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages. | | | | | |
| FinMeter | FinMeter – Tools for analyzing poetry in Finnish | FinMeter is a library for analyzing poetry in Finnish. It handles typical rhyming such as alliteration, assonance and consonance, Japanese meters and Kalevala meter. It can also be used to hyphenate Finnish and analyse meter. In addition, it can do semantic clustering, metaphor interpretation, concreteness scoring and sentiment analysis. | | | | | |
| Murre | Murre | The | | | | | |
| UralicNLP | UralicNLP - Natural language processing for many languages | UralicNLP can produce morphological analyses, generate morphological forms, lemmatize words and give lexical information about words in Uralic and other languages. The functionality originates mainly in FST tools and dictionaries developed in the GiellaLT infrastructure and Apertium. | | | | | |
| Annif | Annif | Annif is a tool for automated subject indexing and classification, developed at the National Library of Finland. | | | | | |
| tekstiks.ee | tekstiks.ee – Speech recognition: speech to text | Automated speech transcription service for Estonian and Finnish speech and a user interface for transcription editing. | | | | | |
| finnsurveytext | finnsurveytext | A tool set that helps social science researchers analyse and understand responses to open-ended questions in their surveys. | Instructions | Install (GitHub) | | | |
| Nordic Tweet Stream (NTS) | Nordic Tweet Stream (NTS) search & visualization interface | A multilingual monitor corpus of geolocated tweets and associated metadata from the Nordic region. | | | | | |
| Mylly | Mylly (discontinued) | Versatile data analysis platform with interactive visualizations and workflows. | Instructions | | | | C |
| Finland Swedish Online | Finland Swedish Online | A platform offering online courses for learners of Finland Swedish. | | | | | |