Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Sam Hardwick, project researcher at the University of Helsinki, tells us about developing some of the tools provided by Kielipankki.
I’m a freelance consultant, researcher and programmer. I started in language technology at the University of Helsinki in a research software project called HFST. We developed code for computational morphology, which ended up being used in, for example, inflecting dictionaries and spellcheckers for languages with extensive morphology (like Finnish, Sámi and Greenlandic). Since then I’ve worked on the technical side of various infrastructure and research projects, and done private consulting work.
Right now I’m involved with publishing a sentiment corpus for Finnish. This is a collection of texts gathered from social media with their sentiment – whether they are positive, neutral or negative – annotated by humans. This will be the basis for automatic sentiment classification for future corpora and tools.
I’m also involved with the ANEE project, helping to make a treebank for Akkadian, which again will be the basis of an automatic annotation tool. Hopefully we’ll ultimately be able to automatically annotate more of the texts in this ancient language.
I’ve done a lot of development work directly for Kielipankki. For example, right now I’m planning an API for accessing corpora directly from code. NLP applications are increasingly the domain of machine learning people in general, not just language experts, and there’s a lot of interest in our data and resources.
Hardwick, S., Enqvist, E. J., Onikki-Rantajääskö, T. A., & Linden, B. K. J. (2018). Tieteen kansallinen termipankki (TTP) ja tiedonlouhinnan apuneuvot. Poster (in Finnish) at the Annual Conference of Linguistics, Helsinki, Finland.
I’ve published demonstrations for various bits of code and analysis, some of it perhaps comprehensible in English, here: https://www.kielipankki.fi/tools/demo/
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.