Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Veronika Laippala tells us about her research on large language resources and computational methods.
My name is Veronika Laippala. I am a Professor of Digital Language Research at the School of Languages and Translation Studies of the University of Turku and part of the TurkuNLP research group.
Most of my research is related to language use in one way or another: to large language resources, mostly compiled from the Internet, and to computational methods to analyze the data. In addition, I have been involved in the development of Finnish language technology, including resources such as the Turku Dependency Treebank and the Turku NER named entity recognition system.
We currently have several ongoing projects in which we process large web-based language resources by analyzing the genres or registers found in them and by developing machine learning methods that can automatically recognize the different registers. Such methods and tools would benefit both Internet users in general and researchers using Internet-based language materials.
The wide selection of corpora and resources in the Language Bank of Finland provides huge opportunities! The Suomi 24 corpus is quite unique in its scope, and it is probably the resource I have used the most. In addition, the syntactic parser developed on the basis of our treebank is used to parse the corpora in Kielipankki. Naturally, I also teach the use of the Korp interface in my courses.
Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo & Veronika Laippala (2021). Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 183–191. Available: https://aclanthology.org/2021.eacl-srw.24.
Veronika Laippala, Jesse Egbert, Douglas Biber & Aki-Juhani Kyröläinen (2021). Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents. Language Resources and Evaluation, Vol. 55, pp. 757–788. DOI: 10.1007/s10579-020-09519-z.
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Humanities of the University of Helsinki.