Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Juho Leinonen tells us about his research on automatic speech recognition, speech alignment and chatbots.
My name is Juho Leinonen and I am completing my PhD studies in the Speech Recognition research group led by Mikko Kurimo at Aalto University. I started my PhD studies in 2017 after a couple of years of work in industry.
The topic of my Master’s thesis was automatic speech recognition for the Sámi language, and I can build on this experience in my PhD work as well. My current research concerns chatbots and forced alignment of speech, and it still relies on language models and acoustic models, both of which are also required in automatic speech recognition. In speech recognizers, language models are used for recognizing words that are pronounced in an unclear or ambiguous way, whereas chatbots need language models for generating new text. Language models can also be applied to assessing the quality of text generated by bots. The process becomes circular: in order to evaluate the results reliably, we need to understand what high-quality text is like, but the same understanding is a prerequisite for generating text in the chatbot. This constitutes a philosophical problem as well as an engineering one.
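As a rough illustration of how a language model can score text quality, the sketch below trains a tiny bigram model with add-one smoothing and compares the perplexity of a fluent sentence with that of a scrambled one. The toy training sentences, the function names and the smoothing scheme are assumptions made for this example. They are not the models used in the research described here, which are trained on far larger corpora.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories over whitespace-tokenised sentences."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams, vocab


def perplexity(sentence, model):
    """Per-token perplexity under the add-one smoothed bigram model."""
    histories, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab) + 1                                # one extra slot for unseen words
    log_prob = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (histories[prev] + v)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Toy data, invented for this example only.
model = train_bigram(["the cat sat on the mat", "the dog sat on the rug"])
fluent = perplexity("the cat sat on the rug", model)
scrambled = perplexity("rug the on sat cat the", model)
print(f"fluent: {fluent:.2f}  scrambled: {scrambled:.2f}")
```

Lower perplexity means the model finds the text more predictable. This is only a proxy for quality, which is one face of the circularity described above.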
The goal in traditional speech recognition is to find the text that corresponds to the audio recording as well as possible. Developing a speech recognizer first requires previously aligned speech data for training the acoustic models. Aligning text with speech is actually routine work in speech recognition. However, speech alignment would be useful for researchers in other fields as well, and not everyone can be expected to become a speech recognition professional before getting started with their own research. During the past year, I have packaged the speech recognition and alignment tools used in our research group into a toolkit that is as easy to share as possible. I am also searching for good measures that could be used for assessing the quality of the alignment. My goal is to find out which acoustic models or features produce the best alignment, and in what sort of situations it is possible or worthwhile to use the models trained on major languages for aligning minority languages. This research has also opened up the world of language researchers to me, since I am trying to adapt the tool to suit their purposes as well as possible.
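One common way to quantify alignment quality is to compare automatically produced word boundaries with manually annotated ones. The sketch below is a minimal example of that idea, not necessarily the measure chosen in this research. The data layout (lists of word and start-time pairs), the function names and the 20 ms tolerance are assumptions made for illustration.

```python
from statistics import mean


def boundary_errors(reference, hypothesis):
    """Absolute start-time differences in seconds for words paired by position."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must contain the same words")
    errors = []
    for (ref_word, ref_start), (hyp_word, hyp_start) in zip(reference, hypothesis):
        if ref_word != hyp_word:
            raise ValueError(f"word mismatch: {ref_word!r} vs {hyp_word!r}")
        errors.append(abs(ref_start - hyp_start))
    return errors


def summarize(errors, tolerance=0.02):
    """Mean absolute error and the share of boundaries within the tolerance."""
    return {
        "mean_abs_error": mean(errors),
        "within_tolerance": sum(e <= tolerance for e in errors) / len(errors),
    }


# Invented timings: manual annotation versus an automatic aligner.
manual = [("hei", 0.10), ("maailma", 0.50)]
automatic = [("hei", 0.11), ("maailma", 0.58)]
print(summarize(boundary_errors(manual, automatic)))
```

A measure like this needs manually aligned reference data, which is often scarce for minority languages.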
On the spur of the moment, I ended up testing the Finnish speech recognizer, developed by our group, for aligning the Giellagas corpus of Northern Saami. This project gave me the idea of cross-language alignment that is described in my latest publication (Leinonen, Virpioja & Kurimo, 2021). Thus, an alignment tool developed for one language may also be usable for aligning speech and text in other languages, provided that the sound and writing systems of the languages are sufficiently similar. In the future, I will also be utilizing other previously aligned speech corpora that are available in the Language Bank of Finland. The automatic speech aligner that I have used in my research is now also available for other researchers as part of the Aalto University Automatic Speech Recognition System (Aalto-ASR v.2) that has been installed in the Puhti computing environment at CSC.
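To make the role of similar writing systems concrete, the sketch below maps the letters of a target-language word onto the letter inventory of a model trained on another language. It is a hypothetical illustration, not the procedure from the cited publication. The Finnish inventory, the diacritic-stripping fallback and the `<unk>` placeholder are all assumptions.

```python
import unicodedata

# Assumed letter inventory of a Finnish grapheme-based model (illustrative only).
FINNISH_GRAPHEMES = set("abcdefghijklmnopqrstuvwxyzåäö")


def map_graphemes(word, inventory, unknown="<unk>"):
    """Map each letter to the inventory, falling back to its base letter."""
    mapped = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in inventory:
            mapped.append(ch)
            continue
        # Decompose the letter and drop combining marks, e.g. "č" becomes "c".
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        mapped.append(base if base in inventory else unknown)
    return mapped


print(map_graphemes("čáhci", FINNISH_GRAPHEMES))  # Northern Sámi for "water"
print(map_graphemes("đ", FINNISH_GRAPHEMES))      # no base letter to fall back on
```

Letters that cannot be mapped show where the writing systems diverge, and a real system would need a more considered treatment for them.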
For training chatbots, I also use the Suomi24 corpus available in the Language Bank. It may seem strange to use the sort of language used in online discussion forums for “training” purposes. However, huge amounts of text are required in order to train useful language models, and finding suitable material in sufficiently large quantities is very difficult.
Leinonen, J., Smit, P., Virpioja, S., & Kurimo, M. (2018). New baseline in automatic speech recognition for Northern Sámi. In International Workshop on Computational Linguistics for the Uralic Languages (pp. 89-99). https://doi.org/10.18653/v1/W18-0208
Leino, K., Leinonen, J., Singh, M., Virpioja, S., & Kurimo, M. (2020). FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics. In Interspeech (pp. 429-433). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2020-2511
Leinonen, J., Virpioja, S., & Kurimo, M. (2021, May). Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press. http://hdl.handle.net/10138/330758
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Humanities of the University of Helsinki.