Researcher of the Month: Juraj Šimko

Photo: Veikko Somerpuro

Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Juraj Šimko tells us about his research on speech articulation and prosody. The Phonetics and Speech Synthesis Research Group at the University of Helsinki also aims to use large language models to answer theoretical questions about speech.

Who are you?

I am a University Lecturer in Phonetics and have worked at the University of Helsinki since 2013. Before that, I studied and worked at several universities in Slovakia, Ireland and Germany, and I spent several years as a Language Specialist at Microsoft. I currently also hold an Honorary Professorship at the Indian Institute of Technology Guwahati. My background is in Maths, Cognitive Science and Phonetics.

I am a member of the Phonetics and Speech Synthesis Research Group at the Department of Digital Humanities. I am also currently involved in Planning the Articulation of Spoken Utterances, an ERC Advanced Grant awarded to Professor Alice Turk at the University of Edinburgh, in which we investigate and model the cognitive processes behind speech production and articulation.

What is your research topic?

I am passionate about research on human speech. Besides speech articulation, the main research interest of both myself and our Group is speech prosody: essentially, all the melodic, rhythmic and emotional aspects of speech that go beyond the linguistic message we pass on when we speak. In our current project, Predictive Processing Approach to Modelling Prosodic Hierarchy for Speech Synthesis, we are working on a novel speech synthesis architecture inspired by Predictive Processing, an influential theoretical and modelling paradigm of human cognition. Of course, the first and most obvious aim is to produce world-class speech synthesis, and our team has indeed been creating state-of-the-art Finnish and Finland Swedish synthesis systems. But we also want to treat the large language models that drive these technological applications as statistical representations of the speech material they were trained on, and to use them to answer theoretical questions about speech. These questions include, among others, the distribution and evolution of accents and dialects, the relationship between sociolinguistics and prosody, and prosodic patterns in politicians' parliamentary speeches.

How is your research related to Kielipankki?

To do all that, we need quite a lot of data. Some of it we create ourselves, with invaluable assistance from Kielipankki experts: we have designed and recorded the FinSyn corpus, a collection of high-quality speech material intended for speech technology applications, primarily speech synthesis. The corpus contains approximately 75 hours of studio-quality recordings from three voice talents, two speaking Finnish and one speaking Finland Swedish. The corpus will be made available as part of the Kielipankki collection. Our work on dialects and sociolinguistics relies heavily on other Kielipankki corpora, primarily the groundbreaking Donate Speech (Lahjoita puhetta) Corpus and the Aalto Finnish Parliament ASR Corpus.

Recent publications

Törö, T., Suni, A. and Šimko, J. (2024). Analysis of regional variants in a vast corpus of Finnish spontaneous speech using a large-scale self-supervised model, Proceedings of Speech Prosody 2024, Leiden, Netherlands. DOI: 10.21437/SpeechProsody.2024

Vainio, M., Suni, A., Šimko, J. and Kakouros, S. (2024). The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech, Proceedings of Speech Prosody 2024, Leiden, Netherlands. DOI: 10.21437/SpeechProsody.2024

Elie, B., Šimko, J. and Turk, A. (2024). Optimization-based modeling of Lombard speech articulation: Supraglottal characteristics. JASA Express Letters, 4(1). https://doi.org/10.1121/10.0024364

Kakouros, S., Šimko, J., Vainio, M. and Suni, A. (2023). Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody, Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France. https://doi.org/10.21437/SSW.2023-20

Šimko, J., Törö, T., Vainio, M. and Suni, A. (2023). Prosody under control: Controlling prosody in text-to-speech synthesis by adjustments in latent reference space, Proceedings of the 18th International Congress of Phonetic Sciences, Prague, Czech Republic. http://hdl.handle.net/10138/565382

Šimko, J., Adigwe, A., Suni, A. and Vainio, M. (2022). A Hierarchical Predictive Processing Approach to Modelling Prosody, Proceedings of the 11th International Conference on Speech Prosody, Lisbon, Portugal. https://doi.org/10.21437/SpeechProsody.2022-86


The FIN-CLARIN consortium consists of a group of Finnish universities together with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland to use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools to the research community.

All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.
