Kielipankki – The Language Bank of Finland is a service for researchers using language resources. Rosa González Hautamäki tells us about her research on within-speaker variation and the effects of voice modifications. The AVOID corpus, which she collected in collaboration with the Computational Speech group at UEF, is a valuable resource for studying human-induced voice modifications.
I am Rosa González Hautamäki, a postdoctoral researcher at the Research Unit of Logopedics (RULOGO) at the University of Oulu, and a visiting researcher at the School of Humanities at the University of Eastern Finland. I hold a Ph.D. in Computer Science and maintain ongoing collaborations with the School of Computing at the University of Eastern Finland and the Human Language Technology lab at the National University of Singapore (NUS).
My research focuses on within-speaker variation in the context of speaker recognition. Speech is a complex signal that varies with factors such as age, health, and emotional state, so a speaker is not expected to utter the same phrase in exactly the same way every time. During my doctoral studies, I studied how voice modifications affect the performance of voice comparisons carried out by listeners or by automatic systems. My initial research focused on mimicry and voice disguise, since some speakers may not cooperate when interacting with speaker recognition systems. Our research showed that even simple voice disguise techniques could degrade the performance of automatic systems and also make speaker comparison more challenging for listeners.
Since then, my studies on within-speaker variation have focused on identifying the factors that affect speaker verification performance, including both deliberate and non-deliberate voice modifications. These findings have also been useful for analyzing speech in related areas, such as spoofing attacks against speaker verification and auditory speech perception. Understanding which factors influence system decisions can help make those systems more reliable.
Currently, my research on speech analysis involves applying machine learning models to data from assessments used to identify developmental language disorders in children. I am excited to be part of a motivated group of researchers who are studying speech and interventions that can support those who work with children's speech development.
During my doctoral research, I collaborated with the Computational Speech group at the University of Eastern Finland to collect a dataset for the study of voice disguise. Kielipankki provided crucial support by offering the information needed to collect and prepare the corpus and to publish it as a resource. The resulting dataset, called the Age-related Voice Disguise (AVOID) corpus, contains recordings of Finnish speakers using their modal (normal) voice and attempting to disguise their age.
In one study, we used the AVOID corpus to analyze how changes in selected acoustic features affect automatic speaker recognition systems. We found that the change in long-term fundamental frequency (F0) was the factor most detrimental to speaker recognition performance, even when the automatic system uses spectral features.
In another study using the AVOID corpus, we evaluated the effectiveness of age stereotypes as a voice disguise strategy in speaker comparisons. Listeners estimated both the speakers' chronological age and their intended age, that is, the age they were trying to sound like when imitating a child or an elderly voice. For female speakers, the age estimates for the disguised voices moved more accurately toward the target age, whereas for male speakers, the estimates moved in the direction of the target only for elderly voices.
Overall, the AVOID corpus is a valuable resource for studying human-induced voice modifications, and we expect that further research will help make systems more robust to disguised voices.
González Hautamäki, R., Hautamäki, V., and Kinnunen, T. (2019). “On Limits of Automatic Speaker Verification: Explaining Degraded Recognizer Score Through Acoustic Changes Resulting from Voice Disguise”, The Journal of the Acoustical Society of America 146, 693. https://doi.org/10.1121/1.5119240
González Hautamäki, R., Sahidullah, Md., Hautamäki, V., and Kinnunen, T. (2017). “Acoustical and perceptual study of voice disguise by age modification in speaker verification”, Speech Communication 95, pages 1-15. https://doi.org/10.1016/j.specom.2017.10.002
González Hautamäki, R., Sahidullah, Md., Kinnunen, T., and Hautamäki, V. (2016). “Age-Related Voice Disguise and its Impact in Speaker Verification Accuracy”, Odyssey: The Speaker and Language Recognition Workshop, Bilbao, Spain, pages 277-282. http://dx.doi.org/10.21437/Odyssey.2016-40
González Hautamäki, R., Kanervisto, A., Hautamäki, V., and Kinnunen, T. (2018). “Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification”, Odyssey: The Speaker and Language Recognition Workshop, Les Sables d’Olonne, France, pages 320-326. http://dx.doi.org/10.21437/Odyssey.2018-45
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive. This article is also published on the website of the Faculty of Arts of the University of Helsinki.