Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 2022-01-01
Duration: 24 months
WP 1.2: Transcription Service for Finnish Interviews
Date of reporting: 2023-10
Report author: Martin Matthiesen (CSC)
Contributors: Anssi Moisio (Aalto), Sam Hardwick (CSC), Niko Partanen (National Library), Aivo Olev (Tallinn University of Technology)
Deliverable location: https://tekstiks.ee (Finnish)
The transcription service is split into two parts: the end-user frontend is hosted at Tallinn University of Technology, Estonia, at https://tekstiks.ee, and the speech recognition backend is hosted at CSC – IT Center for Science in Finland. For details and usage instructions, see https://www.kielipankki.fi/arkisto/resource-info/tools-for-speech-analysis-and-annotation/
The source code is available on GitHub.
Olev, A & Alumäe, T (2022). Estonian Speech Recognition and Transcription Editing Service. Baltic J. Modern Computing, Vol. 10 (3), pp. 409–421. DOI: 10.22364/bjmc.2022.10.3.14
Moisio, A; Porjazovski, D; Rouhe, A; Getman, Y; Virkkunen, A; AlGhezi, R; Lennes, M; Grósz, T; Lindén, K & Kurimo, M (2022). Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources and Evaluation. DOI: 10.1007/s10579-022-09606-3
Moisio, A. (2022). Lahjoita puhetta baseline Kaldi ASR model (1.2). Zenodo. DOI: 10.5281/zenodo.7101543