Kielipankki – The Language Bank of Finland is a service for researchers who use language resources. Linguists Niklas Edenmyr, Ali Basirat and Marc Tang from Uppsala University tell us about their research with the Language Bank resource Helsinki Corpus of Swahili 2.0 (HCS 2.0) Downloadable Annotated Version.
We are Niklas Edenmyr, Ali Basirat, and Marc Tang. We are linguists based at Uppsala University in Sweden. We work on African linguistics, computational linguistics, and quantitative linguistic typology, respectively.
We are currently collaborating across two projects: Principal word embedding and Linguistic Diversity. The first project aims to test and enhance the power of word embeddings with language data, whereas the second investigates cross-linguistic patterns in nominal classification systems (e.g. grammatical gender). The two projects are combining their efforts to examine whether the information extracted by word embeddings can help identify grammatical gender in various languages of the world.
One of the languages we are currently working on is Swahili (Niger-Congo). Its nominal classification system is rather complex, as it has more than 15 noun classes. We therefore use the Language Bank resource Helsinki Corpus of Swahili 2.0 (HCS 2.0) Downloadable Annotated Version, which contains about 25 million annotated words, to train word embedding models and test whether the resulting word vectors can help identify noun classes in Swahili.
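To give a rough idea of what such a test can look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the projects: the three-dimensional vectors are invented for the example (real embeddings would be trained on the corpus and have many more dimensions), the function names are hypothetical, and a simple nearest-centroid rule with cosine similarity stands in for whatever classifier one would actually choose. Only two noun classes are shown, class 1 (e.g. mtu, mtoto) and class 7 (e.g. kitabu, kiti).

```python
import math

# Invented 3-dimensional vectors, for illustration only. Real word vectors
# would come from an embedding model trained on the corpus.
TOY_VECTORS = {
    "mtu": [0.9, 0.1, 0.0],
    "mtoto": [0.8, 0.2, 0.1],
    "kitabu": [0.1, 0.9, 0.2],
    "kiti": [0.2, 0.8, 0.1],
}

# Noun class of each training noun (class 1: m- prefix, class 7: ki- prefix).
TOY_CLASSES = {"mtu": 1, "mtoto": 1, "kitabu": 7, "kiti": 7}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_centroids(vectors, classes):
    """Average the vectors of all nouns that share a noun class."""
    grouped = {}
    for word, vec in vectors.items():
        grouped.setdefault(classes[word], []).append(vec)
    return {
        noun_class: [sum(column) / len(vecs) for column in zip(*vecs)]
        for noun_class, vecs in grouped.items()
    }


def predict_class(vector, centroids):
    """Return the noun class whose centroid is most similar to the vector."""
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))


centroids = class_centroids(TOY_VECTORS, TOY_CLASSES)

# Hypothetical vector for a held-out noun; the value is made up.
print(predict_class([0.7, 0.3, 0.1], centroids))
```

If nouns of the same class tend to cluster in the vector space, a held-out noun should land nearest to the centroid of its own class. How well this holds for real Swahili embeddings is exactly the kind of question the test is meant to answer.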
The FIN-CLARIN consortium consists of a group of Finnish universities along with CSC – IT Center for Science and the Institute for the Languages of Finland (Kotus). FIN-CLARIN helps researchers in Finland use, refine, preserve and share their language resources. The Language Bank of Finland is the collection of services that provides language materials and tools for the research community.
All previously published Language Bank researcher interviews are stored in the Researcher of the Month archive.