Collection of corpora from The University of Helsinki Language Corpus Server (UHLCS)

The University of Helsinki Language Corpus Server (UHLCS) was a multilingual data bank founded in the late 1980s. The UHLCS collection includes text corpora of more than 50 languages, covering minority languages and various text types. There are also tools specifically developed for analyzing the UHLCS corpora. Use of most corpora is restricted to research and teaching.

Subcorpora (each has its own metadata and license record, attribution instructions and access rights application; the access service is given after each name):
Chuvash Corpus (UHLCS); access in Puhti
English Corpus (UHLCS); access in Puhti
Corpus of Erzya and Moksha Mordvin Literature and Journals and Komi Zyrian Literature (UHLCS); access in Puhti
Erzya and Moksha Mordvin Word List Corpus (UHLCS); access in Puhti
Estonian Corpus 1 (UHLCS); access in Puhti
Estonian Corpus 2 (UHLCS); access in Puhti
Finnish Corpus (Bibles) (UHLCS); access in Puhti
Finnish Corpus (Literature) (UHLCS); access in Puhti
The Helsinki Korp Version of the Finland-Swedish Text Corpus (UHLCS); access in Korp
The Finland-Swedish Text Corpus (UHLCS), source; access in Puhti
Ingrian Corpus (UHLCS); access in Puhti
Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS); access in Puhti
Komi Zyrian Corpus (UHLCS); access in Puhti
Latin Corpus (UHLCS); access in Puhti
Lude (Ludian) Corpus (UHLCS); access in Puhti
Nenets Corpus (Tundra Nenets) (UHLCS); access in Puhti
North Saami Corpus (Literature) (UHLCS); access in Puhti
North Saami Corpus (Sámikultuvradoaibmagotti smiehttamush) (UHLCS); access in Puhti
Quantifiers and Quantification in Finnish and Languages Spoken in the Central Volga–Kama Region (UHLCS); access in Puhti
Somali Corpus (UHLCS); access in Puhti
The Susanne Corpus (UHLCS); access in Puhti
Ume Saami Corpus (UHLCS); access in Puhti
Uralic, Turkic, Indo-Iranian and Mongol languages; languages of Siberia and Caucasia (UHLCS); access in Puhti
Uzbek-English Dictionary (UHLCS); access in Puhti
Lists of Words Corpus (UHLCS); access in Puhti

Corpus contents

The University of Helsinki Language Corpus Server (UHLCS) is a multilingual data bank founded in the late 1980s and maintained by the Department of General Linguistics at the University of Helsinki until September 2007. When the old server was taken out of use, the UHLCS corpora were moved to servers maintained by CSC – IT Center for Science, and the corpora were made available via the Language Bank of Finland.

At present, the UHLCS collection includes text corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types. There are also tools specifically developed for analyzing the UHLCS corpora.

Use of most corpora is restricted to research and teaching. Resource-specific information and license conditions can be found in the metadata record of the corpus in question.

In 2000, the corpora of the Uralic, Turkic, Tungusic, Mongolic, Chukotko-Kamchatkan, Iranian and North-East Caucasian languages were edited for public use with the financial support of the Max Planck Institute for Evolutionary Anthropology, Leipzig. In summer 2003, the basis for the metadata descriptions of the corpora was prepared with the financial support of the ECHO project (ECHO = European Cultural Heritage Online).

Last updated: 28.2.2024

This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2023030901

Contact

The Language Bank's technical support:
kielipankki (at) csc.fi
tel. +358 9 4572001

Requests related to language resources:
fin-clarin (at) helsinki.fi
tel. +358 29 4129317
