The Newspaper and Periodical Corpus of the National Library of Finland, Kielipankki Version



Currently available versions of this resource

Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level

Upcoming versions of this resource

These resource versions are not yet available in the Language Bank of Finland.

Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information

Resource information

This corpus contains newspapers and magazines from Finland from 1770 onwards, compiled by the National Library of Finland. Further details of each version of the resource are maintained in its metadata record, which can be found via the persistent identifier (see the link at the resource title).

Contents

N-grams (separate resource)

Important notes

  • The Finnish acronym for the Newspaper and Periodical OCR Corpus of the National Library of Finland used to be "Digilib". The acronym "klk" and the short names klk-fi-1874-dl and klk-fi-1920-dl are now recommended instead.

License and access

  • Some versions of this resource are available publicly (PUB), whereas others may require you to log in as an academic user (ACA) or to apply for individual access rights (RES). Click on the license image to see the resource-specific license text.

Examples of use (Korp versions)

 

Concordance view of any form of the word ’sosialismi’ in the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp

 

Word picture of the word ’sosialismi’ in the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp)

 

Trend diagram of all forms of the word ’sosialismi’ occurring in the Finnish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland version 2, Korp (klk-fi-v2-korp)

OCR quality

The corpora consist mainly of digitized versions of texts originally printed on paper. The physical papers were scanned, and optical character recognition (OCR) was performed on the resulting images. The digitized material spans a long period and contains different kinds of texts, writing styles and fonts. Some parts of the material are harder to scan than others, and the physical condition of the original texts also varies. The OCR techniques used have varied as well, and some of the texts may have gone through manual post-correction. As a result, some parts of the corpora are of very poor quality while others are of good quality. We have collected a list of publications related to OCR quality and collection processing:

 


This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021092404

 

Last modified on 2025-03-12