Shortname | Name and metadata | License | Location | Cite | Resource group and help | Apply | Publication year | Support level |
---|---|---|---|---|---|---|---|---|
These resource versions are not yet available in the Language Bank of Finland.
Shortname | Name and metadata | License | Formats | Support level | Contact Person | Resource group and help | Location | Other information |
---|---|---|---|---|---|---|---|---|
This corpus contains newspapers and magazines published in Finland from 1770 onwards, compiled by the National Library of Finland. Further details of each version of the resource are maintained in its metadata record, which can be found via the persistent identifier (see the link at the resource title).
Based on the KLK data, word-level collections of uni-, bi- and trigrams have been created and are available for download. These are provided as a separate group of resources, The N-grams of the Newspaper and Periodical Corpus of the National Library of Finland.
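To illustrate what word-level uni-, bi- and trigrams are, here is a minimal Python sketch that counts them for a short text. It is not the procedure used to build the published n-gram resources: the whitespace tokenization, the lowercasing, the sample sentence and the function names `word_ngrams` and `count_ngrams` are all assumptions made for the example.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, max_n=3):
    """Count word-level n-grams for n = 1..max_n.

    Assumption: tokens are lowercased and split on whitespace only;
    a real pipeline would need proper tokenization.
    """
    tokens = text.lower().split()
    return {n: Counter(word_ngrams(tokens, n)) for n in range(1, max_n + 1)}


# Hypothetical sample sentence, not taken from the corpus.
counts = count_ngrams("suomi on maa ja suomi on kaunis")

for n, counter in counts.items():
    print(n, counter.most_common(2))
```

Each n-gram is stored as a tuple of words, so unigrams, bigrams and trigrams can share the same counting code and differ only in window length.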
The corpora consist mainly of digitized versions of texts originally printed on paper. The physical papers were scanned, and optical character recognition (OCR) was performed on the resulting images. The digitized material spans a long period and contains different kinds of texts, writing styles and fonts. Some parts of the material are harder to scan than others, and the physical condition of the original texts also varies. The OCR techniques used have varied as well, and some of the texts may have undergone manual post-correction. As a result, text quality varies widely across the corpora: some parts are of very poor quality, while others are of good quality. We have collected a list of publications related to OCR quality and collection processing:
This page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021092404
Last modified on 2025-03-12