The Finnish Wikipedia 2017 source material corpus contains all Finnish articles from the online encyclopedia Wikipedia that were available on 1 January 2018. The text parts of the articles have been extracted from Wikipedia Dumps with WikiExtractor.
Latest versions/subcorpora:
Finnish Wikipedia 2017, source: Metadata and license; Attribution instructions; Download the resource. A copy of this version is available in the computing environment.
Search for all versions in META-SHARE
Different versions of this corpus are (or may in the future be) published in the Language Bank of Finland. The versions are available through the Language Bank Download Service and/or the Korp concordance tool, or they are offered by another member organisation of FIN-CLARIN. Links to the different versions are in the list above.
How to access a specific corpus in the Language Bank of Finland
Detailed information on the content, user rights and licenses of each version can be found in its specific metadata record in META-SHARE.
A version of this corpus is directly available in an uncompressed form in CSC’s computing environment. The data can be found in the directory /appl/data/kielipankki. You can open a connection to the environment by using an ssh application on your local machine, or via a browser interface. See further instructions on connecting to the computing environment (CSC).
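Once logged in to the computing environment, the contents of the data directory can be inspected with a few lines of Python. The following is only a minimal sketch: the function name `list_resources` is hypothetical, and the only path it relies on is the directory given above. It makes no assumption about which subdirectory holds this corpus, and it returns an empty list when the directory is not present (for example, when run on a local machine).

```python
from pathlib import Path

# Directory named on this page; it exists only inside CSC's computing environment.
KIELIPANKKI_ROOT = Path("/appl/data/kielipankki")


def list_resources(root: Path = KIELIPANKKI_ROOT) -> list[str]:
    """Return the sorted names of the entries directly under root.

    Hypothetical helper for illustration; not part of any Language Bank tool.
    """
    if not root.is_dir():
        # Not inside the computing environment, or the path is unavailable.
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    for name in list_resources():
        print(name)
```

The printed names are whatever resources are present under the directory at the time; look for the entry corresponding to this corpus among them.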
Snapshot of the structure of the data on Puhti:
This resource group page has a Persistent Identifier: http://urn.fi/urn:nbn:fi:lb-2021091411