FIN-CLARIAH D4.1.4: R/Python module
Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months
WP 4.1: Report on R/Python module
Date of reporting: 22-11-2023
Report authors: Julia Matveeva (University of Turku), Leo Lahti (University of Turku)
Contributors: Pyry Kantanen (University of Turku), Akewak Jeba (University of Turku)
Deliverable location: https://github.com/fennicahub/fennica
Description
- Python module: We have developed a Python script, built on pandas, that selectively extracts MARC fields from the raw data. Fields can be extracted individually or in batches, and the results are saved in CSV format. The Python module is available at https://github.com/fennicahub/fennica/tree/master/inst/examples/field_picking.
- R module: The R module, known as the Fennica-R package, is an algorithmic toolkit designed for transparent quantitative analysis of the Finnish national bibliography, Fennica, and its metadata. Initially deployed to harmonize a subset of 70,000 entries, the module has recently been updated to support the analysis of a larger dataset of 1 million entries, including a subset for the period 1809-1917. The CSV files generated by the Python module are used as input for further harmonization with the Fennica package.
The Fennica-R package is publicly accessible at https://github.com/fennicahub/fennica. See the package README for an up-to-date link to outputs generated by the package.
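
To illustrate the field-picking step performed by the Python module, the following is a minimal sketch and not the script from the repository. It assumes a hypothetical long-format input with the columns `record_id`, `field`, `subfield`, and `value`; the actual raw data layout, the function name `pick_fields`, and the `|` separator for repeated values are all assumptions made for this example.

```python
import io

import pandas as pd

# Hypothetical long-format MARC dump: one row per (record, field, subfield).
# The layout of the real raw data may differ from this toy sample.
RAW = """record_id\tfield\tsubfield\tvalue
1\t100\ta\tLönnrot, Elias
1\t245\ta\tKalevala
1\t260\tc\t1835
2\t245\ta\tSeitsemän veljestä
2\t260\tc\t1870
"""


def pick_fields(raw: pd.DataFrame, fields: list[str]) -> pd.DataFrame:
    """Return one row per record and one column per requested field+subfield."""
    selected = raw[raw["field"].isin(fields)].copy()
    selected["column"] = selected["field"] + selected["subfield"]
    # Join repeated values so each record keeps a single cell per column.
    wide = (
        selected.groupby(["record_id", "column"])["value"]
        .agg(lambda values: "|".join(values))
        .unstack("column")
    )
    wide.columns.name = None
    return wide.reset_index()


# Read everything as strings so field tags such as "100" are not parsed as numbers.
raw = pd.read_csv(io.StringIO(RAW), sep="\t", dtype=str)

# Extract a batch of fields; a single field works the same way, e.g. ["245"].
batch = pick_fields(raw, ["245", "260"])

# Serialize to CSV text; passing a file path to to_csv would write to disk instead.
csv_text = batch.to_csv(index=False)
print(csv_text)
```

A wide table with one row per record is only one possible output shape; it is chosen here because a CSV of that form can be read directly into R for further processing.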