Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months
WP 5.1: Report on Log Data Collection and Analysis
Date of reporting: 05-06-2023
Report authors: Sanna Kumpulainen, Jaakko Peltonen, Farid Alijani (Tampere University)
Contributors: Sanna Kumpulainen, Jaakko Peltonen, Farid Alijani, Anna Sendra Toset (Tampere University)
Deliverable location: GitHub repository
The overall goal of WP5.1 is to design and develop methods for analyzing log data from systems in the FIN-CLARIAH infrastructure that are also usable with other compatible systems. Log data analysis can serve purposes such as monitoring the use of these systems and recommending content to end-users.
First, as one of the deliverables and as an initial step, we conducted a comprehensive study of the utility of the log data to assess the feasibility of developing both user-based and item-based recommender systems that could potentially be deployed for end-users in the future.
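To illustrate what an item-based approach over log data involves, the following is a minimal sketch, not the project's implementation. It assumes that access-log entries have already been reduced to hypothetical `(session_id, item_id)` pairs, treats each item as a binary vector over sessions, and ranks other items by cosine similarity of their co-access patterns. The function names and toy data are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def build_item_sessions(log_records):
    """Map each item ID to the set of sessions that accessed it.

    log_records: iterable of (session_id, item_id) pairs. This input format is
    a hypothetical simplification of real access-log entries.
    """
    item_sessions = defaultdict(set)
    for session_id, item_id in log_records:
        item_sessions[item_id].add(session_id)  # repeated hits count once
    return item_sessions


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def similar_items(item_id, item_sessions, k=3):
    """Return up to k items with the most similar co-access pattern."""
    target = item_sessions.get(item_id, set())
    scores = [
        (other, cosine(target, sessions))
        for other, sessions in item_sessions.items()
        if other != item_id
    ]
    scores = [(other, s) for other, s in scores if s > 0]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy log: three sessions accessing four documents.
logs = [
    ("s1", "docA"), ("s1", "docB"),
    ("s2", "docA"), ("s2", "docB"), ("s2", "docC"),
    ("s3", "docC"), ("s3", "docD"),
]
item_sessions = build_item_sessions(logs)
print(similar_items("docA", item_sessions))  # [('docB', 1.0), ('docC', 0.5)]
```

A user-based variant would transpose the same data, representing each session or user as the set of items it accessed and comparing users instead of items.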
Second, as a proof of concept, we have developed a recommender system that assists information retrieval in digital libraries, based on log data gathered from the use of those libraries. The system combines collaborative and content-based recommendation. It was initially developed using similarity search approaches and can be extended to other inference schemes, including neural approaches, in future work.
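One common way to combine collaborative and content-based signals is a weighted blend of two similarity scores. The sketch below assumes that linear blending; the report does not state how the two components are combined. It also assumes hypothetical inputs: `item_sessions` (item ID to the sessions that accessed it, derived from logs) and `item_terms` (item ID to a set of metadata terms), with `alpha` as an illustrative weight. The content component lets the system rank items that have no log history yet.

```python
from math import sqrt


def cosine_sets(a, b):
    """Cosine similarity of two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def hybrid_scores(query_item, item_sessions, item_terms, alpha=0.5):
    """Rank items by a blend of co-access and metadata similarity.

    item_sessions: item ID -> set of session IDs (collaborative signal).
    item_terms:    item ID -> set of metadata terms (content signal).
    alpha:         hypothetical weight on the collaborative component.
    """
    ranked = []
    # Consider every item known to either source, so items without any
    # log history can still be recommended on content alone.
    for other in item_sessions.keys() | item_terms.keys():
        if other == query_item:
            continue
        collab = cosine_sets(item_sessions.get(query_item, set()),
                             item_sessions.get(other, set()))
        content = cosine_sets(item_terms.get(query_item, set()),
                              item_terms.get(other, set()))
        ranked.append((other, alpha * collab + (1 - alpha) * content))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


item_sessions = {"docA": {"s1", "s2"}, "docB": {"s1", "s2"}}
item_terms = {
    "docA": {"helsinki", "newspaper"},
    "docB": {"turku", "journal"},
    "docC": {"helsinki", "newspaper"},  # never accessed in the logs
}
print(hybrid_scores("docA", item_sessions, item_terms, alpha=0.75))
# [('docB', 0.75), ('docC', 0.25)]
```

This linear scan compares the query against every item; for a collection at the scale of a national library, an indexed or approximate nearest-neighbour search would likely be needed instead.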
The proof-of-concept recommender system currently uses the National Library of Finland (NLF) dataset (digi.kansalliskirjasto.fi), which includes metadata on the collection, description, preservation and accessibility of Finland’s printed national heritage in digitized form. The proof of concept can easily be extended to comparable log files from other digital libraries, and similar approaches can be applied to other DARIAH-FI collections. The code is publicly available in an open-access GitHub repository, which is primarily tailored to the SLURM clusters provided by the CSC infrastructure for data storage and large-scale computation.