Project: FIN-CLARIAH
Grant agreement: Academy of Finland no. 345610
Start date: 01-01-2022
Duration: 24 months
WP 2.1: Report on Licensing agreements for special categories of personal data
Date of reporting: 2023-06
Report author: Mietta Lennes (UHEL)
Contributors: Sirpa Kovanen, Krister Lindén (UHEL)
Deliverable location: Deposition license agreement template
The deposition license agreement template of the Language Bank of Finland allows for the deposition of resources that contain personal data (cf. D2.1.1: Licensing agreements for personal data). In addition, some research datasets may include personal data belonging to special categories. Such data reveals the person’s racial or ethnic origin, political opinions, religion or philosophical beliefs, trade union membership, data concerning health, sexual orientation or activity, or genetic and biometric data for identifying the person.
Personal data belonging to special categories are considered sensitive. In some cases, it is not possible to completely remove the sensitive data without making the entire resource unusable for the research purpose. However, it may still be possible to deposit the resource (or some version of it) with the Language Bank, provided that sufficient and proportionate safety measures are applied.
Before the resource can be deposited, the data controller regarding the original purpose of use (in practice, usually, the depositing researchers themselves) must conduct a preliminary risk assessment and a Data Protection Impact Assessment (DPIA) if appropriate. In this process, the researchers should primarily follow the instructions of their home organization. For convenience, the Language Bank also provides an instruction page for the preliminary evaluation of data protection.
Before depositing, the researchers are responsible for minimizing the amount of personal data, and especially the sensitive information, to the extent that is possible and proportionate with regard to the research purpose. To keep the deposited content accessible and useful for other researchers, some documentation of the pseudonymization process can be included in the metadata of the resource.
For resources containing personal data, the resource-specific data protection terms and conditions and the description of the categories of personal data in the resource are included in an annex of the deposition license agreement with the Language Bank. In the same annex, it is possible for the data controller to specify further requirements, in case the processing of personal data contained in the resource is seen to involve risks that call for a particularly high level of information security.
Currently, the Language Bank offers the following protective measures that can be applied to sensitive datasets:
The Language Bank is also collaborating with the DELAD Task Force in CLARIN. DELAD focusses on sharing corpora of disordered speech that often contain, e.g., health-related data and data from children.
Last updated: 2023-06-06