Time and place: 28.3.2018 13-14.30, CSC
Present: Satu Saalasti, Mietta Lennes (HY), Martin Matthiesen (CSC)
We discussed a possible use case for sharing sensitive data. We looked at the issue from various angles:
Satu proposed gathering verbal motoric data from children aged 3-7 with and without speech impairments (the latter serving as a control group). Permission to distribute will be requested from the children’s guardians.
The data collection will first take place in a controlled environment. The following data will be gathered:
The following derived data will be created:
The children will be asked to repeat a set of predefined words and utterances a few times.
We discussed several options as to what to distribute. It is possible to distribute only the (anonymised) transcripts.
We decided to concentrate on the distribution of the full dataset (audio, video, sensor data, aligned transcriptions). Reasons:
The data will be stored and distributed via the Language Bank of Finland’s Download service (https://korp.csc.fi/download). Access permissions will be handled using the Language Bank Rights system.
We had several ideas on how the data can be distributed safely. One was to package the data into a VeraCrypt container and distribute the password separately. We also discussed DRM techniques that might make it possible to withdraw access at any given time.
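A minimal sketch of the "password distributed separately" idea, assuming Python is used for the helper tooling. The function names are hypothetical and nothing here was decided in the meeting. Creating the VeraCrypt container itself is left out; the sketch only generates a random passphrase for it and computes a checksum of the finished container file, so that the checksum can be published with the download while the passphrase goes out through a separate channel.

```python
import hashlib
import secrets
from pathlib import Path


def generate_passphrase(n_bytes: int = 24) -> str:
    """Return a random URL-safe passphrase from the OS random source.

    Hypothetical helper: the passphrase would be used when creating the
    container and sent to approved users separately from the download.
    """
    return secrets.token_urlsafe(n_bytes)


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for large container files.
    """
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The checksum lets a user verify that the downloaded container is intact before trying the passphrase. It does not protect the content by itself.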
We will also need to look at the application process via LBR/REMS:
Using the data should be secure but also easy at the user’s end. Overly complicated usage conditions will lead users to copy the data out of the secured container. While such a breach would not be our responsibility, the risk should be minimized.
Satu: Planning the data collection
Martin/Mietta: Planning data distribution: DRM, VeraCrypt, credential distribution
All (later): Looking at the application process via LBR/REMS: