Present: Satu Saalasti, Mietta Lennes (HY), Harri Hirvonsalo (CSC, via Zoom), João da Silva (CSC), Martin Matthiesen (CSC, notes)
Time, place: 30.10.2018 12-15:30 CSC, Espoo
In this meeting we tried to look at the issue from all known angles. This memo tries to identify the main issues and the state of the discussion.
The data will consist of video recordings of 7-10 year old children. The children will have speech impairments and will be recorded in 3 situations:
In all stages the children will be uttering preselected utterances.
In addition to videos showing the children's faces, there will also be ultrasound recordings of tongue movements. The data will be used to assess the effectiveness of the treatment method.
Data re-use can happen for two main reasons:
While the goal is to allow for both, it might be easier to define a process that allows the reproducibility of existing research.
Decisions
Before data can be collected, Satu needs to get approval from an ethics committee. Since the final implementation is not yet known, the ethics committee will not be able to decide on its appropriateness. Without a decision, data cannot be collected.
Our approach needs to be specific enough to get preliminary approval, but general enough to keep flexibility in the implementation.
Decisions
We decided to seek preliminary approval for the data management by describing the planned system and requesting approval to use it, provided we implement it as planned.
We discussed whether the system should be able to offer sensitive metadata under certain conditions. Descriptive metadata as shown in B2SHARE or META-SHARE should be public. Sensitive descriptions of the dataset can be moved to be part of the dataset itself. Rationale: Users must be able to search the metadata, to assess whether the dataset is useful for their needs. Such descriptive metadata should never need sensitive information.
Decisions
Harri showed B2SHARE, EUDAT’s self-depositing repository. The existing B2SHARE instance will not be used for this pilot, but the underlying software (with modifications) will. Sensitive data can be processed in two places: CSC’s ePouta (a secure cloud) and TSD in Oslo, Norway (also a secure cloud). TSD also offers data download; ePouta does not. B2SHARE supports OAI-PMH, so metadata export is possible.
Martin showed an example from the Language Bank (ELFA) and the application and approval process in the REMS2-based Language Bank Rights (LBR). So far the Language Bank has a simple approval process: a good reason to use the restricted resources is enough. Data is shown in Korp and/or made available in the Download service. The Language Bank supports OAI-PMH export, but not yet import.
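Since both B2SHARE and the Language Bank were noted to support OAI-PMH export, the following minimal sketch shows what a metadata harvesting request could look like. It only builds the request URL for the standard OAI-PMH verb `ListRecords` with the `oai_dc` metadata prefix; the endpoint address, the function name and the set name are assumptions made for illustration and do not refer to any system discussed in the meeting.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real service address.
EXAMPLE_BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL (no network access)."""
    # 'verb' and 'metadataPrefix' are required arguments of ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'set' is optional and restricts harvesting to one collection.
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Harvest all records in Dublin Core format.
    print(build_list_records_url(EXAMPLE_BASE_URL))
    # Harvest only a (hypothetical) set named "pilot".
    print(build_list_records_url(EXAMPLE_BASE_URL, set_spec="pilot"))
```

Only public descriptive metadata would be exposed this way; the sensitive recordings themselves are not part of an OAI-PMH response.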
Decisions
No decisions on the level of integration of the solution with the Language Bank. Because of the nature of the data, integration can be low.
We discussed the final approval process, assuming that we have all the other parts in place. Who would approve access? Satu? The ethics committee? A research group?
While LBR can accommodate more complex processes, it was unclear what this process would look like.
Decisions
Satu will discuss this issue with her research group.
If a user has the right to process the data, there are two basic use cases. Both approaches have advantages and disadvantages, summarized below.
At the moment CSC’s ePouta offers Remote Desktop access, and TSD offers shell access and download (and possibly Remote Desktop as well; Harri will check). TSD is not a real option at this point, since there are no plans to store the sensitive data outside of Finland.
Decisions
We keep both options in mind for now. If we need to prioritize, we prioritize the first option, "ePouta/Remote Desktop".