Status: Approved
Revision 8560 approved 29.3.2019 by Urpo Kaila, CSC Head of Security. (Subsequent changes are only cosmetic; only major changes require a new review.)
Last updated: 21.8.2023
This document describes how encryption is used at the Language Bank of Finland. The intended audience is administrators and users of the Language Bank who need to encrypt data to protect it from unauthorized access.
Encryption has been around for a long time, and the basics of secure encryption are well understood. Less well understood are the implications of long-term encryption for archival purposes. Official guidelines [1] are often vague about which tools to use and how to handle key management. This document attempts to cover the whole lifecycle of encrypting and decrypting data.
In this document we assume that both the sender and the recipient of the encrypted data have safeguards in place ensuring that the data is accessible in decrypted form only to authorized personnel. We also assume that there are physical and organizational safeguards in place, such as secured computing environments, backups and antivirus software. This document covers the secure storage and delivery of encrypted sensitive data between the sender (here the Language Bank of Finland) and the recipient (an authorized researcher). The authorization process of the researcher is not part of this document.
Long-term encryption requires considerable administrative overhead to work safely. The challenges are organizational rather than technical. Therefore, only encrypt data if you cannot protect it by other means, for example access-restricted storage. Some data, such as data containing sensitive personal information, will likely need to be encrypted, but copyright-protected data might not have such high security requirements.
We assume that the encrypted data needs to be available for a long time (10+ years) and shareable among authorized users.
There are 2 basic encryption methods available:
Symmetric encryption is straightforward: data is encrypted using a password, and the password is shared with the users who need to decrypt the data. The data is decrypted with the same password. The drawback of this method is that it does not scale well: the more users share the password, the more likely it is to spread to unintended audiences. Transmitting the password to the intended user is also difficult, since the transmission itself requires an encrypted, secure channel.
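To illustrate why a shared password does not scale, the following minimal sketch derives a symmetric key from a password with PBKDF2-HMAC-SHA256 from Python's standard library. It covers only the key-derivation step; the data encryption itself is omitted because the standard library has no symmetric cipher. The iteration count, salt length and example passwords are illustrative assumptions, not parameters prescribed by this policy.

```python
import hashlib
import hmac
import os

# Illustrative values (assumptions), not parameters mandated by this policy.
ITERATIONS = 600_000
SALT_LENGTH = 16  # bytes


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 256-bit symmetric key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


# The salt is random but not secret; it is stored alongside the encrypted data.
salt = os.urandom(SALT_LENGTH)
sender_key = derive_key("example shared password", salt)

# Every user who knows the password derives exactly the same key ...
recipient_key = derive_key("example shared password", salt)
# ... while a different password yields an unrelated key.
other_key = derive_key("a different password", salt)

print(len(sender_key))                                 # 32
print(hmac.compare_digest(sender_key, recipient_key))  # True
print(hmac.compare_digest(sender_key, other_key))      # False
```

Because the key is a deterministic function of the password and the salt, anyone who obtains the password can decrypt every copy of the data, which is exactly the scaling problem described above.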
Asymmetric encryption does not have this drawback. Data is encrypted with the public keys of the authorized users, who decrypt it with their personal private keys. A user is deauthorized by re-encrypting the data with only the public keys of the remaining authorized users. Public keys can be shared with anyone, since they can only be used to encrypt, not to decrypt. Only the private keys need extra protection.
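The mechanism described above can be sketched with textbook RSA: a fresh data key is encrypted ("wrapped") once for each authorized public key, and a user is deauthorized by generating a new data key and wrapping it only for the remaining users. OpenPGP tools such as GnuPG follow the same idea of one encrypted session key per recipient. This is a toy under stated assumptions: the primes are well-known Mersenne primes, so anyone can recover the private keys; there is no padding; the user names are hypothetical; and the symmetric encryption of the bulk data with the data key is omitted.

```python
import secrets

E = 65537  # common RSA public exponent


def toy_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes. NOT secure: illustration only."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)
    return (n, E), (n, d)


def wrap(data_key: int, public_key) -> int:
    """Encrypt the data key with a recipient's public key."""
    n, e = public_key
    return pow(data_key, e, n)


def unwrap(wrapped_key: int, private_key) -> int:
    """Decrypt the data key with the recipient's private key."""
    n, d = private_key
    return pow(wrapped_key, d, n)


def make_envelope(public_keys: dict):
    """Create a fresh data key and wrap it once per authorized public key."""
    limit = min(n for n, _ in public_keys.values())
    data_key = secrets.randbelow(limit - 2) + 2  # must be smaller than every modulus
    header = {name: wrap(data_key, pub) for name, pub in public_keys.items()}
    return data_key, header


# Hypothetical users. Mersenne primes are public, so these keys offer no security.
alice_pub, alice_priv = toy_keypair(2**61 - 1, 2**89 - 1)
bob_pub, bob_priv = toy_keypair(2**107 - 1, 2**127 - 1)

data_key, header = make_envelope({"alice": alice_pub, "bob": bob_pub})
print(unwrap(header["alice"], alice_priv) == data_key)  # True
print(unwrap(header["bob"], bob_priv) == data_key)      # True

# Deauthorizing Bob: re-encrypt under a new data key wrapped only for Alice.
new_key, new_header = make_envelope({"alice": alice_pub})
print(sorted(new_header))  # ['alice']
```

Generating a new data key on deauthorization matters: a removed user may have kept the old data key, so merely deleting their wrapped copy from the header would not revoke access to data encrypted under that key.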
The usage scenario above demands public-key cryptography, which is sufficiently secure with the following software and parameters. The software is open source and widely used, the algorithms are well understood by the cryptographic community, and the key lengths are an adequate compromise between security and performance.
The suitability of the software and the algorithms needs to be reviewed regularly, once per year.
The key pairs used to access the data need to be
The private keys need to be well protected, to make sure only the rightful owner has access to them.
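One common safeguard, assumed here rather than prescribed by this section, is to protect each private key with a strong passphrase. The sketch below generates a diceware-style passphrase with Python's `secrets` module and estimates its entropy. The short word list is a hypothetical placeholder; a large published list such as the EFF long word list (7,776 words) gives about 12.9 bits per word.

```python
import math
import secrets

# Hypothetical placeholder word list; use a large published list in practice.
WORDS = ["aurora", "birch", "cloudberry", "fjord", "harbour", "lichen",
         "meadow", "pine", "reindeer", "sauna", "skerry", "tundra"]


def make_passphrase(wordlist, n_words: int = 6, sep: str = "-") -> str:
    """Pick words uniformly at random using a cryptographically secure RNG."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))


def entropy_bits(wordlist_size: int, n_words: int) -> float:
    """Entropy of a passphrase of n_words drawn uniformly from the list."""
    return n_words * math.log2(wordlist_size)


print(make_passphrase(WORDS))                  # e.g. "pine-fjord-sauna-birch-lichen-meadow"
print(round(entropy_bits(len(WORDS), 6), 1))   # 21.5, far too weak
print(round(entropy_bits(7776, 6), 1))         # 77.5
```

The entropy estimate only holds if the words are chosen by a secure random generator, as `secrets.choice` does, rather than picked by hand.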
Long term preserved encrypted data needs
Authorizations are checked once a year, and keys that are no longer valid are removed from the data. Removing a key requires re-encrypting the data.
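The yearly check can be sketched as a comparison between the keys a dataset is currently encrypted for and a register of valid authorizations. The `Authorization` record, the key fingerprints and the dates below are hypothetical; the point is that any key found to be no longer valid triggers a re-encryption for the remaining keys.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Authorization:
    """Hypothetical register entry: a key holder and how long access is granted."""
    holder: str
    key_fingerprint: str
    valid_until: date


def annual_review(current_keys: set, register: list, today: date):
    """Split the dataset's current recipient keys into keys to keep and keys to remove."""
    valid = {a.key_fingerprint for a in register if a.valid_until >= today}
    keep = current_keys & valid
    remove = current_keys - valid
    return keep, remove


register = [
    Authorization("researcher A", "FPR-A", date(2026, 12, 31)),
    Authorization("researcher B", "FPR-B", date(2024, 6, 30)),
]
current = {"FPR-A", "FPR-B", "FPR-C"}  # FPR-C has no authorization at all

keep, remove = annual_review(current, register, today=date(2025, 1, 15))
print(sorted(keep))    # ['FPR-A']
print(sorted(remove))  # ['FPR-B', 'FPR-C']
if remove:
    # Re-encryption would use only the keys that are still authorized.
    print("re-encrypt for:", sorted(keep))
```

Keys with no register entry at all (here `FPR-C`) are treated as invalid, so the default outcome of a missing authorization is removal rather than continued access.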
This section assumes that the Language Bank (“data provider”) has identified the recipient and that the recipient is authorized to receive a copy of the encrypted data. In this case the data is re-encrypted with the recipient’s verified public key and signed with the private key of the Language Bank’s authorized person. The signature ensures the integrity of the sent data.
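The integrity guarantee comes from the recipient verifying the signature with the Language Bank's public key. The toy sketch below shows the principle with textbook RSA over a SHA-256 digest. It is not secure (the primes are well-known Mersenne primes, there is no padding scheme such as RSA-PSS, and the digest is reduced modulo n), and the data and function names are hypothetical; in practice, signing and verification are left to the encryption software.

```python
import hashlib

# Toy signing key for the Language Bank's authorized person. NOT secure: the primes are public.
P, Q, E = 2**89 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the signer


def digest(data: bytes) -> int:
    """SHA-256 of the data, reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender side: sign the data to be delivered with the private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recipient side: check the signature with the sender's public key (N, E)."""
    return pow(signature, E, N) == digest(data)


delivered = b"example ciphertext (placeholder)"
signature = sign(delivered)
print(verify(delivered, signature))                # True
print(verify(delivered + b"tampered", signature))  # False
```

Any change to the delivered bytes changes the digest, so the check fails; only the holder of the private exponent `D` can produce a signature that verifies under the public key.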
The recipient must agree to policies defining the security standards at the recipient’s end, for example where the unencrypted data is processed and how long it can be retained.
Protecting against intentional misuse of encrypted and potentially sensitive data is difficult, and most measures can be circumvented. Elaborate tracing methods like watermarking can make it harder for a malicious user to spread the data without being identified as the source, but they also increase the complexity of the distribution process and can decrease the scientific value of the data. Preventing unauthorized copying of the decrypted data at the recipient’s end is outside the scope of this document.
[1] A collection of cyber security guidelines at the Finnish Communications Regulatory Authority (FICORA, in Finnish): https://www.viestintavirasto.fi/kyberturvallisuus/ncsa-fi.html
[2] 6 Techniques For Creating Strong Passwords: https://www.lifewire.com/8-character-password-2180969