Backing up and restoring CanDIG data
There are three kinds of data stored in CanDIG that we recommend backing up regularly:
- Clinical and genomic metadata stored in CanDIG's postgres databases
- Authorization data stored in Vault that details users' authorization to access or edit ingested data
- Logs
For the first two kinds of data, we recommend taking a backup after each ingest event and storing one or more copies of your backups on a secure server separate from your CanDIG installation. We also recommend encrypting your backups so that they cannot be accessed by an unauthorized user.
Logs can be backed up on a regular schedule and, at a minimum, should be saved elsewhere when performing a rebuild of the stack.
Backing up postgres databases
Both clinical and genomic metadata are stored within databases running in the postgres container postgres-db.
The commands below assume that you are connected to the machine that is hosting the dockerized CanDIGv2 stack.
To back up the data stored in these databases:
- Open an interactive terminal inside the running postgres docker container with:
- Dump the contents of the two databases to files. -d specifies the database to dump and -f specifies the filename. Below we use the date and the name of the database being backed up:
You should then have two files, each containing a complete copy of one of the databases.
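As a non-interactive alternative to the steps above, the dump can be sketched as a small script run from the docker host. The container name, database user, database names, and dump directory used here are assumptions for illustration and are not taken from the CanDIG documentation; check `docker ps` and your own configuration for the real values. The sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: dump both databases with pg_dump via docker exec.
# All values below are ASSUMED placeholders; adjust them to your deployment.
set -eu

PG_CONTAINER="${PG_CONTAINER:-postgres-db}"  # assumed container name
PG_USER="${PG_USER:-admin}"                  # assumed database user
DATABASES="${DATABASES:-clinical genomic}"   # assumed database names
DUMP_DIR="${DUMP_DIR:-/tmp}"                 # assumed directory inside the container

# Print the command for review (quoting is not preserved in the printout);
# execute it only when DRY_RUN=0.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

dump_databases() {
  stamp="$(date +%F)"  # today's date as YYYY-MM-DD
  for db in $DATABASES; do
    # -d selects the database to dump, -f names the output file
    run docker exec "$PG_CONTAINER" pg_dump -U "$PG_USER" \
      -d "$db" -f "${DUMP_DIR}/${stamp}-${db}-backup.sql"
  done
}

dump_databases
```

Running the script as-is lists one `pg_dump` command per database; rerun it with `DRY_RUN=0` once the printed commands look right for your stack.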
You can now exit the container by entering exit.
Copy these files to a secure location outside the running container, and consider encrypting them or otherwise ensuring that unauthorized users cannot access the information. To copy a file from the container onto the docker host, you can use a command similar to:
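A minimal sketch of the copy and encryption steps, run from the docker host. It assumes the dumps were written to `DUMP_DIR` inside the container with the date-and-database naming described above; the container name, database names, directories, and the choice of `gpg` symmetric encryption are illustrative assumptions, not CanDIG requirements. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: copy dump files out of the container, then encrypt them with gpg.
# All values below are ASSUMED placeholders; adjust them to your deployment.
set -eu

PG_CONTAINER="${PG_CONTAINER:-postgres-db}"   # assumed container name
DATABASES="${DATABASES:-clinical genomic}"    # assumed database names
DUMP_DIR="${DUMP_DIR:-/tmp}"                  # assumed location of the dumps in the container
BACKUP_DIR="${BACKUP_DIR:-./candig-backups}"  # hypothetical target directory on the host
BACKUP_DATE="${BACKUP_DATE:-$(date +%F)}"     # date prefix used in the dump filenames

# Print the command for review; execute it only when DRY_RUN=0.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

copy_and_encrypt() {
  run mkdir -p "$BACKUP_DIR"
  for db in $DATABASES; do
    file="${BACKUP_DATE}-${db}-backup.sql"
    # docker cp syntax: CONTAINER:SRC_PATH DEST_PATH
    run docker cp "${PG_CONTAINER}:${DUMP_DIR}/${file}" "${BACKUP_DIR}/${file}"
    # Passphrase-based encryption; gpg prompts for the passphrase
    run gpg --cipher-algo AES256 --output "${BACKUP_DIR}/${file}.gpg" \
      --symmetric "${BACKUP_DIR}/${file}"
  done
}

copy_and_encrypt
```

After encrypting, the plaintext `.sql` copies on the host can be removed, and the `.gpg` files moved to a separate secure server.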
Restoring postgres databases
To restore the databases that we have backed up, assuming you have the CanDIG stack up and running:
- Stop the running katsu and htsget containers, which are connected to the databases
- Then we need to copy the sql backup files into the running postgres container
Next we need to delete the initialized databases so we can replace them with the backed up versions.
- Open an interactive terminal to the postgres container
- Then connect to the psql commandline prompt with a database other than the ones we want to drop:
- Then drop the two existing databases, create empty replacement databases, and quit the psql commandline prompt
- Load the backed up copies from file with these commands:
- Exit the interactive terminal with the exit command.
- Restart the katsu and htsget services
You should be able to see the restored data in the data portal.
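The restore steps above can be sketched as one script run from the docker host. Every name here is an assumption for illustration: the three container names, the database user, the two database names, the maintenance database (`postgres`) used while dropping the others, and the backup file locations. Because the drop step is destructive, the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: restore both databases from dated .sql dumps.
# All values below are ASSUMED placeholders; adjust them to your deployment.
set -eu

PG_CONTAINER="${PG_CONTAINER:-postgres-db}"    # assumed container name
KATSU_CONTAINER="${KATSU_CONTAINER:-katsu}"    # assumed container name
HTSGET_CONTAINER="${HTSGET_CONTAINER:-htsget}" # assumed container name
PG_USER="${PG_USER:-admin}"                    # assumed database user
MAINT_DB="${MAINT_DB:-postgres}"               # a database that is NOT being dropped
DATABASES="${DATABASES:-clinical genomic}"     # assumed database names
DUMP_DIR="${DUMP_DIR:-/tmp}"                   # assumed directory inside the container
BACKUP_DIR="${BACKUP_DIR:-./candig-backups}"   # hypothetical directory on the host
BACKUP_DATE="${BACKUP_DATE:-YYYY-MM-DD}"       # date prefix of the dumps to restore

# Print the command for review (quoting is not preserved in the printout);
# execute it only when DRY_RUN=0.
run() {
  printf '%s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

restore_databases() {
  # 1. Stop the services connected to the databases
  run docker stop "$KATSU_CONTAINER" "$HTSGET_CONTAINER"
  for db in $DATABASES; do
    file="${BACKUP_DATE}-${db}-backup.sql"
    # 2. Copy the dump into the postgres container
    run docker cp "${BACKUP_DIR}/${file}" "${PG_CONTAINER}:${DUMP_DIR}/${file}"
    # 3. Drop and recreate the database, one statement per psql call:
    #    DROP/CREATE DATABASE cannot run inside a transaction block,
    #    which a multi-statement -c string would create
    run docker exec "$PG_CONTAINER" psql -U "$PG_USER" -d "$MAINT_DB" \
      -c "DROP DATABASE IF EXISTS ${db};"
    run docker exec "$PG_CONTAINER" psql -U "$PG_USER" -d "$MAINT_DB" \
      -c "CREATE DATABASE ${db};"
    # 4. Load the backed-up copy from file
    run docker exec "$PG_CONTAINER" psql -U "$PG_USER" -d "$db" \
      -f "${DUMP_DIR}/${file}"
  done
  # 5. Restart the services
  run docker start "$KATSU_CONTAINER" "$HTSGET_CONTAINER"
}

restore_databases
```

Review the printed commands against your own container and database names before rerunning with `DRY_RUN=0`.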
Backing up Secrets and Authorization data
Secrets and Authorization data in CanDIG are stored within Vault. These should be backed up regularly, and before the CanDIG stack is rebuilt, so that they can be restored after a system crash. To back up Vault, run the command:
This command creates a tarball at tmp/vault/backup.tar.gz. Save it to a secure location outside the server on which your CanDIG deployment is running. You may want to rename the backup to include the date and type of backup for future reference, e.g. YYYY-MM-DD-vault-backup.tar.gz.
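A minimal sketch of the rename-and-save step, assuming the Vault backup command has already produced tmp/vault/backup.tar.gz. `CANDIG_HOME` (the root of your CanDIG checkout) and `DEST_DIR` (a staging directory for backups) are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: copy the Vault backup tarball to a dated file name.
set -eu

CANDIG_HOME="${CANDIG_HOME:-.}"           # hypothetical: root of the CanDIG stack
DEST_DIR="${DEST_DIR:-./candig-backups}"  # hypothetical: where dated backups are staged

archive_vault_backup() {
  src="${CANDIG_HOME}/tmp/vault/backup.tar.gz"
  dest="${DEST_DIR}/$(date +%F)-vault-backup.tar.gz"  # YYYY-MM-DD-vault-backup.tar.gz
  [ -f "$src" ] || { echo "no Vault backup found at $src" >&2; return 1; }
  mkdir -p "$DEST_DIR"
  cp -p "$src" "$dest"  # -p keeps the original timestamps and permissions
  echo "$dest"          # print the path of the dated copy
}
```

Calling `archive_vault_backup` prints the path of the dated copy, which can then be transferred to a secure server, for example with `scp` or `rsync`.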
To restore the vault backup, copy the backup tarball into the vault directory in the CanDIG stack and rename it to restore.tar.gz:
Then run
All previous secrets and authorizations should be restored to the stack. The tarball is renamed to restored.tar.gz and can be deleted.
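The copy-and-rename step of the restore can be sketched as a small helper. The function name is hypothetical, and both paths are passed in by the caller because the exact location of the vault directory depends on your deployment; the helper does not run the restore itself.

```shell
#!/bin/sh
# Sketch: stage a saved Vault backup as restore.tar.gz in the vault directory.
set -eu

# Usage: stage_vault_restore BACKUP_FILE VAULT_DIR
stage_vault_restore() {
  backup="$1"     # e.g. a dated YYYY-MM-DD-vault-backup.tar.gz
  vault_dir="$2"  # the vault directory of your CanDIG stack (deployment-specific)
  [ -f "$backup" ] || { echo "backup not found: $backup" >&2; return 1; }
  [ -d "$vault_dir" ] || { echo "vault directory not found: $vault_dir" >&2; return 1; }
  cp -p "$backup" "${vault_dir}/restore.tar.gz"
  echo "${vault_dir}/restore.tar.gz"  # print the staged path
}
```

The helper refuses to proceed if either path is missing, so a mistyped backup name fails before anything is copied.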
Backing up logs
Logs are stored in tmp/logs. The contents of this folder should be saved periodically.
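One way to save the logs is to pack the folder into a dated tarball, sketched below. `CANDIG_HOME`, `DEST_DIR`, and the `-logs-backup` file name are assumptions chosen to mirror the Vault backup naming above.

```shell
#!/bin/sh
# Sketch: archive tmp/logs into a dated, compressed tarball.
set -eu

CANDIG_HOME="${CANDIG_HOME:-.}"           # hypothetical: root of the CanDIG stack
DEST_DIR="${DEST_DIR:-./candig-backups}"  # hypothetical: where dated backups are staged

archive_logs() {
  dest="${DEST_DIR}/$(date +%F)-logs-backup.tar.gz"
  [ -d "${CANDIG_HOME}/tmp/logs" ] || { echo "no logs directory found" >&2; return 1; }
  mkdir -p "$DEST_DIR"
  # -C switches into tmp/ so the archive stores paths as logs/...
  tar -czf "$dest" -C "${CANDIG_HOME}/tmp" logs
  echo "$dest"  # print the path of the archive
}
```

Calling `archive_logs` from a scheduled job such as cron would cover the periodic case; it can also be run by hand before a rebuild of the stack.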