Backing up and restoring CanDIG data
There are three kinds of data stored in CanDIG that we recommend backing up regularly.
- Clinical and genomic metadata stored in CanDIG's postgres databases
- Authorization data stored in Vault, which records each user's authorization to access or edit ingested data
- Logs
For the first two data types (postgres metadata and Vault authorization data), we recommend taking backups after each ingest event and storing one or more copies on a secure server separate from your CanDIG installation. We also recommend encrypting your backups so that they cannot be accessed by unauthorized users.
Logs can be backed up on a regular schedule and, at a minimum, should be saved elsewhere before the stack is rebuilt.
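For example, a backup file can be encrypted with GnuPG's symmetric (passphrase-based) mode before it is moved off the server. This is a minimal sketch that assumes gpg is installed on the host. encrypt_backup and decrypt_backup are hypothetical helper names, and AES256 is one reasonable cipher choice, not a CanDIG requirement.

```shell
# Hypothetical helpers wrapping GnuPG symmetric (passphrase-based) encryption.
# Assumes gpg is installed; these are not part of CanDIG.
encrypt_backup() {
    local f="$1"
    # Writes "$f.gpg"; gpg prompts for the passphrase interactively.
    gpg --symmetric --cipher-algo AES256 --output "$f.gpg" "$f"
}

decrypt_backup() {
    local f="$1"                      # expects a path ending in .gpg
    # Recreates the original filename by stripping the .gpg suffix.
    gpg --output "${f%.gpg}" --decrypt "$f"
}
```

Running encrypt_backup yyyy-mm-dd-genomic-backup.sql produces yyyy-mm-dd-genomic-backup.sql.gpg. Delete the unencrypted copy only after confirming that the encrypted file decrypts correctly, and store the passphrase separately from the backups.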
Backing up postgres databases
Both clinical and genomic metadata are stored in databases running in the postgres container, postgres-db.
The commands below assume that you are connected to the machine that is hosting the dockerized CanDIGv2 stack.
To back up the data stored in these databases:
- Open an interactive terminal inside the running postgres docker container with:
docker exec -it candigv2_postgres-db_1 bash
- Dump the contents of the two databases to files. -d specifies the database to dump and -f specifies the output filename. The filenames below combine the date and the name of the database being backed up:
pg_dump -U admin -d genomic -f yyyy-mm-dd-genomic-backup.sql
pg_dump -U admin -d clinical -f yyyy-mm-dd-clinical-backup.sql
You should then have two files, each containing a complete copy of one database.
You can now exit the container by entering
exit
Copy these files to a secure location outside the running container, and consider encrypting them or otherwise ensuring that unauthorized users cannot access the information. To copy them from the container onto the Docker host, you can use commands similar to:
docker cp candigv2_postgres-db_1:yyyy-mm-dd-genomic-backup.sql /desired/path/target
docker cp candigv2_postgres-db_1:yyyy-mm-dd-clinical-backup.sql /desired/path/target
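As an alternative to the interactive steps above, the dumps can be taken non-interactively with docker exec and redirected straight onto the Docker host, which removes the need for docker cp. This sketch assumes the container name and admin user shown above. backup_candig_dbs is a hypothetical helper name, and it writes the dumps with owner-only permissions.

```shell
# Hypothetical helper: dump the genomic and clinical databases from the
# postgres container directly into a directory on the Docker host.
# Assumes the container name and "admin" user used in the steps above.
backup_candig_dbs() {
    local dest="$1"                                # host directory for the dumps
    local pg_container="candigv2_postgres-db_1"
    local stamp db out
    stamp="$(date +%F)"                            # YYYY-MM-DD
    mkdir -p "$dest" || return 1
    for db in genomic clinical; do
        out="$dest/$stamp-$db-backup.sql"
        # umask 077 (inside a subshell) makes the dump readable only by its owner.
        # No -t flag: a pseudo-TTY can alter the redirected dump output.
        if ! ( umask 077 && docker exec "$pg_container" pg_dump -U admin -d "$db" > "$out" ); then
            echo "backup of $db failed" >&2
            rm -f "$out"                           # do not keep a partial dump
            return 1
        fi
    done
}
```

For example, backup_candig_dbs /desired/path/target writes YYYY-MM-DD-genomic-backup.sql and YYYY-MM-DD-clinical-backup.sql into that directory.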
Restoring postgres databases
To restore backed-up databases while the CanDIG stack is up and running:
- Stop the running katsu and htsget containers, which are connected to the databases:
docker stop candigv2_katsu_1
docker stop candigv2_htsget_1
- Copy the .sql backup files into the running postgres container:
docker cp /path/to/backup/yyyy-mm-dd-genomic-backup.sql candigv2_postgres-db_1:/yyyy-mm-dd-genomic-backup.sql
docker cp /path/to/backup/yyyy-mm-dd-clinical-backup.sql candigv2_postgres-db_1:/yyyy-mm-dd-clinical-backup.sql
Next, delete the initialized databases so that they can be replaced with the backed-up versions.
- Open an interactive terminal in the postgres container:
docker exec -it candigv2_postgres-db_1 bash
- Start a psql prompt connected to a database other than the ones being dropped, since PostgreSQL cannot drop the database you are currently connected to:
psql -U admin -d template1
- Drop the two existing databases, create empty replacements, then quit the psql prompt:
DROP DATABASE clinical;
CREATE DATABASE clinical;
DROP DATABASE genomic;
CREATE DATABASE genomic;
\q
- Load the backed-up copies from the files copied into the container:
psql -U admin -d clinical < /yyyy-mm-dd-clinical-backup.sql
psql -U admin -d genomic < /yyyy-mm-dd-genomic-backup.sql
- Exit the interactive terminal with:
exit
- Restart the katsu and htsget services:
docker start candigv2_katsu_1
docker start candigv2_htsget_1
You should be able to see the restored data in the data portal.
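The restore steps above can also be scripted from the Docker host. In this sketch, restore_candig_dbs is a hypothetical helper that assumes the same container names and admin user. It streams the host-side dumps into psql with docker exec -i and sets ON_ERROR_STOP so that psql stops at the first SQL error. If any step fails, the function returns early and katsu and htsget stay stopped until you start them by hand.

```shell
# Hypothetical helper performing the restore steps above without an
# interactive shell. Container names and the "admin" user are as in the steps above.
restore_candig_dbs() {
    local clinical_sql="$1" genomic_sql="$2"      # dump files on the Docker host
    local pg="candigv2_postgres-db_1" f db
    # Check both dumps are readable before stopping anything.
    for f in "$clinical_sql" "$genomic_sql"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
    done
    docker stop candigv2_katsu_1 candigv2_htsget_1 || return 1
    # One statement per psql call: DROP DATABASE cannot run inside a
    # transaction block, and template1 is used because the currently
    # open database cannot be dropped.
    for db in clinical genomic; do
        docker exec "$pg" psql -U admin -v ON_ERROR_STOP=1 -d template1 -c "DROP DATABASE $db;" || return 1
        docker exec "$pg" psql -U admin -v ON_ERROR_STOP=1 -d template1 -c "CREATE DATABASE $db;" || return 1
    done
    # -i keeps stdin attached so the host-side dump is streamed into psql.
    docker exec -i "$pg" psql -U admin -v ON_ERROR_STOP=1 -d clinical < "$clinical_sql" || return 1
    docker exec -i "$pg" psql -U admin -v ON_ERROR_STOP=1 -d genomic < "$genomic_sql" || return 1
    docker start candigv2_katsu_1 candigv2_htsget_1
}
```

Usage: restore_candig_dbs /path/to/backup/yyyy-mm-dd-clinical-backup.sql /path/to/backup/yyyy-mm-dd-genomic-backup.sql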
Backing up Secrets and Authorization data
Secrets and authorization data in CanDIG are stored in Vault. Back these up regularly, and always before the CanDIG stack is rebuilt, so that they can be restored after a system crash or rebuild. To back up Vault, run:
make backup-vault
This command creates a tarball at tmp/vault/backup.tar.gz. Save it to a secure location outside the server on which your CanDIG deployment is running. You may want to rename the backup to include the date and type of backup for future reference, e.g. YYYY-MM-DD-vault-backup.tar.gz.
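A minimal sketch of the backup-and-rename step, assuming the tmp/vault path is relative to the CanDIGv2 directory. backup_vault and its arguments are hypothetical. If the destination is a local directory, the dated file still needs to be moved to a separate server afterwards.

```shell
# Hypothetical helper: run the Makefile target, then keep a dated copy.
# Assumes tmp/vault/backup.tar.gz is relative to the CanDIGv2 directory.
backup_vault() {
    local candig_dir="$1" dest="$2"     # CanDIGv2 directory, backup directory
    make -C "$candig_dir" backup-vault || return 1
    mkdir -p "$dest" || return 1
    # Rename on copy so backups from different days do not overwrite each other.
    cp "$candig_dir/tmp/vault/backup.tar.gz" "$dest/$(date +%F)-vault-backup.tar.gz"
}
```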
To restore the Vault backup, copy the backup tarball into the lib/vault directory of your CanDIGv2 installation and rename it to restore.tar.gz:
cp /path/to/backup.tar.gz path/to/CanDIGv2/lib/vault/restore.tar.gz
Then run
make restore-vault
All previous secrets and authorizations should be restored to the stack. The tarball is then renamed to restored.tar.gz and can be deleted.
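Before restoring, it can help to confirm that the tarball is intact. In the sketch below, restore_vault is a hypothetical helper that lists the archive with tar -tzf and only proceeds with the copy and make restore-vault if that succeeds. It assumes the lib/vault path shown above.

```shell
# Hypothetical helper for the Vault restore steps above, with a sanity check
# that the file is a readable gzip-compressed tar archive before it is used.
restore_vault() {
    local backup="$1" candig_dir="$2"   # backup tarball, CanDIGv2 directory
    if ! tar -tzf "$backup" > /dev/null; then
        echo "not a readable .tar.gz archive: $backup" >&2
        return 1
    fi
    cp "$backup" "$candig_dir/lib/vault/restore.tar.gz" || return 1
    make -C "$candig_dir" restore-vault
}
```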
Backing up logs
Logs are stored in tmp/logs. The contents of this folder should be saved periodically.
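A minimal sketch for archiving the log folder into a dated tarball, assuming tmp/logs is relative to the CanDIGv2 directory. backup_logs is a hypothetical helper that could be called from a scheduled job such as cron, as well as before any rebuild.

```shell
# Hypothetical helper: archive the tmp/logs folder of a CanDIGv2 directory
# into a dated tarball in a destination directory.
backup_logs() {
    local candig_dir="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # -C stores entries as logs/... instead of the full host path.
    tar -czf "$dest/$(date +%F)-candig-logs.tar.gz" -C "$candig_dir/tmp" logs
}
```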