Production deployment
Apart from the basic steps in the Local Deployment Guide to get the CanDIG stack up and running, there are additional settings and security recommendations that need to be set up in a production-level environment. We provide the following as general advice, but it is important for all CanDIG deployers to also consult with their institutional infrastructure security personnel to ensure that their deployment meets the necessary level of data security.
Stable branch
Production deployments should use the latest stable release of CanDIGv2, which uses the stable branches and fixed versions of all other submodules and packages. The develop versions of CanDIG software are under active development and should not be used for production purposes. When new stable releases are made, we recommend updating as soon as possible. CanDIG nodes running different stable releases may not be able to federate with each other.
Reverse Proxy & Firewall
It is essential to set up a reverse proxy and firewall so that only specific ports are open to the internet. The software used for this is up to the deployer and is considered outside of the CanDIG stack.
Essentially, the only two services that should be reachable from the outside world are Tyk (default port 5080) and Keycloak (default port 8080). We usually configure a reverse proxy so that each is on a separate domain, e.g. https://candig.uhnresearch.ca directs to Tyk and https://candigauth.uhnresearch.ca directs to Keycloak.
Some specific examples of how existing institutes have approached this are below.
HAProxy - UHN & BCGSC
At UHN, the candig.uhnresearch.ca domain is under a proxy, so requests to a specific service go through the following stack:
Specifically, the UHN proxy forwards all candig.uhnresearch.ca and candigauth.uhnresearch.ca requests (port 443) to candig1:5080 (tyk) and candig1:8080 (keycloak) respectively, thereby acting as a firewall. All CanDIGv2 microservices can only be accessed through Tyk.
OpenStack security group & nginx - C3G
An OpenStack security group is applied as a firewall that allows ingress traffic to ports 80 and 443 only.
nginx acts as a reverse proxy which:
- Re-routes http traffic to https
- Provides SSL certificates
- Routes ${CANDIG_DOMAIN} and ${CANDIG_AUTH_DOMAIN} http[s] traffic from outside to the appropriate microservice (Tyk or Keycloak respectively) and port.
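
One way to sanity-check a firewall or security group like the ones described above is to probe the host from a machine outside the network and confirm that only the expected ports accept connections. The following is a minimal Python sketch, not part of CanDIG: the function names and the allow-list approach are illustrative assumptions, and the hostname in the usage comment is a placeholder.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unexpected_open_ports(host, ports_to_probe, allowed_ports, timeout=2.0):
    """Return the probed ports that accept connections but are not allowed."""
    allowed = set(allowed_ports)
    return [
        port
        for port in ports_to_probe
        if port not in allowed and is_port_open(host, port, timeout)
    ]


# Hypothetical usage, run from outside the network:
#   unexpected_open_ports("candig.example.org", [80, 443, 5080, 8080], [80, 443])
# An empty list would mean Tyk and Keycloak are not directly exposed.
```

A check like this only reports what is reachable from the machine it runs on, so it complements, rather than replaces, a review of the firewall rules themselves.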
Virtual Machine behind Virtual Private Network
Any user who can access the VM where the CanDIG stack is running can potentially access private data. Access to this VM should be strictly limited to users who are authorized to see any data that is ingested into the stack. One option is to use a VPN to ensure only those with access to the VPN can access the running VM. This strategy is currently being used at UHN and BCGSC.
.env settings
The following default settings in the .env file should be changed when deploying CanDIG in a production environment. You can also take a look at the example-production.env and etc/env/production-diff-template.diff files with the changes below incorporated.
Value in prod environment | Notes |
---|---|
CANDIG_DOMAIN=<your.prod.domain> | Update to correct prod domain |
CANDIG_INTERNAL_DOMAIN=${CANDIG_DOMAIN} or internal ip address | Some sites have needed to change this to 127.0.0.1 |
CANDIG_AUTH_DOMAIN=<your.prod.auth.domain> | Update to correct prod auth domain |
CANDIG_SITE_LOCATION=<your-site-location> | The name of your site, e.g. UHN, BC-Dev. Should be unique within a federation. |
CANDIG_DEBUG_MODE=0 | Turn off DEBUG mode |
CANDIG_PRODUCTION_MODE=1 | Turn on Production mode |
LOCAL_IP_ADDR=192.168.x.x | May need to set to your public IP, not always needed |
FEDERATION_SELF_SERVER_ID=<unique-node-name> | e.g. UHN-prod, BCGSC-prod. Uniquely identifies your node within the CanDIG federation, can be ${CANDIG_SITE_LOCATION} |
FEDERATION_SELF_SERVER (update province and province-code, see the Setting location information section below) | Ensures the site displays properly on the map and can be federated |
KEYCLOAK_CLIENT_ID=<your-keycloak-identifier> | Optional to change this, more for consistency than necessity |
KEYCLOAK_PUBLIC_PROTO=https | Change to https for prod |
KEYCLOAK_PUBLIC_URL=${KEYCLOAK_PUBLIC_PROTO}://${CANDIG_AUTH_DOMAIN} | Keycloak public URL shouldn’t have a port |
KEYCLOAK_PRIVATE_URL=${KEYCLOAK_PRIVATE_PROTO}://keycloak:${KEYCLOAK_PORT} | Keycloak private URL should have keycloak as the host |
KEYCLOAK_PROXY_HEADERS=xforwarded OR forwarded | Must be consistent with your reverse proxy configuration; see the Keycloak docs for more info |
TYK_LOGIN_TARGET_URL=https://${CANDIG_DOMAIN} | Ensure Tyk uses https and remove the port |
TYK_ANALYTICS_FROM_EMAIL=YOUR-ADMIN-EMAIL@email.ca | Update to a relevant email address |
TYK_USE_SSL=true | Ensure Tyk uses SSL |
CANDIG_DATA_PORTAL_URL=https://${CANDIG_DOMAIN}:${CANDIG_DATA_PORTAL_PORT}/data-portal | Ensure the data portal URL uses https |
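
Some of the values in the table can be checked automatically before a deployment. The following Python sketch parses a .env-style file and reports settings that still differ from the production values listed above. It is an illustrative helper, not part of CanDIGv2: the function names and the choice of which settings to check are assumptions, and it only handles simple KEY=VALUE lines.

```python
# Expected production values, taken from the table above.
EXPECTED_PROD_VALUES = {
    "CANDIG_DEBUG_MODE": "0",
    "CANDIG_PRODUCTION_MODE": "1",
    "KEYCLOAK_PUBLIC_PROTO": "https",
    "TYK_USE_SSL": "true",
}

# Settings whose value should begin with https:// according to the table.
HTTPS_URL_KEYS = ("TYK_LOGIN_TARGET_URL", "CANDIG_DATA_PORTAL_URL")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_production_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key, expected in EXPECTED_PROD_VALUES.items():
        actual = settings.get(key)
        if actual != expected:
            problems.append(f"{key} should be {expected!r}, found {actual!r}")
    for key in HTTPS_URL_KEYS:
        if not settings.get(key, "").startswith("https://"):
            problems.append(f"{key} should start with https://")
    return problems


# Hypothetical usage:
#   with open(".env") as handle:
#       for problem in check_production_settings(parse_env(handle.read())):
#           print(problem)
```

Site-specific values such as the domains and the site location cannot be validated this way and still need a manual review.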
Setting location information
You will need to modify the FEDERATION_SELF_SERVER entry to reflect your site’s specific settings. Set CANDIG_SITE_LOCATION to the name of your site, such as UHN, BCGSC, or C3G. For federation settings, set the name, province, and province-code for the FEDERATION_SELF_SERVER variable in the .env file. See the table below for the codes for each Canadian province and territory:
Province/Territory | province | province-code |
---|---|---|
Alberta | AB | ca-ab |
British Columbia | BC | ca-bc |
Manitoba | MB | ca-mb |
New Brunswick | NB | ca-nb |
Newfoundland and Labrador | NL | ca-nl |
Northwest Territories | NT | ca-nt |
Nova Scotia | NS | ca-ns |
Nunavut | NU | ca-nu |
Ontario | ON | ca-on |
Prince Edward Island | PE | ca-pe |
Quebec | QC | ca-qc |
Saskatchewan | SK | ca-sk |
Yukon | YT | ca-yt |
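
As the table shows, each province-code is the lowercase province abbreviation prefixed with ca-. The following Python sketch encodes that pattern so values can be checked before they go into the .env file. The dictionary and function name are illustrative and not part of CanDIG.

```python
# Province and territory abbreviations, copied from the table above.
PROVINCES = {
    "Alberta": "AB",
    "British Columbia": "BC",
    "Manitoba": "MB",
    "New Brunswick": "NB",
    "Newfoundland and Labrador": "NL",
    "Northwest Territories": "NT",
    "Nova Scotia": "NS",
    "Nunavut": "NU",
    "Ontario": "ON",
    "Prince Edward Island": "PE",
    "Quebec": "QC",
    "Saskatchewan": "SK",
    "Yukon": "YT",
}


def province_code(province: str) -> str:
    """Return the province-code (e.g. 'ca-on') for a two-letter abbreviation."""
    abbreviation = province.strip().upper()
    if abbreviation not in PROVINCES.values():
        raise ValueError(f"Unknown province abbreviation: {province!r}")
    return f"ca-{abbreviation.lower()}"
```

For example, province_code(PROVINCES["Ontario"]) gives the ca-on value listed in the table.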
Example values from UHN, which is located in Ontario: name UHN, province ON, province-code ca-on.
Setting Site Logo
To customize the site logo, you need to place your image in the candig-data-portal either before building or within the container after running the build-all or install-all commands. The image should be located at CanDIGv2/lib/candig-data-portal/candig-data-portal/src/assets/images/users/siteLogo.png. This will overwrite the default logo.
File requirements:
- Name the file siteLogo.png
- The image should be square and will be set to 34x34 pixels
- The image format must be PNG
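
The format and shape requirements above can be checked before the file is copied into place. The following Python sketch reads the PNG header using only the standard library and confirms that the data is a PNG with equal width and height. It is an illustrative helper, not part of CanDIG, and the function names are assumptions.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) from the IHDR chunk, or raise ValueError."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23.
    return struct.unpack(">II", data[16:24])


def is_valid_site_logo(data: bytes) -> bool:
    """Return True if the data is a PNG whose width equals its height."""
    try:
        width, height = png_dimensions(data)
    except ValueError:
        return False
    return width > 0 and width == height


# Hypothetical usage:
#   with open("siteLogo.png", "rb") as handle:
#       print(is_valid_site_logo(handle.read()))
```

The sketch does not check the file name or resize the image; it only confirms the PNG format and the square aspect ratio.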
If the portal is already running, copy the logo into the Docker container using this command:
Otherwise:
Change the default site admin
When CanDIG is initially deployed, a site_admin user will be created by default. The username and password for this user can be found in the env.sh file. It is important to change this default to a real user who should have site administration privileges.
- Log in to the data portal with the credentials you wish to make a site administrator, to ensure the user can log in successfully
- ssh into the VM running your CanDIG deployment and cd into the currently deployed repo directory
- Get a site admin token using the default site admin user:
- Set the role of the real user to a site admin with the following curl command:
- Check the role assignment was successful by verifying the following command returns True:
- Delete the default site admin user using your new real user site admin token
- Comment out or remove the value of DEFAULT_SITE_ADMIN_USER in your .env file:
- Run python settings.py; source env.sh again to reset your environment variables.
- Test that the default user has been removed successfully. Remove the cached refresh token:
- Run python site_admin_token.py. You should be prompted for your actual site admin username and password.
Keep the site admin user and password secure at all times.
Adding a site curator
See User Roles
Connecting Keycloak to institutional LDAP
You will need to work with your site IT administrator in order to connect an external authentication service to the running Keycloak.
Federating with other CanDIG production instances
To federate your own node with another CanDIG node, follow the instructions in the federation-service README.
Federation is a two-way process: you need to register the other node's server with your node, and the other node needs to register your node, by exchanging valid site administration bearer tokens.
Once two nodes are federated, summary data from federated nodes will appear in both nodes’ data portals and will be viewable by all users who are able to log in.
Access to patient-level data through specific program authorization is managed by the node that hosts the data for that program. For example, if a user from UHN needs to be given authorization to a program hosted within the BC node, a site administrator, site curator or program curator (with authorization for that program) from BC will need to add that UHN user as a team member to that program within the BC CanDIG node. Alternatively, if the UHN user was granted access through a DAC process,
Backing up production data
It is not expected that a CanDIG instance would hold the only copy of any ingested data. However, recognising that the ETL and ingest process takes significant time and effort, it is a good idea to regularly back up all data stored in CanDIG. Steps for how to do this can be found in Backing up and restoring CanDIG data.