Troubleshooting the stack
Integration tests failing
If any integration tests fail, it is usually best to go back to the first test that failed, as subsequent tests often rely on earlier tests passing. The root cause is therefore usually the first failure.
If test_tyk fails, all other tests will fail because the stack relies on tyk being up and running as expected. If you see test_tyk fail, you may as well press ctrl+c to stop the tests and start investigating the issue with tyk. Some places to start looking for the issue are:
- is the tyk container up and running?
  - check your Docker Desktop or docker ps to see if tyk and all other containers are running
- is there anything in the tyk docker logs that indicates an issue?
  - view the logs with docker logs candigv2_tyk_1 or in Docker Desktop; if anything looks amiss, make a GitHub issue with the details
- did anything go wrong in the build logs?
  - the build log can be found in tmp/error.txt
- is anything else in the central logs looking weird?
  - the central logs for all docker containers can be found in /tmp/logs/; the most current one will be named with buffer.<uuid>.log, whereas previous days’ logs are named .yyyymmdd_0.log
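The checks above can be partly scripted. The following is a minimal sketch, not part of the CanDIGv2 tooling: the function names are hypothetical, the level= filter assumes the tyk logs use the key=value format seen in tyk warnings, and the *buffer.*.log glob is an assumption based on the file naming described above.

```shell
# Hypothetical helpers for a first pass over the tyk container and the logs.

# List running containers whose name contains "tyk".
# Empty output means no such container is running.
tyk_containers() {
    docker ps --filter "name=tyk" --filter "status=running" \
        --format '{{.Names}}: {{.Status}}'
}

# Read log text on stdin and keep only warning/error/fatal lines.
# Assumes log lines carry a "level=..." field.
log_problems() {
    grep -E 'level=(warning|error|fatal)'
}

# Print the most recently modified "buffer" log in a directory
# (default /tmp/logs). The glob pattern is an assumption.
latest_central_log() {
    local log_dir="${1:-/tmp/logs}"
    ls -t "$log_dir"/*buffer.*.log 2>/dev/null | head -n 1
}

# Example usage (requires a running Docker daemon):
#   tyk_containers
#   docker logs candigv2_tyk_1 2>&1 | log_problems
#   less "$(latest_central_log)"
```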
If any of the ingest_admin_* tests fail, the later query tests will fail since they rely on having data ingested into the system to get the expected query results.
If any specific test is failing, the logs of the individual container for the failing service, or the centralized logs (/tmp/logs), are usually the best place to start debugging. If at any stage you are unsure where to start after integration test failures, please make a GitHub issue.
If a failure relates to any of the services listed in CANDIG_AUTH_MODULES in the .env file, and you find you need to rebuild any of these modules, you will need to rebuild all of them using make clean-authx and make init-authx.
Conda env not activated
If you get an error when running a make command, something like:
bash: python: command not found
or an error message about dotenv not being found, the candig conda environment is probably not active. Activate it in your terminal with conda activate candig.
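A quick way to confirm which environment is active before running make is sketched below. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation. The function name is hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if the "candig" conda
# environment is the active one.
check_candig_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "candig" ]; then
        echo "candig environment is active"
    else
        echo "candig environment is not active (current: ${CONDA_DEFAULT_ENV:-none})" >&2
        echo "run: conda activate candig" >&2
        return 1
    fi
}

# Example usage:
#   check_candig_env && make compose-<name of service>
```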
docker volumes not remade
After cleaning an individual service, composing may get stuck at:
waiting for x service to start ...
Use CTRL + c to exit the process, then run make docker-volumes and try composing again with make compose-<name of service>.
No rule to make target
It is common to move around within the repo and lose track of where you are. If you try to run a make command and get the error:
make: *** No rule to make target `clean-candig-ingest'. Stop.
Check to make sure you are in the root of the CanDIGv2 repo as the commands only work while in the same directory as the Makefile.
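If you often end up in a subdirectory, a small helper can walk back up to the repo root. This is only a sketch: the function name is hypothetical, and it assumes the CanDIGv2 root is the nearest parent directory that contains both a Makefile and a lib/ directory, which may not hold for every checkout.

```shell
# Hypothetical helper: print the nearest parent directory (starting from
# the given directory, default ".") that contains both a Makefile and a
# lib/ directory. Returns non-zero if none is found.
find_repo_root() {
    local dir
    dir="$(cd "${1:-.}" 2>/dev/null && pwd)" || return 1
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        if [ -f "$dir/Makefile" ] && [ -d "$dir/lib" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        dir="$(dirname "$dir")"
    done
    return 1
}

# Example usage:
#   cd "$(find_repo_root)" && make clean-candig-ingest
```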
If you are still having trouble, feel free to reach out to us on GitHub.
Common Errors
Uncaught server error: java.lang.NullPointerException: Cannot invoke “String.equals(Object)” because “requestHost” is null
This error occurs when a service tries to access Keycloak through a URL that includes an underscore. It has been known to occur when setting up reverse proxies that redirect requests to Keycloak via the container name candigv2_keycloak_1. Use the container's alias keycloak instead.
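To check a proxy target before deploying it, you can test whether the host part of the URL contains an underscore. The sketch below uses only shell parameter expansion. The function name is hypothetical, and it does not handle URLs that embed credentials (user@host).

```shell
# Hypothetical check: succeed if the host portion of a URL contains an
# underscore. Underscores in the path are ignored.
host_has_underscore() {
    local url="$1" host
    host="${url#*://}"    # drop the scheme, if any
    host="${host%%/*}"    # drop the path
    host="${host%%:*}"    # drop the port
    case "$host" in
        *_*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example usage:
#   host_has_underscore "http://candigv2_keycloak_1:8080/auth" \
#       && echo "use the keycloak alias instead"
```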
Unauthorized {‘type’: ‘about:blank’, ‘title’: ‘Method Not Allowed’, ‘detail’: ‘Method Not Allowed’, ‘status’: 405}
On federated systems, this may occur when the URL given to Federation ends with a trailing '/'. It is currently unknown why this occurs.
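One way to rule this cause out is to strip any trailing slashes from the URL before giving it to Federation. The helper below is a sketch with a hypothetical function name and a placeholder URL in the usage comment.

```shell
# Hypothetical helper: print the URL with all trailing slashes removed.
strip_trailing_slashes() {
    local url="$1"
    while [ "${url%/}" != "$url" ]; do
        url="${url%/}"
    done
    printf '%s\n' "$url"
}

# Example usage (placeholder URL):
#   strip_trailing_slashes "https://candig.example.org/federation/"
```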
Tyk provider issues
The logs show errors such as the following when you try to access any endpoint:
level=warning msg="JWT Invalid" api_id=91 api_name=federation error="Validation error. Validation error. The provider https://<$CANDIG_AUTH_DOMAIN>/auth/realms/candig does not have a client id matching any of the token audiences [https://<$CANDIG_AUTH_DOMAIN>/auth/realms/candig]" mw=OpenIDMW org_id= origin=10.9.234.195 path=/federation/v1/service-info
time="Apr 01 18:45:40" level=warning msg="Attempted access with invalid key." api_id=91 api_name=federation key="****JWT]" mw=OpenIDMW org_id= origin=10.9.234.195 path=/federation/v1/service-info
Check your tyk config files for anything that looks wrong. For example, check that lib/tyk/tmp/apps/91.json has the correct issuer and client_ids as configured in your .env. It should look something like:
"providers": [
  {
    "issuer": "https://<$CANDIG_AUTH_DOMAIN>",
    "client_ids": {
      "<$KEYCLOAK_CLIENT_ID in base64 encoding>": "candig_policy"
    }
  }
]
For the client id, as an example, if you kept the default value for KEYCLOAK_CLIENT_ID (local_candig) in the example.env, the value would be:
echo -n "local_candig" | base64
bG9jYWxfY2FuZGln
Check that your .env does not have any parsing issues caused by invisible white space or comments.
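Both checks can be scripted. The sketch below flags .env lines that end in whitespace (including a stray carriage return) or that carry a comment after a value, and tests whether the base64-encoded client id appears in a tyk app config. The function names are hypothetical, and the first check can give false positives for values that legitimately contain " #".

```shell
# Hypothetical check: print .env lines (with line numbers) that end in
# whitespace or a carriage return, or that have a comment after a value.
env_suspicious_lines() {
    grep -nE '[[:space:]]+$|^[A-Za-z_][A-Za-z0-9_]*=.*[[:space:]]#' "$1"
}

# Hypothetical check: succeed if the base64-encoded client id appears as
# a quoted string in the given tyk app config file.
client_id_in_tyk_config() {
    local client_id="$1" config="$2" encoded
    encoded="$(printf '%s' "$client_id" | base64)"
    grep -qF "\"$encoded\"" "$config"
}

# Example usage, from the repo root:
#   env_suspicious_lines .env
#   client_id_in_tyk_config local_candig lib/tyk/tmp/apps/91.json \
#       || echo "client id not found in tyk config"
```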
Tyk cannot find secret key
Your stack doesn’t seem to be working and there are tyk-related error messages such as Key not authorised even though you believe you are using a valid token.
Double-check your build log (tmp/progress.txt) for messages such as:
cat: /opt/CanDIGv2/tmp/tyk/secret-key: No such file or directory
mv: cannot stat 'tmp/secrets/tyk-secret-key': No such file or directory
cat: /opt/CanDIGv2/tmp/tyk/secret-key: No such file or directory
To fix, try regenerating the tyk secret with make secret-tyk-secret-key.
Then recompose both tyk and federation with:
make recompose-tyk
make recompose-federation
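To search the build log for these messages rather than reading it by eye, a grep along the following lines may help. The function name is hypothetical, and the default path assumes you run it from the repo root.

```shell
# Hypothetical check: print build-log lines (with line numbers) that
# report a missing tyk secret key file. Returns non-zero if none match.
secret_key_errors() {
    grep -n 'secret-key.*No such file or directory' "${1:-tmp/progress.txt}"
}

# Example usage, from the repo root:
#   secret_key_errors && echo "regenerate with: make secret-tyk-secret-key"
```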
Opa returns 401 Unauthorized
Sometimes 401 Unauthorized errors are caused by Opa not being able to find the data it needs to validate the bearer token. These will have a message of “request rejected by administrative policy” and will not leave any decisions in the log.
These are caused by Opa’s master system.authz policy rejecting access to all downstream policies, including all of CanDIG’s permission policies. Usually, this happens because Opa can’t access its Vault service store or the IDP. Try re-running make compose-opa to reconnect the Opa containers to Vault.
If you are still having trouble diagnosing the problem, you can temporarily set Opa’s system.authz authorization (in permissions_engine/authz.rego) to allow all requests by default:
# Reject requests by default
default allow := false # switch this to true
Then run make recompose-opa. You can then access the endpoints /v1/data/idp, /v1/data/vault, and /v1/data/calculate, which let you see more details about the internal logic of Opa’s decision-making.
Be sure to switch default allow back to false when you’re done.
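Because leaving the default open disables authorization, it can be worth checking the file before you finish. The sketch below only inspects the text of authz.rego for the default rule. The function name is hypothetical, and the path to the file depends on where the Opa module lives in your checkout.

```shell
# Hypothetical check: succeed if the given authz.rego still sets
# "default allow" to true, i.e. the debugging change was not reverted.
authz_allows_all() {
    grep -qE '^[[:space:]]*default[[:space:]]+allow[[:space:]]*:?=[[:space:]]*true' "$1"
}

# Example usage (adjust the path to your checkout):
#   authz_allows_all path/to/permissions_engine/authz.rego \
#       && echo "WARNING: default allow is still true"
```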
HTSGet not indexing files (or: genomics searching fails even though experiment metadata is loading)
Sometimes, if your variant files are located on a server that isn’t reachable by the host machine, ingest will silently fail, and the Candig-Ingest docker logs will show an error like the following:
level: WARNING, file: urllib3.connectionpool, log: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7eff58639910>: Failed to resolve 'ecs.uhn.ca' ([Errno -2] Name or service not known)")': /dhdp-ecsresgen?location=
As a convenience, you can add a specific hosts file entry by editing the following lines in your .env:
# Adjust these two if you need to setup a custom mapping from s3 bucket to ip address
ECS_DOMAIN=your.ecs.domain
ECS_IP_ADDR=127.0.0.1
If you need more than one hosts entry for this, you can edit the extra_hosts entries in lib/htsget/docker-compose.yml and lib/candig-ingest/docker-compose.yml.