
OpenHIM Data

OpenHIM backup & restore

OpenHIM transaction logs and other data are stored in the Mongo database. Restoring this data means restoring the full transaction history, which is essential if something unexpected happens and the data is lost.

In the following sections, we will cover:

  • The backup jobs that are already implemented to run periodically

  • How to restore the backups

Backup & Restore

Single node

The following job may be used to set up a backup job for a single node Mongo:

[job-run "mongo-backup"]
schedule= @every 24h
image= mongo:4.2
network= mongo_backup
volume= /backups:/tmp/backups
command= sh -c 'mongodump --uri=${OPENHIM_MONGO_URL} --gzip --archive=/tmp/backups/mongodump_$(date +%s).gz'
delete= true

Cluster

The following job may be used to set up a backup job for clustered Mongo:

[job-run "mongo-backup"]
schedule= @every 24h
image= mongo:4.2
network= mongo_backup
volume= /backups:/tmp/backups
command= sh -c 'mongodump --uri=${OPENHIM_MONGO_URL} --gzip --archive=/tmp/backups/mongodump_$(date +%s).gz'
delete= true

Restore

To restore from a backup, launch a Mongo container with access to the backup file and the mongo_backup network by running the following command:

docker run -d --network=mongo_backup --mount type=bind,source=/backups,target=/backups mongo:4.2
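To get a shell inside that container (find its ID or name with docker ps), you can use something like the following; the container name is a placeholder:

docker exec -it <mongo_container_id> sh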

Once inside the container, run mongorestore:

mongorestore --uri="mongodb://mongo-1:27017,mongo-2:27017,mongo-3:27017/openhim?replicaSet=mongo-set" --gzip --archive=/backups/<NAME_OF_BACKUP_FILE>

The data should be restored.

Single node restore docs
Cluster restore docs

Resource Allocations

Allot CPU and RAM resources to services, per service, per server.

What it Means

CPU

CPU allocations are specified as a portion of the total number of cores on the host system. For example, a CPU limit of 2 on a 6-core system is an effective limit of 33.33% of the CPU, and a CPU limit of 6 on a 6-core system is an effective limit of 100% of the CPU.

RAM

Memory allocations are specified as a number followed by a unit suffix, e.g., 500M, 1G, 10G.

Defaults

By default, each package contained in the Platform is allocated a maximum of 3 GB of RAM and 100% CPU usage.

Allocating Resources per Package

The resource allocation can be set on a per-package basis, as specified by the relevant environment variables found in the relevant Packages section.
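These limits are typically enforced through each package's Docker Compose deploy.resources settings. A minimal sketch, assuming a hypothetical service name and illustrative values (not the Platform's actual defaults for any specific service):

services:
  openhim-core:
    deploy:
      resources:
        limits:
          cpus: '2'      # 2 host cores, i.e. about 33% of a 6-core host
          memory: 3G     # exceeding this limit surfaces as exit code 137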

Notes

  • Be wary of setting CPU limits on ELK Stack services; they tend to fail their built-in health checks when CPU limits are applied.

  • Do not allocate less memory to ELK Stack services than their JVM heap sizes.

  • Exit code 137 indicates an out-of-memory failure, meaning the service has been allocated too little memory.


Terraform

A tool that enables infrastructure as code to set up servers in AWS EC2.

Cloud Dev environments

To set up a developer's environment in AWS, run this Terraform project. The scripts join an existing VPC, create a public subnet and create a variable number of EC2 instances to which the user has SSH access. The scripts also create alarms that auto-shutdown the instances after a configurable period, based on CPU metrics. A scheduled Lambda event can also be configured to run at a regular interval and shut down any instances that may still be running.

Pre-requisites

Install AWS CLI
Install Terraform

Creating a VPC

This should only be done once per AWS account, as there is a limit of 5 VPCs per region. Check whether this has already been run; if it has, use the existing VPC_ID and SUBNET_ID in the following section and skip to the next section.

Navigate to the infrastructure/terraform/vpc directory

Initialize the Terraform project:

terraform init

Execute the following:

terraform apply

Copy the output for the next step. For example, for ICAP this has already been run and this is the result:

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

SUBNET_ID = "subnet-0004b0dacb5862d59"
VPC_ID = "vpc-067ab69f374ac9f47"

Creating EC2 instances

Navigate to the infrastructure/terraform directory

Initialize the Terraform project:

terraform init

The following properties have to be set:

PUBLIC_KEY_PATH - path to the user's public key file that gets injected into the servers created
PROJECT_NAME    - unique project name that is used to identify each VPC and its resources
HOSTED_ZONE_ID  - (only if you are creating domains, which by default you are) the hosted zone to use, this must be created in the AWS console
DOMAIN_NAME     - the base domain name to use
SUBNET_ID       - the subnet id to use, copy this from the previous step
VPC_ID          - the VPC id to use, copy this from the previous step

The configuration can be done using a Terraform variable file. Create a file called my.tfvars. Below is an example that illustrates the structure of the variables file; it is a configuration that you can use for the ICAP CDR. Please replace {user} with your own user.

PUBLIC_KEY_PATH = "/home/{user}/.ssh/id_rsa.pub"
PROJECT_NAME = "jembi_platform_dev_{user}"
HOSTED_ZONE_ID = "Z00782582NSP6D0VHBCMI"
DOMAIN_NAME = "{user}.jembi.cloud"
SUBNET_ID = "subnet-0004b0dacb5862d59"
VPC_ID = "vpc-067ab69f374ac9f47"

The AWS account to be used is defined in the ~/.aws/credentials file. If you don't have this file, make sure you have configured the AWS CLI.

cat ~/.aws/credentials
[default]
aws_access_key_id = AKIA6FOPGN5TYHXXXXX
aws_secret_access_key = Qf7E+qcXXXXXXQh4XznN4MM8qR/VP/SXgXXXXX
[jembi-sandbox]
aws_access_key_id = AKIASOHFAV527JCXXXXX
aws_secret_access_key = YXFu3XxXXXXXTeNXdUtIg0gb9Ro7gJ89XXXXX
[jembi-icap]
aws_access_key_id = AKIAVFN7GJJFS6LXXXXX
aws_secret_access_key = b2I6jhwXXXXX4YehBCx/7rKl1JZjYdbtXXXXX

The sample file above has access to 3 accounts, so the options for <account_name> are "default", "jembi-sandbox" and "jembi-icap".

Optionally, add ACCOUNT = "<account_name>" to my.tfvars if you want to use something other than default.

The flag for specifying a variables file is -var-file. Create the AWS stack by running:

terraform apply -var-file my.tfvars

Once the script has run successfully, the IP addresses and domains for the servers will be displayed:

Apply complete! Resources: 13 added, 0 changed, 0 destroyed.

Outputs:

domains = {
  "domain_name" = "{user}.jembi.cloud"
  "node_domain_names" = [
    "node-0.{user}.jembi.cloud",
    "node-1.{user}.jembi.cloud",
    "node-2.{user}.jembi.cloud",
  ]
  "subdomain" = [
    "*.{user}.jembi.cloud",
  ]
}
public_ips = [
  "13.245.143.121",
  "13.246.39.101",
  "13.246.39.92",
]

SSH access should now be available. Use the default 'ubuntu' user: ssh ubuntu@<ip_address>

To destroy the AWS stack, run:

terraform destroy -var-file my.tfvars

Disaster Recovery Process

Backup & restore process.

Two major procedures should exist in order to recover lost data:

  • Creating backups continuously

  • Restoring the backups

This includes the different databases: MongoDB, PostgreSQL DB and Elasticsearch.

The current implementation creates continuous backups for MongoDB (to back up all OpenHIM transactions) and PostgreSQL (to back up the HAPI FHIR data) as follows:
  • Daily backups (for 7 days rotation)

  • Weekly backups (for 4 weeks rotation)

  • Monthly backups (for 3 months rotation)

More details can be found on each service's backup & restore page.


Ansible

A tool that enables infrastructure as code for provisioning the servers.

Platform Deploy

Prerequisites

  • Linux OS to run commands

  • Install Ansible (as per the installation guide: https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html)

  • Ansible Docker Community Collection installed: ansible-galaxy collection install community.docker

Infrastructure and Servers

Please see the /inventories/{ENVIRONMENT}/hosts file for IP details of the designated servers. Set these to the server that you created via Terraform or to an on-premises server.
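For orientation, an Ansible inventory hosts file generally has a shape like the sketch below; the group names and addresses here are placeholders, so mirror the structure already present in the repository:

[leader]
node-0 ansible_host=<ip_of_first_server>

[workers]
node-1 ansible_host=<ip_of_second_server>
node-2 ansible_host=<ip_of_third_server>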

Ansible

SSH Access

To authenticate yourself on the remote servers, your SSH key will need to be added to the sudoers var in /inventories/{ENVIRONMENT}/group_vars/all.yml.

To have Docker access, you need to add your SSH key to the docker_users var in the same /inventories/{ENVIRONMENT}/group_vars/all.yml file.

An authorised user will need to run the provision_servers.yml playbook to add the SSH key of the person who will run the Ansible scripts to the servers.
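As a rough sketch only (the real all.yml may key these entries differently, so copy the format of the existing entries), the two vars could look like:

sudoers:
  - "ssh-ed25519 AAAA...xyz your_username"   # your public SSH key
docker_users:
  - "ssh-ed25519 AAAA...xyz your_username"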

Configuration

Before running the Ansible scripts, add each server to your known_hosts file, otherwise Ansible will throw an error. For each server run:

ssh-keyscan -H <host> >> ~/.ssh/known_hosts

To run a playbook you can use:

ansible-playbook \
  --ask-vault-pass \
  --become \
  --inventory=inventories/<INVENTORY> \
  --user=ubuntu \
  playbooks/<PLAYBOOK>.yml

Alternatively, to run all provisioning playbooks with the development inventory (most common for setting up a dev server), use:

ansible-playbook \
  --ask-vault-pass \
  --become \
  --inventory=inventories/development \
  --user=ubuntu \
  playbooks/provision.yml

Vault

The vault password required for running the playbooks can be found in the database.kdbx KeePass file.

To encrypt a new secret with the Ansible vault, run:

echo -n '<YOUR SECRET>' | ansible-vault encrypt_string

When prompted for a new Vault password, enter the original Ansible Vault password.

Keepass

Copies of all the passwords used here are kept in the encrypted database.kdbx file.

Note: Please ask your admin for the decryption password of the database.kdbx file.

HAPI FHIR Data

FHIR messages Backup & Restore.

Validated messages from HAPI FHIR are stored in the PostgreSQL database.

The following sections detail the backup and restore process for this data.

Backups

This section assumes Postgres backups are made using pg_basebackup.
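For context, a pg_basebackup invocation that produces the base.tar and pg_wal.tar files referenced in the recovery steps below could look like the sketch here; the container ID, user and target path are assumptions, and the Platform's scheduled backup job may differ:

# Sketch only: run against the Postgres leader; authentication may be required
docker exec -t <postgres_leader_container_id> \
  pg_basebackup -U postgres -D /backups/postgresql_$(date +%Y%m%d) -F tar -X stream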

Guides

Various notes and guides


Postgres (Hapi-FHIR)

To start up HAPI FHIR and ensure that backups can be made, make sure you have created the HAPI FHIR bind-mount directory (e.g. /backup).
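For example, on the host (use whatever path your HAPI FHIR package is configured to bind-mount):

sudo mkdir -p /backup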

Disaster Recovery

NB! DO NOT UNTAR OR EDIT THE FILE PERMISSIONS OF THE POSTGRES BACKUP FILE

Postgres (HAPI FHIR)

Preliminary steps:

  1. Do a destroy of fhir-datastore-hapi-fhir using the CLI binary (./platform-linux for linux)

  2. Make sure the Postgres volumes on nodes other than the swarm leader have been removed as well! You will need to SSH into each server and manually remove them (see the sketch after this list).

  3. Do an init of fhir-datastore-hapi-fhir using the CLI binary
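For step 2, the cleanup on each non-leader node generally looks like the sketch below; the volume name is a placeholder, so confirm the real one with docker volume ls first:

# On each node that is not the swarm leader
docker volume ls                                    # find the HAPI FHIR Postgres volume name
docker volume rm <hapi_fhir_postgres_volume_name>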

After running the preliminary steps, run the following commands on the node hosting the Postgres leader:

NOTE: The value of the REPMGR_PRIMARY_HOST variable in your .env file indicates the Postgres leader

  1. Retrieve the Postgres leader's container-ID using: docker ps -a. Hereafter called postgres_leader_container_id

  2. Run the following command: docker exec -t <postgres_leader_container_id> pg_ctl stop -D /bitnami/postgresql/data

  3. Wait for the Postgres leader container to die and start up again. You can monitor this using: docker ps -a

  4. Run the following command: docker rm <postgres_leader_container_id>

  5. Retrieve the new Postgres leader's container-ID using docker ps -a, being wary not to use the old postgres_leader_container_id

  6. Retrieve the Postgres backup file's name as an absolute path (/backups/postgresql_xxx). Hereafter called backup_file

  7. Run the following commands in the order listed:

# Stop the server running in the container
docker exec -t <postgres_leader_container_id> pg_ctl stop -D /bitnami/postgresql/data

# Clear the contents of /bitnami/postgresql/data
docker exec -t --user root <postgres_leader_container_id> sh -c 'cd /bitnami/postgresql/data && rm -rf $(ls)'

# Copy over the base.tar file
sudo docker cp <backup_file>/base.tar <postgres_leader_container_id>:/bitnami/postgresql

# Extract the base.tar file
docker exec -t --user root <postgres_leader_container_id> sh -c 'tar -xf /bitnami/postgresql/base.tar --directory=/bitnami/postgresql/data'

# Copy over the pg_wal.tar file
sudo docker cp <backup_file>/pg_wal.tar <postgres_leader_container_id>:/bitnami/postgresql

# Extract pg_wal.tar
docker exec -t --user root <postgres_leader_container_id> sh -c 'tar -xf /bitnami/postgresql/pg_wal.tar --directory=/bitnami/postgresql/data/pg_wal'

# Copy conf dir over
docker exec -t --user root <postgres_leader_container_id> sh -c 'cp -r /bitnami/postgresql/conf/. /bitnami/postgresql/data'

# Set pg_wal.tar permissions
docker exec -t --user root <postgres_leader_container_id> sh -c 'cd /bitnami/postgresql/data/pg_wal && chown -v 1001 $(ls)'

# Start the server
docker exec -t <postgres_leader_container_id> pg_ctl start -D /bitnami/postgresql/data

  8. Do a down of fhir-datastore-hapi-fhir using the CLI binary. Example: ./instant-linux package down -n=fhir-datastore-hapi-fhir --env-file=.env.*

  9. Wait for the down operation to complete

  10. Do an init of fhir-datastore-hapi-fhir using the CLI binary. Example: ./instant-linux package init -n=fhir-datastore-hapi-fhir --env-file=.env.*

Postgres should now be recovered.

Note: After performing the data recovery, it is possible to get an error from HAPI FHIR (500 internal server error) while the data is still being replicated across the cluster. Wait a minute and try again.

Elasticsearch

Elasticsearch Backup & Restore.

Elasticsearch Backups

For detailed steps about creating backups, see the Elasticsearch snapshot documentation linked below.

Elasticsearch offers several ways of saving a backup; for further understanding, see the snapshot repository documentation linked below.

Elasticsearch Restore

To see how to restore snapshots in Elasticsearch, refer to the Snapshot Restore docs linked below.

Snapshot filesystem repository docs
Register a snapshot repository docs
Snapshot Restore docs

Config Importing

This section defines the configuration importing methods used in the Platform

Overview

Certain packages in the Platform require configuration to enable their intended functionality in a stack. For instance, the OpenHIM package requires the setting of users, channels, roles, and so on. Other packages, such as JS Report or Kibana, require importing of pre-configured dashboards stored in compressed files.

Most services in the Platform can be configured by sending a request containing the required configuration files to the relevant service API. To achieve this, the Platform leverages a helper container to make that API call.

Note: If a package uses a config importer, its configuration can be found in the relevant package's importer section.

The Helper Container

The Process

As part of the package-launching process, the to-be-configured service is deployed and awaits configuration. Before the configuration can take place, the deployment waits for the service to join the internal Docker network. Once the service has joined the network, the helper container is launched and makes the API request that configures the service.
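Conceptually, the helper container does something like the following; this is an illustrative sketch only, and the real importer images, endpoints, paths and authentication differ per package:

# Wait until the target service is reachable on the Docker network ($SERVICE_URL and /health are hypothetical)
until curl -sf -o /dev/null "$SERVICE_URL/health"; do
  sleep 2
done
# Push the pre-built configuration to the service's API (/api/config and the file path are hypothetical)
curl -sf -X POST -H "Content-Type: application/json" \
  --data @/config/service-config.json "$SERVICE_URL/api/config"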

Images

jembi/api-config-importer

For reference on how to use the jembi/api-config-importer image, see the repo (linked below).

jembi/instantohie-config-importer

For reference on how to use the jembi/instantohie-config-importer image, see the repo (linked below).

here
here

Provisioning remote servers

Infrastructure tools for the OpenHIM Platform

Deploying from your local environment to a remote server or cluster is easy. All you have to do is ensure the remote servers are set up as a Docker Swarm cluster. Then, from your local environment, you may target a remote environment by using the `DOCKER_HOST` env var, e.g.

DOCKER_HOST=ssh://ubuntu@<ip> instant package init ...

Setting up new servers

In addition, the OpenHIM Platform GitHub repository provides scripts to easily set up new servers. The Terraform scripts can instantiate servers in AWS, and the Ansible scripts can configure those servers to be ready to accept OpenHIM Platform packages.

Ansible

See the Ansible page (linked below).

It is used for:

  • Adding users to the remote servers

  • Provisioning the remote servers in single-node and cluster mode: user and firewall configuration, Docker installation, Docker authentication and Docker Swarm provisioning.

All the passwords are saved securely using Keepass.

The inventories contain a configuration per environment (development, production and staging), each defining the users and their SSH keys, the Docker credentials and the hosts.

Terraform

Terraform is used to create and set up AWS servers. See the Terraform page (linked below).

here
here

Performance Testing

The performance scripts are located in the test folder. Follow the steps below to run them against a local or remote server.

Steps

  1. Make sure you have the necessary dependencies installed, most importantly the k6 binary. Refer to the documentation on Building a k6 binary

  2. Set the BASE_URL variable to the URL of your server. By default, it is set to "http://localhost:5001", but you can change it to the appropriate URL.

  3. If there are any additional dependencies or configurations required by the generateBundle function or any other imported modules, make sure those are set up correctly.

  4. Open your terminal or command prompt and navigate to the directory where the scripts are located (the test folder).

  5. Run the script using the k6 run command followed by the filename. In this case, you would run k6 run load.js (a sketch of such a script is shown after these steps).

  6. The script will start executing and sending HTTP POST requests to the specified server. The requests will be sent at a constant arrival rate defined in the options object.

  7. The script includes thresholds defined in the options object. These thresholds define the performance criteria for the script. If any of the thresholds are exceeded, the script reports a failure.

  8. Monitor the output in the terminal to see the results of the script execution. It displays information such as the number of virtual users (VUs), request statistics, and any failures that occurred.

  9. To visualize the output in Grafana, run the k6 script with the following environment variable and flag set: K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write ./k6 run -o experimental-prometheus-rw script.js
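For orientation, a script of this general shape ties the steps above together. It is a sketch only, not the actual load.js: the endpoint path, payload, arrival rate and threshold values are placeholders.

// sketch-load.js (illustrative only)
import http from 'k6/http';
import { check } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:5001';

export const options = {
  scenarios: {
    load: {
      executor: 'constant-arrival-rate', // send requests at a fixed rate
      rate: 5,
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 10,
      maxVUs: 20,
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.01'],    // fail the run if more than 1% of requests error
    http_req_duration: ['p(95)<1000'], // fail the run if p95 latency exceeds 1s
  },
};

export default function () {
  // generateBundle() would normally build the FHIR payload; a static body stands in here
  const payload = JSON.stringify({ resourceType: 'Bundle', type: 'transaction', entry: [] });
  const res = http.post(`${BASE_URL}/fhir`, payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status code is 200': (r) => r.status === 200 });
}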

Sample load test result

The test results below were obtained on Ubuntu 22.04 with 64 GB RAM and 12 cores. All checks passed ("status code is 200").

Metric                      Value
checks                      100.00% ✓ 188 ✗ 0
data_received               2.3 MB 39 kB/s
data_sent                   3.9 MB 65 kB/s
dropped_iterations          1613 26.656141/s
http_req_blocked            avg=8.32µs min=3.54µs med=5.21µs max=259.88µs p(90)=6.87µs p(95)=8.18µs
http_req_connecting         avg=1.61µs min=0s med=0s max=153.25µs p(90)=0s p(95)=0s
http_req_duration           avg=619.01ms min=421.78ms med=621.54ms max=812.9ms p(90)=692.07ms p(95)=711.18ms
http_req_failed             0.00% ✓ 0 ✗ 188
http_req_receiving          avg=115.87µs min=60.86µs med=110.01µs max=508.35µs p(90)=152.09µs p(95)=158.61µs
http_req_sending            avg=125.31µs min=63.72µs med=114.43µs max=825.81µs p(90)=150.33µs p(95)=191.61µs
http_req_tls_handshaking    avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting            avg=618.77ms min=421.58ms med=621.32ms max=812.7ms p(90)=691.81ms p(95)=710.93ms
http_reqs                   188 3.106853/s
iteration_duration          avg=625.32ms min=427.15ms med=628.41ms max=818.77ms p(90)=698.25ms p(95)=717.76ms
iterations                  188 3.106853/s
vus                         2 min=1 max=2
vus_max                     2 min=2 max=2

Sample volume test results

Metric                      Value
checks                      100.00% ✓ 954 ✗ 0
data_received               12 MB 40 kB/s
data_sent                   20 MB 66 kB/s
dropped_iterations          23345 77.340364/s
http_req_blocked            avg=7.44µs min=2.89µs med=5.34µs max=235.67µs p(90)=7.39µs p(95)=8.49µs
http_req_connecting         avg=1.14µs min=0s med=0s max=180.71µs p(90)=0s p(95)=0s
http_req_duration           avg=2.49s min=478.77ms med=2.5s max=3.22s p(90)=2.7s p(95)=2.79s
http_req_failed             0.00% ✓ 0 ✗ 954
http_req_receiving          avg=105.4µs min=51.79µs med=103.68µs max=473.23µs p(90)=129.93µs p(95)=140.63µs
http_req_sending            avg=130.4µs min=60.02µs med=110.82µs max=2.72ms p(90)=152.04µs p(95)=225.79µs
http_req_tls_handshaking    avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting            avg=2.49s min=478.52ms med=2.5s max=3.22s p(90)=2.7s p(95)=2.79s
http_reqs                   954 3.160536/s
iteration_duration          avg=2.5s min=483.16ms med=2.5s max=3.23s p(90)=2.7s p(95)=2.79s
iterations                  954 3.160536/s
vus                         4 min=4 max=8
vus_max                     8 min=7 max=8

Development

Adding Packages

  • The Go CLI runs all services from the jembi/platform Docker image. When adding new packages or updating existing packages in the Platform, you will need to build/update your local jembi/platform image. See "How to build the image".

  • As you add new packages to the Platform, remember to list them in the config.yml file, otherwise the added package will not be detected by the platform-cli tool (a sketch of the file's shape follows below).
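For illustration, the entry might look like the sketch below; the package IDs shown are examples, so match the naming already used in your config.yml:

# config.yml (illustrative excerpt)
packages:
  - interoperability-layer-openhim
  - fhir-datastore-hapi-fhir
  - my-new-package   # the package being added must be listed here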
