
OpenHIM Data

OpenHIM backup & restore

OpenHIM transaction logs and other data are stored in the Mongo database. Restoring this data means restoring the full transaction history, which is essential if something unexpected happens and the data is lost.

In the following sections, we will cover:

  • The backup jobs that are already implemented to run periodically

  • How to restore the backups

Backup & Restore

Single node

The following job may be used to set up a backup job for a single node Mongo:

[job-run "mongo-backup"]
schedule= @every 24h
image= mongo:4.2
network= mongo_backup
volume= /backups:/tmp/backups
command= sh -c 'mongodump --uri=${OPENHIM_MONGO_URL} --gzip --archive=/tmp/backups/mongodump_$(date +%s).gz'
delete= true

Cluster

The following job may be used to set up a backup job for clustered Mongo:

[job-run "mongo-backup"]
schedule= @every 24h
image= mongo:4.2
network= mongo_backup
volume= /backups:/tmp/backups
command= sh -c 'mongodump --uri=${OPENHIM_MONGO_URL} --gzip --archive=/tmp/backups/mongodump_$(date +%s).gz'
delete= true

Restore

To restore from a backup, launch a Mongo container with access to the backup file and the mongo_backup network by running the following command:

docker run -d --network=mongo_backup --mount type=bind,source=/backups,target=/backups mongo:4.2
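To get a shell inside that container (find its ID or name with docker ps), you can use something like the following; the container name is a placeholder:

docker exec -it <mongo_container_id> sh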

Once inside the container, run mongorestore:

mongorestore --uri="mongodb://mongo-1:27017,mongo-2:27017,mongo-3:27017/openhim?replicaSet=mongo-set" --gzip --archive=/backups/<NAME_OF_BACKUP_FILE>

The data should be restored.

Single node restore docs
Cluster restore docs

Resource Allocations

Allot CPU and RAM resources to services, per service, per server.

What it Means

CPU

CPU allocations are specified as a portion of the total number of cores on the host system. For example, a CPU limit of 2 on a 6-core system is an effective limit of 33.33% of the CPU, and a CPU limit of 6 on a 6-core system is an effective limit of 100% of the CPU.

RAM

Memory allocations are specified as a number followed by a unit suffix, e.g., 500M, 1G, 10G.

Defaults

By default, each package contained in the Platform is allocated a maximum of 3 GB of RAM and 100% CPU usage.

Allocating Resources per Package

The resource allocation can be set on a per-package basis, as specified by the relevant environment variables found in the relevant Packages section.
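These limits are typically enforced through each package's Docker Compose deploy.resources settings. A minimal sketch, assuming a hypothetical service name and illustrative values (not the Platform's actual defaults for any specific service):

services:
  openhim-core:
    deploy:
      resources:
        limits:
          cpus: '2'      # 2 host cores, i.e. about 33% of a 6-core host
          memory: 3G     # exceeding this limit surfaces as exit code 137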

Notes

  • Be wary of setting CPU limits on ELK Stack services; they tend to fail their built-in health checks when CPU limits are applied.

  • Do not allocate less memory to ELK Stack services than their JVM heap sizes.

  • Exit code 137 indicates an out-of-memory failure, meaning the service has been allocated too little memory.


Terraform

A tool that enables infrastructure as code to set up servers in AWS EC2.

Cloud Dev environments

To set up a developer's environment in AWS, run this Terraform project. The scripts join an existing VPC, create a public subnet and create a variable number of EC2 instances to which the user has SSH access. The scripts also create alarms that auto-shutdown the instances after a configurable period, based on CPU metrics. A scheduled Lambda event can also be configured to run at a regular interval and shut down any instances that may still be running.

Pre-requisites

Install AWS CLI
Install Terraform

Creating a VPC

This should only be done once per AWS account, as there is a limit of 5 VPCs per region. Check whether this has already been run; if it has, use the existing VPC_ID and SUBNET_ID in the following section and skip to the next section.

Navigate to the infrastructure/terraform/vpc directory

Initialize the Terraform project:

terraform init

Execute the following:

terraform apply

Copy the output for the next step. For example, for ICAP this has already been run and this is the result:

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

SUBNET_ID = "subnet-0004b0dacb5862d59"
VPC_ID = "vpc-067ab69f374ac9f47"

Creating EC2 instances

Navigate to the infrastructure/terraform directory

Initialize the Terraform project:

terraform init

The following properties have to be set:

PUBLIC_KEY_PATH - path to the user's public key file that gets injected into the servers created
PROJECT_NAME    - unique project name that is used to identify each VPC and its resources
HOSTED_ZONE_ID  - (only if you are creating domains, which by default you are) the hosted zone to use, this must be created in the AWS console
DOMAIN_NAME     - the base domain name to use
SUBNET_ID       - the subnet id to use, copy this from the previous step
VPC_ID          - the VPC id to use, copy this from the previous step

The configuration can be done using a Terraform variable file. Create a file called my.tfvars. Below is an example that illustrates the structure of the variables file; it is a configuration that you can use for the ICAP CDR. Please replace {user} with your own user.

PUBLIC_KEY_PATH = "/home/{user}/.ssh/id_rsa.pub"
PROJECT_NAME = "jembi_platform_dev_{user}"
HOSTED_ZONE_ID = "Z00782582NSP6D0VHBCMI"
DOMAIN_NAME = "{user}.jembi.cloud"
SUBNET_ID = "subnet-0004b0dacb5862d59"
VPC_ID = "vpc-067ab69f374ac9f47"

The AWS account to be used is defined in the ~/.aws/credentials file. If you don't have this file, make sure you have configured the AWS CLI.

cat ~/.aws/credentials
[default]
aws_access_key_id = AKIA6FOPGN5TYHXXXXX
aws_secret_access_key = Qf7E+qcXXXXXXQh4XznN4MM8qR/VP/SXgXXXXX
[jembi-sandbox]
aws_access_key_id = AKIASOHFAV527JCXXXXX
aws_secret_access_key = YXFu3XxXXXXXTeNXdUtIg0gb9Ro7gJ89XXXXX
[jembi-icap]
aws_access_key_id = AKIAVFN7GJJFS6LXXXXX
aws_secret_access_key = b2I6jhwXXXXX4YehBCx/7rKl1JZjYdbtXXXXX

The sample file above has access to 3 accounts, so the options for <account_name> are "default", "jembi-sandbox" and "jembi-icap".

Optionally, add ACCOUNT = "<account_name>" to my.tfvars if you want to use something other than default.

The flag for specifying a variables file is -var-file. Create the AWS stack by running:

terraform apply -var-file my.tfvars

Once the script has run successfully, the IP addresses and domains for the servers will be displayed:

Apply complete! Resources: 13 added, 0 changed, 0 destroyed.

Outputs:

domains = {
  "domain_name" = "{user}.jembi.cloud"
  "node_domain_names" = [
    "node-0.{user}.jembi.cloud",
    "node-1.{user}.jembi.cloud",
    "node-2.{user}.jembi.cloud",
  ]
  "subdomain" = [
    "*.{user}.jembi.cloud",
  ]
}
public_ips = [
  "13.245.143.121",
  "13.246.39.101",
  "13.246.39.92",
]

SSH access should now be available. Use the default 'ubuntu' user: ssh ubuntu@<ip_address>

To destroy the AWS stack, run:

terraform destroy -var-file my.tfvars

Disaster Recovery Process

Backup & restore process.

Two major procedures should exist in order to recover lost data:

  • Creating backups continuously

  • Restoring the backups

This includes the different databases: MongoDB, PostgreSQL DB and Elasticsearch.

The current implementation creates continuous backups for MongoDB (to back up all OpenHIM transactions) and PostgreSQL (to back up the HAPI FHIR data) as follows:
  • Daily backups (for 7 days rotation)

  • Weekly backups (for 4 weeks rotation)

  • Monthly backups (for 3 months rotation)

More details can be found on each service's backup & restore page.


Ansible

A tool that enables infrastructure as code for provisioning the servers.

Platform Deploy

Prerequisites

  • Linux OS to run commands

  • Install Ansible (as per the installation guide: https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html)

  • Ansible Docker Community Collection installed: ansible-galaxy collection install community.docker

Infrastructure and Servers

Please see the /inventories/{ENVIRONMENT}/hosts file for IP details of the designated servers. Set these to the server that you created via Terraform or to an on-premises server.
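For orientation, an Ansible inventory hosts file generally has a shape like the sketch below; the group names and addresses here are placeholders, so mirror the structure already present in the repository:

[leader]
node-0 ansible_host=<ip_of_first_server>

[workers]
node-1 ansible_host=<ip_of_second_server>
node-2 ansible_host=<ip_of_third_server>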

Ansible

SSH Access

To authenticate yourself on the remote servers, your SSH key will need to be added to the sudoers var in /inventories/{ENVIRONMENT}/group_vars/all.yml.

To have Docker access, you need to add your SSH key to the docker_users var in the same /inventories/{ENVIRONMENT}/group_vars/all.yml file.

An authorised user will need to run the provision_servers.yml playbook to add the SSH key of the person who will run the Ansible scripts to the servers.
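As a rough sketch only (the real all.yml may key these entries differently, so copy the format of the existing entries), the two vars could look like:

sudoers:
  - "ssh-ed25519 AAAA...xyz your_username"   # your public SSH key
docker_users:
  - "ssh-ed25519 AAAA...xyz your_username"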

Configuration

Before running the Ansible scripts, add each server to your known_hosts file, otherwise Ansible will throw an error. For each server run:

ssh-keyscan -H <host> >> ~/.ssh/known_hosts

To run a playbook you can use:

ansible-playbook \
  --ask-vault-pass \
  --become \
  --inventory=inventories/<INVENTORY> \
  --user=ubuntu \
  playbooks/<PLAYBOOK>.yml

Alternatively, to run all provisioning playbooks with the development inventory (most common for setting up a dev server), use:

ansible-playbook \
  --ask-vault-pass \
  --become \
  --inventory=inventories/development \
  --user=ubuntu \
  playbooks/provision.yml

Vault

The vault password required for running the playbooks can be found in the database.kdbx KeePass file.

To encrypt a new secret with the Ansible vault, run:

echo -n '<YOUR SECRET>' | ansible-vault encrypt_string

When prompted for a new Vault password, enter the original Ansible Vault password.

Keepass

Copies of all the passwords used here are kept in the encrypted database.kdbx file.

Note: Please ask your admin for the decryption password of the database.kdbx file.

HAPI FHIR Data

FHIR messages Backup & Restore.

Validated messages from HAPI FHIR are stored in the PostgreSQL database.

The following sections detail the backup and restore process for this data.

Backups

This section assumes Postgres backups are made using pg_basebackup.
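For context, a pg_basebackup invocation that produces the base.tar and pg_wal.tar files referenced in the recovery steps below could look like the sketch here; the container ID, user and target path are assumptions, and the Platform's scheduled backup job may differ:

# Sketch only: run against the Postgres leader; authentication may be required
docker exec -t <postgres_leader_container_id> \
  pg_basebackup -U postgres -D /backups/postgresql_$(date +%Y%m%d) -F tar -X stream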

Guides

Various notes and guides


Postgres (Hapi-FHIR)

To start up HAPI FHIR and ensure that backups can be made, make sure you have created the HAPI FHIR bind-mount directory (e.g. /backup).
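For example, on the host (use whatever path your HAPI FHIR package is configured to bind-mount):

sudo mkdir -p /backup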

Disaster Recovery

NB! DO NOT UNTAR OR EDIT THE FILE PERMISSIONS OF THE POSTGRES BACKUP FILE

Postgres (HAPI FHIR)

Preliminary steps:

  1. Do a destroy of fhir-datastore-hapi-fhir using the CLI binary (./platform-linux for linux)

  2. Make sure the Postgres volumes on nodes other than the swarm leader have been removed as well! You will need to SSH into each server and manually remove them (see the sketch after this list).

  3. Do an init of fhir-datastore-hapi-fhir using the CLI binary
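For step 2, the cleanup on each non-leader node generally looks like the sketch below; the volume name is a placeholder, so confirm the real one with docker volume ls first:

# On each node that is not the swarm leader
docker volume ls                                    # find the HAPI FHIR Postgres volume name
docker volume rm <hapi_fhir_postgres_volume_name>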

After running the preliminary steps, run the following commands on the node hosting the Postgres leader:

NOTE: The value of the REPMGR_PRIMARY_HOST variable in your .env file indicates the Postgres leader

  1. Retrieve the Postgres leader's container-ID using: docker ps -a. Hereafter called postgres_leader_container_id

  2. Run the following command: docker exec -t <postgres_leader_container_id> pg_ctl stop -D /bitnami/postgresql/data

  3. Wait for the Postgres leader container to die and start up again. You can monitor this using: docker ps -a

  4. Run the following command: docker rm <postgres_leader_container_id>

  5. Retrieve the new Postgres leader's container-ID using docker ps -a, being wary not to use the old postgres_leader_container_id

  6. Retrieve the Postgres backup file's name as an absolute path (/backups/postgresql_xxx). Hereafter called backup_file

  7. Run the following commands in the order listed:

# Stop the server running in the container
docker exec -t <postgres_leader_container_id> pg_ctl stop -D /bitnami/postgresql/data

# Clear the contents of /bitnami/postgresql/data
docker exec -t --user root <postgres_leader_container_id> sh -c 'cd /bitnami/postgresql/data && rm -rf $(ls)'

# Copy over the base.tar file
sudo docker cp <backup_file>/base.tar <postgres_leader_container_id>:/bitnami/postgresql

# Extract the base.tar file
docker exec -t --user root <postgres_leader_container_id> sh -c 'tar -xf /bitnami/postgresql/base.tar --directory=/bitnami/postgresql/data'

# Copy over the pg_wal.tar file
sudo docker cp <backup_file>/pg_wal.tar <postgres_leader_container_id>:/bitnami/postgresql

# Extract pg_wal.tar
docker exec -t --user root <postgres_leader_container_id> sh -c 'tar -xf /bitnami/postgresql/pg_wal.tar --directory=/bitnami/postgresql/data/pg_wal'

# Copy conf dir over
docker exec -t --user root <postgres_leader_container_id> sh -c 'cp -r /bitnami/postgresql/conf/. /bitnami/postgresql/data'

# Set pg_wal.tar permissions
docker exec -t --user root <postgres_leader_container_id> sh -c 'cd /bitnami/postgresql/data/pg_wal && chown -v 1001 $(ls)'

# Start the server
docker exec -t <postgres_leader_container_id> pg_ctl start -D /bitnami/postgresql/data

  8. Do a down of fhir-datastore-hapi-fhir using the CLI binary. Example: ./instant-linux package down -n=fhir-datastore-hapi-fhir --env-file=.env.*

  9. Wait for the down operation to complete

  10. Do an init of fhir-datastore-hapi-fhir using the CLI binary. Example: ./instant-linux package init -n=fhir-datastore-hapi-fhir --env-file=.env.*

Postgres should now be recovered.

Note: After performing the data recovery, it is possible to get an error from HAPI FHIR (500 internal server error) while the data is still being replicated across the cluster. Wait a minute and try again.

Elasticsearch

Elasticsearch Backup & Restore.

Elasticsearch Backups

For detailed steps about creating backups, see the Elasticsearch snapshot documentation linked below.

Elasticsearch offers several ways of saving a backup; for further understanding, see the snapshot repository documentation linked below.

Elasticsearch Restore

To see how to restore snapshots in Elasticsearch, refer to the Snapshot Restore docs linked below.

Snapshot filesystem repository docs
Register a snapshot repository docs
Snapshot Restore docs

Config Importing

This section defines the configuration importing methods used in the Platform

Overview

Certain packages in the Platform require configuration to enable their intended functionality in a stack. For instance, the OpenHIM package requires the setting of users, channels, roles, and so on. Other packages, such as JS Report or Kibana, require importing of pre-configured dashboards stored in compressed files.

Most services in the Platform can be configured by sending a request containing the required configuration files to the relevant service API. To achieve this, the Platform leverages a helper container to make that API call.

Note: If a package uses a config importer, its configuration can be found in the relevant package's importer section.

The Helper Container

The Process

As part of the package-launching process, the to-be-configured service is deployed and awaits configuration. Before the configuration can take place, the deployment waits for the service to join the internal Docker network. Once the service has joined the network, the helper container is launched and makes the API request that configures the service.
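Conceptually, the helper container does something like the following; this is an illustrative sketch only, and the real importer images, endpoints, paths and authentication differ per package:

# Wait until the target service is reachable on the Docker network ($SERVICE_URL and /health are hypothetical)
until curl -sf -o /dev/null "$SERVICE_URL/health"; do
  sleep 2
done
# Push the pre-built configuration to the service's API (/api/config and the file path are hypothetical)
curl -sf -X POST -H "Content-Type: application/json" \
  --data @/config/service-config.json "$SERVICE_URL/api/config"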

Images

jembi/api-config-importer

For reference on how to use the jembi/api-config-importer image, see the repo (linked below).

jembi/instantohie-config-importer

For reference on how to use the jembi/instantohie-config-importer image, see the repo (linked below).

here
here

Provisioning remote servers

Infrastructure tools for the OpenHIM Platform

Deploying from your local environment to a remote server or cluster is easy. All you have to do is ensure the remote servers are set up as a Docker Swarm cluster. Then, from your local environment, you may target a remote environment by using the `DOCKER_HOST` env var, e.g.

DOCKER_HOST=ssh://ubuntu@<ip> instant package init ...

Setting up new servers

In addition, the OpenHIM Platform GitHub repository provides scripts to easily set up new servers. The Terraform scripts can instantiate servers in AWS, and the Ansible scripts can configure those servers to be ready to accept OpenHIM Platform packages.

Ansible

See the Ansible page (linked below).

It is used for:

  • Adding users to the remote servers

  • Provisioning the remote servers in single-node and cluster mode: user and firewall configuration, Docker installation, Docker authentication and Docker Swarm provisioning.

All the passwords are saved securely using Keepass.

The inventories contain a configuration per environment (development, production and staging), each defining the users and their SSH keys, the Docker credentials and the hosts.

Terraform

Terraform is used to create and set up AWS servers. See the Terraform page (linked below).

here
here

Performance Testing

The performance scripts are located in the test folder. Follow the steps below to run them against a local or remote server.

Steps

  1. Make sure you have the necessary dependencies installed, most importantly the k6 binary. Refer to the documentation on Building a k6 binary

  2. Set the BASE_URL variable to the URL of your server. By default, it is set to "http://localhost:5001", but you can change it to the appropriate URL.

  3. If there are any additional dependencies or configurations required by the generateBundle function or any other imported modules, make sure those are set up correctly.

  4. Open your terminal or command prompt and navigate to the directory where the scripts are located (the test folder).

  5. Run the script using the k6 run command followed by the filename. In this case, you would run k6 run load.js (a sketch of such a script is shown after these steps).

  6. The script will start executing and sending HTTP POST requests to the specified server. The requests will be sent at a constant arrival rate defined in the options object.

  7. The script includes thresholds defined in the options object. These thresholds define the performance criteria for the script. If any of the thresholds are exceeded, the script reports a failure.

  8. Monitor the output in the terminal to see the results of the script execution. It displays information such as the number of virtual users (VUs), request statistics, and any failures that occurred.

  9. To visualize the output in Grafana, run the k6 script with the following environment variable and flag set: K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write ./k6 run -o experimental-prometheus-rw script.js
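For orientation, a script of this general shape ties the steps above together. It is a sketch only, not the actual load.js: the endpoint path, payload, arrival rate and threshold values are placeholders.

// sketch-load.js (illustrative only)
import http from 'k6/http';
import { check } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:5001';

export const options = {
  scenarios: {
    load: {
      executor: 'constant-arrival-rate', // send requests at a fixed rate
      rate: 5,
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 10,
      maxVUs: 20,
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.01'],    // fail the run if more than 1% of requests error
    http_req_duration: ['p(95)<1000'], // fail the run if p95 latency exceeds 1s
  },
};

export default function () {
  // generateBundle() would normally build the FHIR payload; a static body stands in here
  const payload = JSON.stringify({ resourceType: 'Bundle', type: 'transaction', entry: [] });
  const res = http.post(`${BASE_URL}/fhir`, payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status code is 200': (r) => r.status === 200 });
}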

Sample load test result

The test results below were obtained on Ubuntu 22.04 with 64 GB RAM and 12 cores. All checks passed ("status code is 200").

Metric                      Value
checks                      100.00% ✓ 188 ✗ 0
data_received               2.3 MB 39 kB/s
data_sent                   3.9 MB 65 kB/s
dropped_iterations          1613 26.656141/s
http_req_blocked            avg=8.32µs min=3.54µs med=5.21µs max=259.88µs p(90)=6.87µs p(95)=8.18µs
http_req_connecting         avg=1.61µs min=0s med=0s max=153.25µs p(90)=0s p(95)=0s
http_req_duration           avg=619.01ms min=421.78ms med=621.54ms max=812.9ms p(90)=692.07ms p(95)=711.18ms
http_req_failed             0.00% ✓ 0 ✗ 188
http_req_receiving          avg=115.87µs min=60.86µs med=110.01µs max=508.35µs p(90)=152.09µs p(95)=158.61µs
http_req_sending            avg=125.31µs min=63.72µs med=114.43µs max=825.81µs p(90)=150.33µs p(95)=191.61µs
http_req_tls_handshaking    avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting            avg=618.77ms min=421.58ms med=621.32ms max=812.7ms p(90)=691.81ms p(95)=710.93ms
http_reqs                   188 3.106853/s
iteration_duration          avg=625.32ms min=427.15ms med=628.41ms max=818.77ms p(90)=698.25ms p(95)=717.76ms
iterations                  188 3.106853/s
vus                         2 min=1 max=2
vus_max                     2 min=2 max=2

Sample volume test results

Metric                      Value
checks                      100.00% ✓ 954 ✗ 0
data_received               12 MB 40 kB/s
data_sent                   20 MB 66 kB/s
dropped_iterations          23345 77.340364/s
http_req_blocked            avg=7.44µs min=2.89µs med=5.34µs max=235.67µs p(90)=7.39µs p(95)=8.49µs
http_req_connecting         avg=1.14µs min=0s med=0s max=180.71µs p(90)=0s p(95)=0s
http_req_duration           avg=2.49s min=478.77ms med=2.5s max=3.22s p(90)=2.7s p(95)=2.79s
http_req_failed             0.00% ✓ 0 ✗ 954
http_req_receiving          avg=105.4µs min=51.79µs med=103.68µs max=473.23µs p(90)=129.93µs p(95)=140.63µs
http_req_sending            avg=130.4µs min=60.02µs med=110.82µs max=2.72ms p(90)=152.04µs p(95)=225.79µs
http_req_tls_handshaking    avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting            avg=2.49s min=478.52ms med=2.5s max=3.22s p(90)=2.7s p(95)=2.79s
http_reqs                   954 3.160536/s
iteration_duration          avg=2.5s min=483.16ms med=2.5s max=3.23s p(90)=2.7s p(95)=2.79s
iterations                  954 3.160536/s
vus                         4 min=4 max=8
vus_max                     8 min=7 max=8

Development

Adding Packages

  • The Go CLI runs all services from the jembi/platform Docker image. When adding new packages or updating existing packages in the Platform, you will need to build/update your local jembi/platform image. See "How to build the image".

  • As you add new packages to the Platform, remember to list them in the config.yml file, otherwise the added package will not be detected by the platform-cli tool (a sketch of the file's shape follows below).
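For illustration, the entry might look like the sketch below; the package IDs shown are examples, so match the naming already used in your config.yml:

# config.yml (illustrative excerpt)
packages:
  - interoperability-layer-openhim
  - fhir-datastore-hapi-fhir
  - my-new-package   # the package being added must be listed here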
