JeMPI can be installed in two ways:
Standalone Installation: Useful for development or custom implementations.
Jembi Platform Installation: The recommended way to install JeMPI. It uses the Jembi Platform deployment tooling, which allows JeMPI to be installed with a single command. It also includes other useful software, such as the OpenHIM, to authenticate and route API requests.
Pronounced 'Jem-P-I'
JeMPI is an entity matching (commonly used for patient matching) and linking technology that supports batch and transactional matching through a combination of deterministic and probabilistic matching techniques.
The Jembi MPI, also known as JeMPI, is a standards-based client registry (CR) or master patient index (MPI).
JeMPI facilitates the exchange of patient information between different systems and holds patient identifiers that may include patient demographic information. This is a necessary tool for public health to help manage patients, monitor outcomes, and conduct case-based surveillance.
JeMPI’s primary goal is to solve the problem of multiple or duplicated patient records that are submitted from multiple point-of-service systems, such as electronic medical records, lab systems, radiology systems and other health information systems.
This is achieved by matching the various patient records from different systems under a Master Patient record with a unique ID. This allows for downstream applications, such as surveillance, to accurately display data and information on patient records without the worry that the data contains multiple records for the same patient.
How it works
The JeMPI Client Registry uses a microservice architecture in which each microservice has a specific task, such as data cleaning or data storage. These services communicate through a Kafka message bus, meaning that every service stores and retrieves data from specific Kafka topics.
Below is the synchronous and asynchronous flow diagram.
Description: A microservice that sends the content of an uploaded CSV file to the JeMPI_ETL service. The JeMPI_AsyncReceiver service produces Kafka messages, where each message contains one row from the uploaded CSV file. Each message is then saved under a Kafka topic.
The base version uses a reference implementation with the fields below:
String auxId, String givenName, String familyName, String gender, String dob, String city, String phoneNumber, String nationalId
Input
A CSV file located in the volume associated with JeMPI_AsyncReceiver, under the /app/csv directory (the file can be uploaded through an HTTP request). Example of input file:
Output
The service saves the data from the CSV file, one line at a time. Kafka topic: TOPIC_INTERACTION_ETL="JeMPI-interaction-etl"
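The row-per-message behaviour described above can be sketched as follows. This is a minimal illustration, not the JeMPI_AsyncReceiver code: the function name csv_rows_to_messages is hypothetical, the header row and the JSON encoding of each message are assumptions, and the Kafka producer call is left out so that the sketch runs without a broker.

```python
import csv
import io
import json

# Field order of the reference implementation, as listed above.
REFERENCE_FIELDS = [
    "auxId", "givenName", "familyName", "gender",
    "dob", "city", "phoneNumber", "nationalId",
]


def csv_rows_to_messages(csv_text: str) -> list[str]:
    """Turn each CSV data row into one message (one row per message)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first line is a header row
    messages = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(REFERENCE_FIELDS, (value.strip() for value in row)))
        # Assumption: messages are JSON-encoded; a real producer would send
        # each one to the TOPIC_INTERACTION_ETL topic here.
        messages.append(json.dumps(record))
    return messages
```

Each returned string corresponds to one message that a producer would publish to the ETL topic.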
Description: A microservice that processes the input coming from the JeMPI_AsyncReceiver. The JeMPI_ETL service performs some data transformation, e.g. lower-casing the patient's name values or removing the formatting from the date of birth. The resulting data is sent as JSON (JSON Streaming) to the JeMPI_Controller service.
Input:
Data coming from the JeMPI_AsyncReceiver service. Kafka topic: TOPIC_INTERACTION_ETL="JeMPI-interaction-etl"
Output:
Data transformed into JSON that will be sent to the JeMPI_Controller. It will be stored in the Kafka topic: TOPIC_INTERACTION_CONTROLLER="JeMPI-interaction-controller"
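As a rough illustration of the kind of transformation described for the ETL step, the sketch below lower-cases the name fields and strips the formatting from the date of birth. The function name transform_row is hypothetical, and reading "unformat the date" as "keep digits only" is an assumption; the real service may apply different rules.

```python
import json


def transform_row(record: dict) -> str:
    """Normalise one record and return it as a JSON string."""
    cleaned = dict(record)
    for name_field in ("givenName", "familyName"):
        if cleaned.get(name_field) is not None:
            cleaned[name_field] = cleaned[name_field].strip().lower()
    if cleaned.get("dob") is not None:
        # Assumption: removing the date formatting means keeping digits only.
        cleaned["dob"] = "".join(ch for ch in cleaned["dob"] if ch.isdigit())
    return json.dumps(cleaned)
```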
Example of a Kafka message coming from the interaction controller topic:
Description: The JeMPI_Controller service has multiple tasks:
Send the data coming from the JeMPI_ETL to either the JeMPI_Linker or the JeMPI_EM service, based on the workflow selected by the user on import. The data will be stored in the respective Kafka topics consumed by those services.
Input: Data coming from the JeMPI_ETL service. Kafka topic: TOPIC_INTERACTION_CONTROLLER="JeMPI-interaction-controller".
Values of M & U computed in the JeMPI_EM service. Kafka topic: TOPIC_MU_CONTROLLER="JeMPI-mu-controller"
Output:
Send the data to the JeMPI_EM
Kafka topic: TOPIC_INTERACTION_EM="JeMPI-interaction-em"
MU process: Kafka topic: TOPIC_MU_LINKER="JeMPI-mu-linker"
Send the data to the JeMPI_Linker
Kafka topic: TOPIC_INTERACTION_LINKER="JeMPI-interaction-linker"
Description: A microservice that creates an object containing the m and u values computed from the patient records that go into the EM algorithm (the quality (m) and the uniqueness (u) per field). This object is used in the Linker for matching patients. The service uses a machine learning algorithm called Expectation Maximisation (EM) to optimise these values; it is launched after receiving the number of records specified in the configuration.
Input: Kafka topic: TOPIC_INTERACTION_EM="JeMPI-interaction-em"
Output: Kafka topic: TOPIC_MU_CONTROLLER="JeMPI-mu-controller"
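To make the role of the m and u values more concrete, here is a small sketch of how per-field m and u values are turned into match weights in the classic Fellegi-Sunter formulation of probabilistic linkage. Whether JeMPI computes its scores exactly this way is not stated here, so treat the formulas and the function names (field_weight, match_score) as illustrative assumptions.

```python
import math


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 weight contributed by one field (Fellegi-Sunter style)."""
    if not (0.0 < m < 1.0 and 0.0 < u < 1.0):
        raise ValueError("m and u must lie strictly between 0 and 1")
    if agrees:
        return math.log2(m / u)  # agreement adds evidence for a match
    return math.log2((1.0 - m) / (1.0 - u))  # disagreement subtracts evidence


def match_score(mu: dict[str, tuple[float, float]], agreement: dict[str, bool]) -> float:
    """Sum the field weights for one record pair."""
    return sum(field_weight(m, u, agreement[name]) for name, (m, u) in mu.items())
```

A field with high m (good data quality) and low u (high uniqueness) contributes a large positive weight when two records agree on it.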
Description: A microservice that interacts with the Dgraph database to match patients. The Linker uses thresholds to drive the linking and the notifications for the review process. These thresholds are the following:
A single match or no-match threshold: the interaction is automatically linked to the highest-scoring golden record candidate above the threshold. If no candidate has a score above the threshold, a new golden record is created. This is typically used for fully autonomous linking.
Window around the match/no-match threshold: if the highest score generated for the candidates falls within this window, a notification is sent for the Admin to review the interaction.
Margin threshold: if another candidate falls within a margin of the highest score and this highest score is above the match threshold, a notification is sent for the Admin to review the linked interaction.
Input: Kafka topic: TOPIC_INTERACTION_LINKER="JeMPI-interaction-linker"
Output:
Interact with the Dgraph database using GraphQL queries/mutations, save the interactions and the links.
Send response of either the link info or the list of candidates to the Controller
Save response to Kafka topic: TOPIC_notifications="JeMPI_notifications"
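The three threshold rules described for the Linker can be sketched as a single decision function. This is a simplified reading of the rules, not the Linker's implementation: the names (LinkDecision, decide), the symmetric window, the inclusive comparisons and the notification labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkDecision:
    action: str                      # "link" or "create_golden_record"
    golden_id: Optional[str]
    notifications: list[str] = field(default_factory=list)


def decide(candidates: dict[str, float], match_threshold: float,
           window: float, margin: float) -> LinkDecision:
    """Apply the match threshold, the review window and the margin rule."""
    if not candidates:
        return LinkDecision("create_golden_record", None)
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    notifications = []
    # Window rule (assumed symmetric around the match threshold).
    if abs(best_score - match_threshold) <= window:
        notifications.append("review_threshold_window")
    # Match rule (assumption: a score equal to the threshold counts as a match).
    if best_score >= match_threshold:
        # Margin rule: another candidate scores close to the best one.
        if any(best_score - score <= margin for _, score in ranked[1:]):
            notifications.append("review_margin")
        return LinkDecision("link", best_id, notifications)
    return LinkDecision("create_golden_record", None, notifications)
```

For example, a best candidate far above the threshold with no close runner-up is linked without any notification, while two candidates with nearly equal scores trigger a margin review.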
Description: Dgraph is the graph database that JeMPI uses to store the patient records.
Linked components:
Dgraph Ratel: A tool for data visualization and cluster management. Ratel can be used with Dgraph to manage cluster settings, run DQL queries and mutations, and see the results of these operations.
Dgraph Alpha: Hosts the indexes and exposes the endpoints.
Dgraph Zero: Plays a role similar to ZooKeeper/KRaft in Kafka. It controls the Alpha instances by assigning them to a group, and rebalances the data between them.
Description: Kafka is the message bus. It contains all the topics used by the components described above.
Description: The JeMPI_API service contains the endpoints needed to interact with JeMPI.
It performs the following functions:
Read data from the Kafka topic TOPIC_notifications="JeMPI_notifications"
Save data related to the administration in the PostgreSQL database
Get the data from PostgreSQL when the JeMPI Web requests data.
In this section, we introduce the software prerequisites for running the JeMPI client registry on your machine using the platform.
Please refer to the Docker installation documentation to install Docker on your machine. It is best to follow the post-installation steps so that you can run Docker without sudo.
Installing WSL2 is required to be able to develop and test the project. It is recommended to limit the memory usage of WSL2.
Create /tmp/logs directory
Create the docker platform image
Initialise Docker Swarm
Run 'go cli' binary to launch the project
Launch the client registry JeMPI package profile
a. All packages and profiles are configured in the ./config.yaml file
b. Updates to environment variables can be made in the profile env file, i.e. mpi.env
Access : http://localhost:3033/login
Sign in with Keycloak user credentials
Useful for development or custom implementations
In this section, we introduce the software prerequisites for running the JeMPI client registry on your machine.
Please refer to the Docker installation documentation to install Docker on your machine. It is best to follow the post-installation steps so that you can run Docker without sudo.
Installing WSL2 is required to be able to develop and test the project. It is recommended to limit the memory usage of WSL2.
Follow the steps below to install SDKMAN. First check whether you already have it by running sdk. To install it, run the following two commands:
To check the installation, you can check the version by running: sdk version
We should install the following:
Maven: Command: sdk install maven Version: 3.9.8
Scala Build Tool: Command: sdk install sbt
Java: Command: sdk install java 21.0.3-tem Version: Temurin 21.0.3-tem (see the list by running: sdk list java)
Note: when installing with a non-root user set java directory
Check the version of java by running: java --version. We should get: Temurin-21.0.3+9.
In the following section, we will discuss the steps for running JeMPI on your machine. Start by cloning the JeMPI repository and navigating to the JeMPI directory by running the following commands in your terminal of choice:
This Bash script is designed for deploying JeMPI locally with various options. It performs tasks such as installing Docker, SDKMAN, Java, Maven, and SBT, setting up the environment configuration, creating a Docker registry, pulling and pushing Docker images, initializing the Docker Swarm, building the entire stack, rebooting, restarting, tearing down, backing up and restoring the databases, and destroying JeMPI.
Location of file - JeMPI/devops/linux/docker/deployment File Name - local-deployment.sh
Set the following variables:
JAVA_VERSION=21.0.3-tem
JEMPI_ENV_CONFIGURATION=create-env-linux-low-1.sh
This script must be run from the following path and will not work if executed from a different location
Location of file - JeMPI/devops/linux/docker/deployment
Option 1: Deploy JeMPI (For Fresh Start)
This option is used to install JeMPI from scratch (a fresh setup).
Set up hostname and IP address in the Hosts file.
Docker Swarm Initialization.
Creates a Docker registry, pulls Docker images from the hub, and pushes them to the local registry.
Builds and reboots the entire JeMPI stack
Option 2: Build and Reboot
Builds and reboots the entire JeMPI stack.
Option 3: Restart JeMPI
Reboots the entire JeMPI stack
Option 4: Down JeMPI
Stop entire stack
Option 5: Backup Postgres & Dgraph
Postgres backup process creates a folder with a timestamp, and inside it, SQL files are generated for each postgres database.
Backup Directory: JeMPI/devops/linux/docker/docker_data/data/backups/postgres
The Dgraph backup process creates a folder with a timestamp and, inside it, generates the JSON file of the data.
Backup Directory: JeMPI/devops/linux/docker/docker_data/data/backups/dgraph
Option 6: Restore Postgres & Dgraph
Users need to confirm with "Ctrl+Y" for restore.
This process will wipe all existing data from both the Postgres and Dgraph databases and restore the data from the backup.
Users need to enter the folder name of the backup directory to initiate the restore process.
Option 7: Re-Deploy JeMPI
Updates environment configuration settings
Update HAProxy settings
Pulls Docker images from the hub, and pushes them to the local registry.
Builds and reboots the entire JeMPI stack
Option 8: Install Prerequisites
Install SDKMAN - SDK Manager
Install Docker
Install Java, Maven, and SBT using SDKMAN
Option 9: Destroy JeMPI (This process will wipe all data)
This process will remove all stacks from the swarm and leave the swarm.
Remove all data and volumes
The script prompts for user input to select an option.
Confirmations are requested for critical actions.
Use Ctrl+Y for "Yes" confirmation to Destroy all systems and Restore DB.
Customize the script as needed for your specific deployment requirements.
Set up an IP address
Before starting the process of running JeMPI, you will need to set up an IP address for your machine.
On your terminal of choice, run the ip a command and retrieve the IP address from your Wi-Fi or Ethernet interface.
In our case, we are using the enp0s3 interface; the IP address that we will need is 192.168.1.137.
Next, you will need to set up the hostname for your machine. To do so, run the command below. It will open the hosts file under the /etc/ directory using the nano text editor (you can use any other editor, e.g. Vim, vi, Emacs, Helix, etc.):
Keep the localhost IP and comment out any other IP address, as shown in the screenshot below.
Initialize the environment variables
In the JeMPI directory, navigate to the docker/conf/env/ directory.
If you have less than 32 GB of RAM, run ./create-env-linux-low-1.sh. If you have 32 GB or more, run ./create-env-linux-high-1.sh. Both scripts will create the conf.env file that we will need.
Note: for server installations, manually set the SERVER_IP environment variable before executing the script.
Pull the latest images
Pull the latest image versions from Docker Hub using the a-images-1-pull-from-hub.sh script.
Make sure you have a clean docker swarm
It is fine to keep the images, but you should remove all the services, containers, volumes, configs and secrets.
Run the b-swarm-2-leave.sh script to leave your current swarm.
After running the previous script, initialize a new swarm by running the b-swarm-1-init-node1.sh script located in the JeMPI/docker/ directory.
Add the ability to use local registries
Now, we need to tell docker that it is okay to run on the local registry because it is http and not https.
Go to devops/linux/docker/helper/scripts.
Run ./x-swarm-a-set-insecure-registries.sh (you need to grant it executable access first by running: chmod +x ./x-swarm-o-set-insecure-registries.sh). It will edit the file /etc/docker/daemon.json and restart Docker for the changes to take effect.
NB: The script will change the access permissions of the /etc/docker/daemon.json file.
Create a local registry
Now that you can use local Docker registries, run the c-registry-1-create.sh script to create a registry service. This service will host the Docker images that we will use in our stack.
Push the images to the local registry
We will need to pull the images from Docker Hub, then push them to the local registry:
Run the stack
After pushing the images into the local registry, we are ready to run the app. We have several options: we can run the whole stack (UI + backend) by running d-stack-1-build-all-reboot.sh, or run the backend (d-stack-1-build-java-reboot.sh) and the UI (d-stack-1-build-ui-reboot) separately.
Other scripts
d-stack-2-build-java.sh: This script will build and push the backend services to the local docker registry.
d-stack-3-down.sh: This script will remove all services from the stack.
d-stack-3-reboot.sh: This script will only remove everything and start again.
That's it 🚀
Check the deployment sanity
To check for running containers, you can run: docker container ls
To check for running services, you can run: docker service ls
To list all the containers, you can run: docker ps -a
Or you can go to devops/linux/docker/helper/scripts and run d-stack-ps.sh and it will run: docker stack ps <NAME_STACK>
Example of the stack when running with local docker registry: (docker stack ps jempi)
Example of the stack when running with docker hub: (docker service ls)
Stop or remove the stack
To remove everything in the swarm: you can go to devops/linux/docker/ and run b-swarm-2-leave.sh
To shut down the stack: you can go to devops/linux/docker/ and run: d-stack-3-down.sh
We encourage any contributions and suggestions!
We use Discord as our preferred communications and support platform for JeMPI. We subscribe to and are bound by the OpenHIE Community Guidelines (https://discourse.ohie.org/faq). If you have a question about JeMPI or would like to get involved, please join the conversation on Discord in the #jempi channel.
For developers looking to log bugs and/or feature requests, you can also add an issue on GitHub, or submit pull requests for changes that you'd like to see.
Next Release:
UI/UX Enhancements
Performance improvements e.g. vector search
Role-based Access Control
Future Goals:
AI-driven matching algorithm selection
Aim to increase performance by an order of magnitude by exploring new technology options.
Enhance UX to make JeMPI the easiest MPI to install, use and maintain.
DevOps UI to further simplify and accelerate deployment and maintenance tasks
API endpoints documentation
The following endpoint returns the fields configuration needed by the frontend (JeMPI-UI) to properly display interaction data according to a specific implementation. This endpoint returns a JSON array. Below is a sample of the response:
For each field we have a set of attributes, as defined below:
The fields should be configured manually in the config JSON file: devops/linux/docker/data-config/config-reference-link-dp-api.json
There are two types of fields:
Custom fields: Indexed by the key "fields", these are all the fields that are specific to the implementation. Examples: givenName, nationalId, ...
System fields: Indexed by the key "systemFields", these are all the read-only fields that do not change across implementations. Examples: uid, record type, score, ...
IMPORTANT: The fieldName in config-reference-link-dp-api.json should be set in snake case, but it is returned in camel case by the API.
Below is a sample of the body you are to send:
The following endpoint returns notifications. Notifications inform the user about potential links between interactions and golden records, and are generated when a certain case is triggered. The response contains parameters (count, skippedRecords) that are useful for pagination.
Below are the necessary parameters to get the notifications list:
Below is a sample of the response:
The following endpoint returns an interaction, given that a uid is supplied. This endpoint returns an object.
Below is a sample of the body:
Below is a sample of the response:
Given a supplied uid, the following endpoint returns an expanded golden record, meaning a golden record with the interactions linked to it. This endpoint returns an object.
Below is a sample of the body you are to send:
Below is a sample of the response:
The following endpoint returns a list of expanded golden records, given a list of golden IDs (GIDS_LIST).
Below is a sample of the body you are to send:
Below is a sample of the response:
The following endpoint returns a list of the saved (created) golden record IDs.
The following endpoint returns a paginated list of GIDs, given the parameters OFFSET and LENGTH.
Below is a sample of the body you are to send:
Below is a sample of the response:
The following endpoint returns the audit trail for a given golden record with golden ID GOLDEN_ID.
Below is a sample of the body to send:
Below is a sample of the response:
The following endpoint returns the audit trail for a given interaction with interaction ID INTERACTION_ID.
Below is a sample of the request body:
Below is a sample of the response:
The following endpoint returns the number of golden records available in the database.
The following endpoint returns the number of interactions available in the database.
The following endpoint returns the number of records available in the database. Below is a sample of the response:
The following endpoint updates the notification state, given a notification ID and a state. Below is a sample of the request:
The following endpoint is used for the simple search of either golden records or interactions.
Below is a sample of the request body:
When the request is sent to the URL /search/golden, the response payload contains the list of golden records along with the linked records and the result set total (useful for pagination):
When the request is sent to the URL /search/patient, the response payload contains the list of interactions along with the search result total count:
The following endpoint is used for the custom search of either golden records or interactions.
Below is a sample of the request body:
The response payload is similar to the one returned by the simple search API endpoint.
The following endpoint is used to upload a file into JeMPI. The uploaded file is put into the async_reciever's storage under the /csv directory.
The following endpoint is used to calculate the score between an interaction and a set of golden records.
Below is a sample of the request body:
The following endpoint returns a paginated list of golden IDs, given a request body like the one illustrated in the example below.
Note: this endpoint is similar to search/(golden|patient).
This endpoint is used for the patient registration process: it checks whether the patient already exists, preventing duplicate patient registrations. This relies on the threshold provided in the request.
Below is a sample of the request body:
This endpoint is used to find the golden records of registered patients.
Below is a sample of the request body:
The following endpoint returns the list of candidate golden records, given the demographic data of a record and a threshold. Below is a sample of the request body:
Below is a sample of the response body for this endpoint:
This endpoint is used to create a new golden record for an interaction that the MPI identified as a possible match to another golden record.
Below is a sample of the request body:
Below is a sample of the response:
This endpoint is used to link an interaction that the MPI identified as a possible link to a golden record.
Below is a sample of the request body:
Below is a sample of the response:
The following endpoint updates the fields of a golden record. This endpoint returns an object. Below is a sample of the request:
When at least one of the field updates is successful, the endpoint returns an array of the fields that have been successfully updated. If none of the fields has been updated, the endpoint returns a "400 Bad Request" response. The endpoint returns a "500 Internal Server Error" if the update failed for a different reason.
Enable SSO using Keycloak
We use Keycloak for identity management. This provides us with an OpenID Connect (an extension to OAuth 2.0) compliant identity service that we can use to authenticate users, much like what Google and GitHub provide for logging in to other apps. Keycloak will provide:
The login user experience, including signing in page
2FA
Password reset features, account management
The ability to manage user permissions centrally, across applications
Our applications will just consume the resulting ID token that is produced to authenticate users and to check the roles that they are assigned.
We currently support the Auth Code Flow :
The user accesses the JeMPI UI and clicks on "Sign-In with Keycloak".
The user is redirected to Keycloak, where they submit their credentials.
The user is redirected back to the JeMPI UI along with the auth code parameters.
The auth code parameters are sent to the "POST /authenticate" JeMPI API endpoint.
The JeMPI API sends the auth code to Keycloak along with the Client ID and Client Secret.
JeMPI receives the token and verifies it, then parses the user information (email, username, ...).
The user is added to the Postgres database if it is the first time they sign in.
The JeMPI API creates a session and sends back the user object along with the session cookie.
The user is redirected to the homepage.
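The step in which the API sends the auth code to Keycloak can be sketched as building a standard OpenID Connect token request. This is an illustration only, not the JeMPI API code: the function name build_token_request and all argument values in the usage are hypothetical, and the token path shown is the one used by recent Keycloak releases (older distributions prefix it with /auth).

```python
from urllib.parse import urlencode


def build_token_request(base_url: str, realm: str, client_id: str,
                        client_secret: str, code: str,
                        redirect_uri: str) -> tuple[str, bytes]:
    """Return the token endpoint URL and the form-encoded request body."""
    token_url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    form = urlencode({
        "grant_type": "authorization_code",  # Auth Code Flow
        "code": code,                        # code returned to the UI
        "redirect_uri": redirect_uri,        # must match the original redirect
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return token_url, form.encode("utf-8")
```

The body would then be sent as an HTTP POST with the Content-Type application/x-www-form-urlencoded, and the JSON response would contain the tokens to verify.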
Clone the JeMPI git repository
Update local config to use Keycloak
Execute the local-deployment script
Select Option 1: Deploy JeMPI (For Fresh Start)
Access : http://localhost:3000/login
Sign in with Keycloak user credentials
Enable backup and restore for the Postgres and Dgraph datastores
This section provides detailed instructions on how to perform backup and restore operations using the JeMPI_BackupRestoreAPI.
This is a dedicated application for handling both backup and restore operations. The scripts included in this application cover:
Backup Dgraph using API and dump JSON file.
Backup Postgres using SQL-Dump
Restore Dgraph using API using JSON file.
Restore Postgres using SQL-Dump
Dump sql files for Postgresql.
Get all GoldenIds
For each GoldenId:
a. Retrieve the golden record
b. Get the list of Golden Record Source Ids
c. Get the list of interactions
d. Write the data to file (JSON)
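The Dgraph backup steps listed above can be sketched as a simple loop. This is not the logic of dgraph-backup-api.py: the function name backup_golden_records and the two fetcher callables are hypothetical stand-ins for the API calls, so the sketch can run without a JeMPI instance.

```python
import json
from typing import Callable, Iterable


def backup_golden_records(
    get_golden_ids: Callable[[], Iterable[str]],
    get_expanded_golden_record: Callable[[str], dict],
    out_path: str,
) -> int:
    """Fetch every golden record with its linked data and dump it as JSON."""
    records = []
    for golden_id in get_golden_ids():
        # Assumption: one call returns the golden record together with its
        # source IDs and interactions.
        records.append(get_expanded_golden_record(golden_id))
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return len(records)
```

The return value is the number of golden records written, which is convenient for a quick sanity check after the backup.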
PostgreSQL Version: Ensure that PostgreSQL 15.5.0 is installed.
The backup and restore operations are validated on this version. Verify the installation by running psql --version
Python Installation
Make sure Python and the python-dotenv package are installed to manage environment variables.
Verify the installation by running python3 -m dotenv --version
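Using python-dotenv: load these variables with python-dotenv:
from dotenv import load_dotenv
import os
# Load the variables from the .env file into the process environment
load_dotenv("/path/to/your/.env")
# Replace ENVIRONMENT_VARIABLE_NAME with the name of a variable defined in the file
print(os.getenv("ENVIRONMENT_VARIABLE_NAME"))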
python3 test_dotenv.py (This should load the environment variables from your .env.local file)
Backup Directory Dgraph: JeMPI/devops/linux/docker/docker_data/data/backups/dgraph
Backup Directory Postgres: JeMPI/devops/linux/docker/docker_data/data/backups/postgres
Deployment File: local-deployment.sh
Backup Script Path: JeMPI/devops/linux/docker/backup_restore/dgraph-backup-api.sh
Backup Script Logic Dgraph: JeMPI/devops/linux/docker/backup_restore/dgraph-backup-api.py
Backup Script Logic Postgres: JeMPI/devops/linux/docker/backup_restore/postgres-backup.sh
The backup process creates a folder with a timestamp. Inside this folder, backups are created for both Dgraph and Postgres.
Manual Backup Process
pg_dump -U <username> -d <database_name> > /path/to/backup_file.sql
Verify if process was successful
echo $? (This variable holds the exit status of the last command executed. An exit status of 0 indicates that the last command (pg_dump) completed successfully.)
ls -lh
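The manual backup and its exit-status check can also be scripted. The sketch below wraps the same pg_dump invocation and treats exit status 0 as success; the function names build_pg_dump_command and run_pg_dump are hypothetical, and it assumes pg_dump is on the PATH and that authentication (for example via a .pgpass file) is already configured.

```python
import subprocess


def build_pg_dump_command(username: str, database: str) -> list[str]:
    """Build the argument list equivalent to: pg_dump -U <user> -d <db>."""
    return ["pg_dump", "-U", username, "-d", database]


def run_pg_dump(username: str, database: str, out_path: str) -> bool:
    """Write the SQL dump to out_path; True means pg_dump exited with 0."""
    with open(out_path, "wb") as backup_file:
        result = subprocess.run(
            build_pg_dump_command(username, database),
            stdout=backup_file,  # same effect as the shell redirect ">"
        )
    return result.returncode == 0
```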
Backup Directory: JeMPI/devops/linux/docker/docker_data/data/backups/dgraph
Deployment File: local-deployment.sh
It will prompt for confirmation (yes/no) and list the recent 5 backup folders.
Enter the backup folder name - it will start the restoration from the selected backup file.
Backup Directory Dgraph: JeMPI/devops/linux/docker/docker_data/data/backups/dgraph
Backup Directory Postgres: JeMPI/devops/linux/docker/docker_data/data/backups/postgres
Manual Backup Run Script: ./restore-dgraph-postgres.sh {{ Folder_Name }}
Configuration Settings
The configuration settings screen enables the user to edit the default settings to best fit the desired implementation of the MPI.
Common Properties
This tab defines the demographic details for a patient that will be used for linking.
The user can do the following:
Select the close button to exit edit mode.
Select the save icon button to save the changes made and exit edit mode.
Edit the relevant fields and select the save button to save changes on the current tab.
Deterministic
The deterministic tab is used to define the deterministic rules.
The deterministic tab has three sub tabs:
Linking
Validate
Matching
Source view: This view allows the user to do the following:
View the displayed rules
Enter edit mode by clicking the edit icon button, which opens up the design view
Design view: This view allows the user to do the following:
Select the operator values from a dropdown field, e.g., "And" and "Or"
Select common field values from a dropdown field
Select comparator function from a dropdown field, e.g., "Exact", "Low Fuzziness", etc.
Add a second row of input fields by selecting the add icon button
Save rule by selecting the add rule button
Exit edit mode and cancel previous edits.
The blocking tab has two sub tabs:
Linking
Matching
The blocking sub tabs have two different views:
Source view: This view allows the user to do the following:
View the displayed rules
Enter edit mode by clicking the edit icon button, which opens up the design view
Design view: This view allows the user to do the following:
Select the operator values from a dropdown field, e.g., "And" and "Or"
Select common field values from a dropdown field.
Select comparator function from a dropdown field, e.g., "Exact", "Low Fuzziness", etc.
Add a second row of input fields by selecting the add icon button
Save rule by selecting the add rule button
Exit edit mode and cancel previous edits.
Do not allow the link threshold (green circle):
To be < the Minimum threshold review value
To be > the Maximum threshold review value
Rules on Threshold
For all threshold values that are entered, the system allows exponential notation, e.g. 123E-3, which is the same as 0.123.
The system displays default values.
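The two threshold rules above can be checked with a few lines of code. This is a sketch of the validation logic only, not the UI's implementation; the function names parse_threshold and link_threshold_is_valid are hypothetical, and treating the bounds as inclusive is an assumption.

```python
def parse_threshold(text: str) -> float:
    """Parse a threshold, accepting exponential notation such as 123E-3."""
    return float(text)


def link_threshold_is_valid(link: float, min_review: float, max_review: float) -> bool:
    """The link threshold must not fall outside the review threshold range."""
    return min_review <= link <= max_review
```

For instance, parse_threshold("123E-3") yields 0.123, and a link threshold of 0.7 is accepted when the review range is 0.55 to 0.75.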
Nodes This section displays the following :
Golden record node
Interaction node
Source ID
The golden record node shows properties unique to the golden record.
The interaction node shows properties unique to the interaction.
Source ID: The third node, denoted e.g. Source ID, shows unique common lists, e.g.
Source ID list
Biometric ID list
Section 1: Getting Started
Top Navigation
The navigation bar on the top of the application is always visible and accessible to the user. There are 4 main screens, i.e., Dashboard, Browse, Notifications, and Import.
Understanding the Top Navigation bar
Terms and Definitions to get started with understanding JeMPI
Blocking: Reducing the search space by grouping records by similar attributes into blocks.
Candidate records: A short list of Golden Records generated as a result of blocking. The records on this short list are referred to as candidate (golden) records, as they are potential candidates for linking.
Patient Interaction record: Stores demographic information of the patient e.g. name, surname, DOB, gender, address, etc. This information together with the unique source system ID is used to uniquely identify patients.
Patient ID: Unique identifier of the patient record assigned by the JeMPI upon entry.
Golden record (GR): A golden record is created for a patient if this is the first and/or only patient record to be stored in the database. This is the same as a master record. The golden record links records based on a match score, i.e., determines that 2 or more patient records belong to the same person.
Always has the most up-to-date information for a patient by consensus among the golden record's interactions.
Golden ID: Unique identifier of the Golden record.
Link Threshold (LTH): Predetermined values that allocate the match status of two records based on the probabilistic score generated by the comparison algorithm.
M & U values: The matching (M) and unmatching (U) values derived for each record field.
The m values can be expressed as data quality and calculated as the ratio of matching attributes given that they belong to the same record.
The u values can be expressed as data uniqueness and calculated as the ratio of matching values given that they do not belong to the same record.
Because the status of matching records is unknown, m and u values are calculated using the Expectation Maximization algorithm.
Matching configuration: The basic matching configuration allows for the adjustment of an acceptable matching score between two or more records. Should the threshold be raised, only high matching scores will result in a confirmed match and thus is more stringent. Should the threshold be lowered, moderate matching scores will result in confirmed matches and thus be more lenient.
Relaxed Search: Relaxed searching is functionality where refined blocking is performed with different criteria (e.g. another condition with different fields) to increase the number of results. If the user is not happy with the results, this functionality may perform filtering instead of blocking (e.g. find all males in village x).
Similarity Score: A similarity score is often the normalised expression of similarity between two strings, whereby 1 represents an exact match and 0 represents no similarity at all. Popular algorithms used to calculate similarity are Jaro-Winkler and Levenshtein (normalised).
Matching Score: Also referred to as a linking score, this is the accumulated value that shows the evidence for a positive match between two records.
Different metrics and values are applied to generate a matching score, and a threshold is chosen to assign positive matches for those records exceeding the threshold. A matching score can be normalised from 0 to 1.
Review Threshold (RTH): Determines if a record must be flagged for peer review and a notification must be sent.
GR Changed Patient record Threshold: When a Golden record field is edited and saved, the established links to the Golden Record are recomputed.
For each linked patient record where the similarity score falls below this threshold, a ‘for review’ event is queued for that patient record.
Example: 5 patient records are linked to a GR and the GR fields are updated. After the update, 2 of the 5 patient records fall below this threshold. An event is queued for each of these 2 patient records (two events in total).
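The example above can be sketched as a simple filter over the recomputed scores. The record IDs, scores, threshold value and function name are all hypothetical and only illustrate the rule.

```python
def records_for_review(link_scores: dict[str, float], threshold: float) -> list[str]:
    """Return the IDs of linked records whose recomputed score is below the threshold."""
    return [record_id for record_id, score in link_scores.items() if score < threshold]


# Hypothetical recomputed scores for five patient records linked to one GR.
recomputed = {"p1": 0.95, "p2": 0.91, "p3": 0.62, "p4": 0.88, "p5": 0.40}
queued = records_for_review(recomputed, threshold=0.70)
print(queued)  # one 'for review' event per ID in this list
```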
Important Identity and Unique numbers used in JeMPI
Golden Record ID: This is the unique identity number associated with each Golden Record. It is unique and cannot be edited. The golden record contains the set of all source IDs from the linked patient records.
Patient ID: For each patient record that enters JeMPI, the system will provide a unique ID for the patient record. This number is unique and cannot be edited.
Source System ID: The source system ID is the incoming patient identifier. It consists of the ID of the source system and the patient ID within that source.
Auxiliary ID: The auxiliary ID is a generated ID for Test Data developed to bootstrap and measure the accuracy of the JeMPI system. This ID is not relevant outside of testing.
STAN: System Trace Audit Number. A unique ID to trace messages through the JeMPI system, created by the client. The client defines the format of this STAN.
Section 2: Dashboard
The Dashboard screen has 3 tabs
Confusion Matrix
M & U values
Import Process Status.
This tab is subdivided into 3 sections, starting with the right, the Confusion Matrix.
Confusion Matrix
Understanding the confusion matrix
The confusion matrix displays a tally of the true positives, false positives, true negatives and false negatives. This is used to calculate the precision and recall.
The f-score is a measure of a model’s accuracy on a dataset. It is a harmonic mean of precision and recall.
Beta F-scores
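For reference, the standard F-beta score combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below computes it from confusion matrix counts. The beta values 0.5, 1 and 2 are an assumption, since the text does not say which beta values the dashboard uses, and the counts are made up.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: precision = 0.75, recall = 0.5.
for beta in (0.5, 1.0, 2.0):  # assumed beta values
    print(beta, round(f_beta(tp=6, fp=2, fn=6, beta=beta), 4))
```

A beta below 1 weights precision more heavily, and a beta above 1 weights recall more heavily.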
Records and notifications
Records
Displays the total number of Golden records and the total number of interactions.
Notifications
Displays the total number of notifications, split by:
Open notifications
No. of New & Open notifications
Closed notifications
No. of Closed notifications
Note: The number of new and open notifications (that is, notifications that are not closed) affects the accuracy of the F-score, depending on the percentage of notifications that have not been actioned.
Dashboard Tab 2: M & U values
This screen provides us with a view of the M & U values as per the last periodic update.
Tally Method
The Tally method computes M & U values per field, which are used for cross-checking against the M & U values computed by the EM algorithm. These tallied M & U values are not used for probabilistic linking.
What happens when the score is within the notification threshold area?
When in the notification area, we are either above the notification TH or below the notification TH.
Above the threshold (using the incoming interaction and linked GR)
Assume this is correct for 80% of the time - increment A or B by 0.8
Assume this is incorrect for 20% of the time - increment C or D by 0.2
If admin confirms this assumption, then the system must adjust the tallies by adding the 0.2 to A or B and removing 0.2 from C or D.
If admin rejects this assumption, then the system must subtract 0.8 from A or B and add it to C or D.
Below the threshold (using the incoming interaction and the candidate GRs)
Assume this is correct for 80% of the time - increment C or D by 0.8
Assume this is incorrect for 20% of the time - increment A or B by 0.2
If admin confirms this assumption, then the system must adjust the tallies by adding the 0.2 to C or D and removing 0.2 from A or B.
If admin rejects this assumption, then the system must subtract 0.8 from C or D and add it to A or B.
M and U values are calculated as follows:
M = A / (A + B)
U = C / (C + D)
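The tally rules above can be sketched as follows. The meaning of the four cells is an assumption, because the Tally method diagram is not reproduced here: A and B count fields that agree and disagree in pairs treated as matches, and C and D count fields that agree and disagree in pairs treated as non-matches. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldTally:
    a: float = 0.0  # assumed: field agrees, pair treated as a match
    b: float = 0.0  # assumed: field disagrees, pair treated as a match
    c: float = 0.0  # assumed: field agrees, pair treated as a non-match
    d: float = 0.0  # assumed: field disagrees, pair treated as a non-match

    def _cells(self, field_agrees: bool) -> tuple[str, str]:
        return ("a", "c") if field_agrees else ("b", "d")

    def record(self, field_agrees: bool, above_threshold: bool, confidence: float = 0.8) -> None:
        """Split one observation 0.8 / 0.2 between the match and non-match cells."""
        match_cell, other_cell = self._cells(field_agrees)
        match_share = confidence if above_threshold else 1.0 - confidence
        setattr(self, match_cell, getattr(self, match_cell) + match_share)
        setattr(self, other_cell, getattr(self, other_cell) + (1.0 - match_share))

    def review(self, field_agrees: bool, above_threshold: bool, confirmed: bool,
               confidence: float = 0.8) -> None:
        """Apply an admin decision: confirm moves the 0.2, reject moves the 0.8."""
        match_cell, other_cell = self._cells(field_agrees)
        to_match = above_threshold if confirmed else not above_threshold
        amount = (1.0 - confidence) if confirmed else confidence
        delta = amount if to_match else -amount
        setattr(self, match_cell, getattr(self, match_cell) + delta)
        setattr(self, other_cell, getattr(self, other_cell) - delta)

    def m(self) -> float:
        return self.a / (self.a + self.b) if self.a + self.b else 0.0

    def u(self) -> float:
        return self.c / (self.c + self.d) if self.c + self.d else 0.0


tally = FieldTally()
tally.record(field_agrees=True, above_threshold=True)   # A += 0.8, C += 0.2
tally.record(field_agrees=False, above_threshold=True)  # B += 0.8, D += 0.2
tally.review(field_agrees=True, above_threshold=True, confirmed=True)  # A += 0.2, C -= 0.2
print(round(tally.m(), 4), round(tally.u(), 4))
```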
Dashboard Tab 3: Import Process Status
This screen displays the progress of the processing of the file uploaded via the Import screen.
Section 3: Browse
Browse Patients
This screen displays a list of golden records, with the most recent golden record displayed on the top of the grid.
Select the Browse option on the top navigation bar
Screen is displayed with a list of current patient interactions. This is the default view.
The options on this screen are:
a. Select one of the patient interactions (row) to view more details of the patient
b. Filter the results to find specific patients and/or a list of interactions
Filter by
Filter by option
Select the Filter by panel
System expands the panel and displays the various options to filter the results:
Filter by start and end date - dates can be selected using the calendar picker
Get interactions - returns the golden record and patient interactions for the golden record
Filter by a single field or a combination of the fields below:
a. UID, First Name, Last Name, Gender, Date of birth, City, Phone No.
b. For each of the fields selected, the search can be further extended by selecting a type per field, i.e.:
(i) Exact - returns results that exactly match the value entered
(ii) Levenshtein 1 - returns results with low fuzziness and a distance parameter = 1
(iii) Levenshtein 2 - returns results with medium fuzziness and a distance parameter = 2
(iv) Levenshtein 3 - returns results with high fuzziness and a distance parameter = 3 (default)
Enter the search criteria value for one or more fields that you want to search on
Select the FILTER button to view the results
a. If no results are found, the system displays a message informing the user that no results are available.
Select the CANCEL button to clear the entered search criteria and repeat steps if required.
Filter by (Get interactions)
When the Get interactions toggle is switched on, the system displays the Golden record (GR) (row highlighted in yellow) and the linked patient interactions. All patient interactions that belong to the GR are displayed under the GR. To view the details of a patient, select the relevant row. The system navigates to a detailed view of the selected patient’s interactions.
View Details of Patient Interaction
This is a detailed view of a patient. The first row (highlighted in yellow) is the Golden record. The rows below are the patient interactions. In this example below, the patient has 1 interaction.
How does this work?
When a patient interaction is loaded for the first time, and there is no matching record, a golden record is created using the patient interaction details. Thereafter, every matching patient interaction is linked to the golden record. The golden record is updated based on the following rules:
If a golden record has missing values and the 2nd interaction comes in with a populated value, the system will update the null value in the GR to match the field in the 2nd interaction.
Thereafter, if there are 2 or more interactions with a different field value to the GR field value, then the majority rule applies, in that the field in the GR will be updated as per the majority. This update is configurable and can be disabled
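The two update rules can be sketched for a single field as follows. This is an illustrative reading of the rules, not JeMPI's implementation: the function name is hypothetical, "majority" is taken to mean more than half of the populated interaction values, and the auto_update flag is assumed to switch off both rules.

```python
from collections import Counter
from typing import Optional


def update_golden_field(golden_value: Optional[str],
                        interaction_values: list[Optional[str]],
                        auto_update: bool = True) -> Optional[str]:
    """Return the new golden record value for one field."""
    if not auto_update:
        return golden_value  # auto-update disabled, keep the current value
    populated = [value for value in interaction_values if value]
    if not populated:
        return golden_value
    if not golden_value:
        return populated[0]  # rule 1: fill a missing GR value
    value, count = Counter(populated).most_common(1)[0]
    # Rule 2: at least 2 interactions disagree with the GR and form a majority.
    if value != golden_value and count >= 2 and count > len(populated) / 2:
        return value
    return golden_value


print(update_golden_field(None, [None, "Cape Town"]))
print(update_golden_field("Durban", ["Durban", "Cape Town", "Cape Town"]))
```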
On the Patient interaction screen, the user can do the following:
View the details of the Golden record and its linked patient interactions together with the audit trail
Edit the Golden record (with permissions)
Relink the patient interaction
View Patient Interactions and Audit Trail
The golden record and the interactions are clickable.
When the Golden record is selected, the full audit trail for the patient is displayed, i.e., all the events that occurred on each interaction are displayed.
When the Interaction is selected, the audit trail displays the event for that interaction only (refer to screenshot below).
Editing a Golden Record
The user also has the option to update the applicable GR fields where edits are allowed. No edits will be allowed on any system generated fields, e.g., Golden ID. The fields that are editable are configurable.
How does this work?
After updating the GR field values, on save, the system does the following:
re-computes scores for all automatically linked patient interactions
updates the similarity score to indicate that the record has been manually updated (link score = 3.0)
disables the Master auto-update fields flag to prevent auto-updates
checks the new scores against the GR changed Patient record Threshold (TH) and, if the score for any of the linked patient interaction records falls below this TH, sends a notification for the Admin user to review.
Select the record
Select the edit option - the GR row becomes editable.
Enter or edit the field value as required
Select the save option
System displays a successfully saved message.
Relink a patient interaction
Relinking a patient interaction means that the interaction is not correctly linked to a Golden record and the Admin user wants to relink the interaction to an existing Golden record or create a new Golden record. There are 2 ways that a patient interaction can be relinked:
From viewing an interaction - when the user views the interaction, the user may choose to relink the interaction (i.e. no notification received)
From a notification - a notification is received informing the user that some action must be taken. The user can choose to relink the interaction to another golden record or create a new golden record to link to.
Relink from Patient Interactions screen
Select the Relink option
System displays the Review Linked Patient screen (below)
If there are no other candidate records displayed, the user has 2 options:
Change the threshold and refresh to view the candidate golden records and/or
Refine the search to view other candidate golden records
Changing the threshold
Select the Threshold slider
Select the refresh button
The system will display candidate golden records if available. These candidate golden records are displayed as "searched" as opposed to "blocked" when raised by a notification.
Select the LINK button on the candidate golden record that you want to link the patient interaction to
The system displays the interaction together with the new searched candidate golden record and prompts for confirmation of the link.
a. If the CONFIRM option is selected, the relink is done and the system displays a success message.
b. If the CANCEL option is selected, no change is made, the confirmation dialog box is closed and the user is returned to the Review Linked Patient record screen.
Refine Search
The user also has the option to search for more candidates by selecting the Refine search option. There are 2 types of searches - custom search and a normal search function.
Select the Refine search button
Select either the Custom search or Normal search function
Enter the search criteria and select the Search button
System displays results as below
The same steps are followed to relink the patient as mentioned above.
Section 4: Notifications
Notifications Worklist
Displays a list of notifications for user review with a reason for the notification. On the worklist screen, there is a date filter that can be used to filter notifications between a specific date range.
How does this work?
Records are flagged for review and notifications sent when:
A record is automatically matched to a Golden record, but the matching score falls within the Review threshold
The GR has been updated, scores are re-computed and matching scores fall below the GR changed Threshold
When a notification is selected, the system displays a detailed view of the GR, the linked patient interactions and displays other candidate golden records if applicable.
The notification can be in 3 states:
New - notification has not been read yet
Open - notification has been read, but no action has been taken yet
Closed - notification has been actioned and complete
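The three states can be modelled as a small state machine. The sketch below is hypothetical: the enum and function names are illustrative, and the transitions simply follow the descriptions of the states above.

```python
from enum import Enum


class NotificationState(Enum):
    NEW = "New"
    OPEN = "Open"
    CLOSED = "Closed"


def mark_read(state: NotificationState) -> NotificationState:
    """Reading a New notification opens it; other states are unchanged."""
    return NotificationState.OPEN if state is NotificationState.NEW else state


def mark_actioned(state: NotificationState) -> NotificationState:
    """Actioning a notification closes it, whether it was New or Open."""
    return NotificationState.CLOSED


state = mark_read(NotificationState.NEW)
print(state.value, "->", mark_actioned(state).value)
```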
Select a notification
System displays the Review Linked Patient Record screen with details of the patient interaction.
There are 4 options on the screen:
Refine search
Relink patient interaction to an existing Golden record
Create new Golden record
Close notification
Refine search options
This option is used when you want to extend the search criteria to search for possible candidate golden records.
Refine search
Select the REFINE SEARCH button
There are 2 types of searches that can be done, i.e., Custom search and a normal search
The system displays the Custom search option (default view)
Select the field type, enter a field value and the match type
You can also add more than one rule if applicable and select the SEARCH button
Alternatively, select the SEARCH option
Enter or select the search criteria as per the screenshot below
Select the SEARCH button to view results
The system populates the results in the Review Linked Patient Record screen. The results are populated as other candidate golden records labelled as “searched”.
Review Linked Patient record
Relink function
The same process is followed as mentioned above under the Browse Patient interactions, the relink option. The only difference in the process is that when relinking from a notification, the notification will change to a closed status.
Create New Golden Record function
If there is no matching candidate golden record, then the patient interaction can be linked to a new golden record. Note: this option is only available if there is more than one interaction linked to the GR. If there is only one interaction, then the creation of a new golden record option is disabled.
Select the Create new Golden record button
The system displays a message to confirm that the current link will be changed and a new Golden record will be created.
When the CONFIRM button is selected, the system:
removes the link between the patient interaction and the current Golden record
links the patient record to the new Golden record
updates the score to 3.0
updates the Notification state from New/Open to Closed.
Close Notification
When a notification is selected from the Notification Worklist screen, the Review Linked Patient record is displayed.
View details of patient interaction and golden record
If the patient interaction is correctly linked, select the CLOSE button.
The system displays a confirmation message
Select the CONFIRM button.
The system:
saves and updates the link score to 3.0.
updates the notification state from New/Open to Closed.
Section 4: Import
The Import data and metadata screen enables the user to select a file to upload, configure machine learning, set the threshold values and select how the results must be generated. All steps are mandatory and must be completed for the import process.
Select the Import option from the main navigation bar
System displays the Import screen
Machine Learning Configuration
Select one of the options below to configure machine learning:
Send to the linker and use the current M & U values
Send to the EM task to compute new M & U values
Thresholds
Enter the threshold values. All values must be entered as per the rules defined.
Rules on threshold slider
Do not allow the link threshold (green circle):
To be < the Minimum threshold review value
To be > the Maximum threshold review value
Rules on Threshold
For all threshold values that are entered, the system allows exponential notation, e.g., 123E-3, which is the same as 0.123
System displays default values
Refer to the Fields and Validation table for more details on thresholds
If a value entered does not match the allowed values, then the system displays an error message informing the user that the value entered is not allowed.
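The threshold rules can be sketched as a small validation step. The function names are hypothetical, and the allowed range of 0 to 1 is an assumption (the Fields and Validation table is not reproduced here). Python's float() accepts exponential notation directly.

```python
def parse_threshold(text: str) -> float:
    """Parse a threshold value, accepting exponential notation such as 123E-3."""
    value = float(text)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:  # assumed allowed range
        raise ValueError(f"threshold {value} is outside the allowed range 0 to 1")
    return value


def check_link_threshold(link: float, review_min: float, review_max: float) -> None:
    """The link threshold may not be below the minimum or above the maximum review value."""
    if not review_min <= link <= review_max:
        raise ValueError("link threshold must lie between the review threshold values")


link = parse_threshold("123E-3")
print(link)
check_link_threshold(link, review_min=0.1, review_max=0.2)
```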
Reports
Select one of the options below to determine if a result file is required:
Link records only. Do not generate a file.
This option does not create a result file. The system must link the records in the file only.
Create a CSV file and send a notification once the results file has been generated
Creates the file. The filename must include a STAN (--), Interaction ID and golden ID.
System sends a notification when the results CSV file has been created. The notification must include the URL of the file.
Select the SUBMIT button
This button is disabled until all required selections have been made, i.e., the file must be uploaded, the configuration selected, the threshold values populated, and the reports option selected.
When the SUBMIT button is enabled, select it. The system displays a confirmation message to confirm the upload.
Select the CANCEL button
This action clears the selected and/or entered values. User has the option to start again or leave the screen.
Once the file has been uploaded, the user can return to the Dashboard, Tab 3 and view the progress of the import process.
Fields and Validation - Thresholds
Section 5: Configuration Settings
Common Properties
The user can do the following:
Select the Edit icon button to initiate edit mode on a row for the common properties.
When the row is in edit mode the following changes occur :
Select the close icon button to exit edit mode.
Select the save icon button to save the changes made and exit edit mode.
Edit the relevant fields and select the save button to save changes on the current tab.
Deterministic
The deterministic tab is used to define the deterministic rules. The deterministic tab has three sub tabs :
Linking
Validate
Matching
Source view
This view allows the user to do the following :
View the displayed rules
Enter edit mode by clicking the edit icon button, which opens the design view
Design view
This view allows the user to do the following :
Select the operator values from a dropdown field, e.g., "And" and "Or"
Select common field values from a dropdown field
Select comparator function from a dropdown field, e.g., "Exact", "Low Fuzziness," etc.
Add a second row of input fields by selecting the add icon button
Save rule by selecting the add rule button
Exit edit mode and cancel previous edits.
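To illustrate what a rule built in the design view expresses, the sketch below evaluates a rule made of an operator ("And" or "Or") and a list of field and comparator pairs. The rule structure, the field names and the function names are hypothetical and do not reflect JeMPI's stored rule format. Only the "Exact" comparator is implemented here.

```python
def exact(left, right) -> bool:
    """Exact comparator: both values must be present and equal."""
    return left is not None and left == right


COMPARATORS = {"Exact": exact}  # fuzzy comparators are omitted from this sketch


def rule_matches(rule: dict, record_a: dict, record_b: dict) -> bool:
    """Evaluate every condition and combine the results with the rule's operator."""
    results = [
        COMPARATORS[comparator](record_a.get(field), record_b.get(field))
        for field, comparator in rule["conditions"]
    ]
    return all(results) if rule["operator"] == "And" else any(results)


rule = {
    "operator": "And",
    "conditions": [("given_name", "Exact"), ("family_name", "Exact")],
}
a = {"given_name": "Ann", "family_name": "Lee"}
b = {"given_name": "Ann", "family_name": "Leigh"}
print(rule_matches(rule, a, a), rule_matches(rule, a, b))
```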
Blocking
The blocking tab is used to define the blocking rules.
The blocking tab has two sub tabs:
Linking
Matching
The blocking sub tabs have two different views:
Source view
This view allows the user to do the following:
View the displayed rules
Enter edit mode by clicking the edit icon button, which opens the design view
Design view
This view allows the user to do the following:
Select the operator values from a dropdown field, e.g., "And" and "Or"
Select common field values from a dropdown field
Select comparator function from a dropdown field, e.g., "Exact", "Low Fuzziness," etc.
Add a second row of input fields by selecting the add icon button
Save rule by selecting the add rule button
Exit edit mode and cancel previous edits.
Probabilistic
In the Probabilistic tab, the user can define the linking threshold ranges and/or values.
Rules on threshold slider
Do not allow the link threshold (green circle):
To be < the Minimum threshold review value
To be > the Maximum threshold review value
Rules on Threshold
For all threshold values that are entered, the system allows exponential notation, e.g., 123E-3, which is the same as 0.123
System displays default values
Nodes
This section displays the following:
Golden record node
Interaction node
Source ID
The Golden record node shows properties unique to the golden record. The Interaction node shows properties unique to the interaction.
Source ID: The third node, denoted e.g. Source ID, shows unique common lists, e.g.,
Source ID list
Biometric ID list
Dashboard Tab 1: Confusion Matrix
The confusion matrix provides rolling counts of the true positives, false positives, true negatives and false negatives.
The f-score is a measure of a model’s accuracy on a dataset. It is the harmonic mean of precision and recall. There are 3 different f-scores displayed, each calculated using the beta f-score formula.
The Tally method calculates the probabilities based on whether the fields in the pair match or do not match:
For each field where the pair matches (above notification), check whether to increment A or B (refer to the Tally method diagram)
For each field where the pair does not match (below notification), check whether to increment C or D
Browse Patients screen with list of interactions
Diagram x - Patient Interaction screen - Golden record
The configuration settings screen enables the user to make edits to the default settings to best fit the desired implementation of the MPI.
The common properties tab defines the demographic details for a patient that will be used for linking.
Select the Edit icon button to initiate edit mode on a row for the common properties. When the row is in edit mode, the following changes occur:
The colour of the row changes to white
The edit icon changes to show a save icon and a close icon
In the source view of the rule tabs, the user can also click the add icon button, which initiates edit mode and switches to the design view tab (if there are no existing rules on display)
In the design view, the user can also delete the existing row of input fields
Attribute | Description | Used by |
---|---|---|
fieldName | A "camel-case" field name which will be used when accessing a patient record data structure | Backend + Frontend |
fieldLabel | A string that is a human readable name for the field | Frontend |
scope | Array of URL paths that tells the frontend UI on which pages the field should appear | Frontend |
groups | Array of strings which identifies in which section within a frontend UI page the field should be displayed | Frontend |
fieldType | A string that identifies the type of field, e.g., String, Date, ... (useful for formatting, for example) | Frontend + Backend |
accessLevel | An array of strings that identifies which user roles are permitted to access a given field (NOT YET IMPLEMENTED) | Frontend |
readOnly | Tells whether the field can be edited | Frontend |
validation | An object used for validating the field | Frontend |
Parameter | Description | Type |
---|---|---|
limit | Number of notifications that the user wants to get from the API | Number |
date | A date limit; the user will get the data from the oldest up to that particular date limit (YYYY-MM-DD) | Date |
offset | A pagination parameter | Number |
state | The state of the notifications that we want to fetch (OPEN, CLOSED) | String |
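As an illustration of how the notification parameters above fit together, the sketch below validates them and builds a URL query string. The function name is hypothetical, and the endpoint path is deliberately left out because it is not given here. Only the parameter names and the OPEN and CLOSED states come from the parameter list.

```python
from datetime import date
from urllib.parse import urlencode

ALLOWED_STATES = {"OPEN", "CLOSED"}


def notification_query(limit: int, offset: int, date_limit: str, state: str) -> str:
    """Validate the parameters and return them as a URL query string."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must not be negative")
    date.fromisoformat(date_limit)  # raises ValueError if not a valid ISO date
    if state not in ALLOWED_STATES:
        raise ValueError(f"state must be one of {sorted(ALLOWED_STATES)}")
    return urlencode({"limit": limit, "offset": offset, "date": date_limit, "state": state})


print(notification_query(limit=10, offset=0, date_limit="2024-01-31", state="OPEN"))
```

The resulting string can be appended, after a question mark, to whichever notifications endpoint the deployment exposes.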