# iort - Interoperable Outcomes Research Tools

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.
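To make the idea of a common schema concrete, here is a deliberately tiny sketch of two simplified CDM-style tables in SQLite. This is illustrative only: the real CDM tables have many more columns, and this is not the official DDL.

```shell
# A deliberately simplified sketch of two CDM-style tables (the real
# CDM definitions have many more columns; see the OHDSI specification).
sqlite3 cdm-sketch.db <<'SQL'
CREATE TABLE person (
  person_id         INTEGER PRIMARY KEY,
  gender_concept_id INTEGER NOT NULL,
  year_of_birth     INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
  condition_occurrence_id INTEGER PRIMARY KEY,
  person_id               INTEGER NOT NULL REFERENCES person (person_id),
  condition_concept_id    INTEGER NOT NULL,
  condition_start_date    TEXT NOT NULL
);
SQL
```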
`iort` is a library and command-line utility to make use of the OMOP CDM. You can think of `iort` as a swiss-army knife for the OMOP CDM.
Most current user-facing OMOP CDM tools depend upon `R`, but `iort` is written in Clojure and runs on the JVM, so it is also usable from other JVM languages such as Java. `iort` can be run from the command line as a runnable 'uberjar', or directly from source code if Clojure is installed.
As such, `iort` uses a simpler approach than the OHDSI tools, generating DDL statements directly from the canonical CSV specifications.
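The direct-from-CSV approach can be sketched as follows. This is an illustrative sketch only, not `iort`'s actual implementation; the two-table `fields.csv` here is a made-up miniature in the style of the OHDSI field-level specification files, which have many more columns and rows.

```shell
# Illustrative sketch only - NOT iort's actual implementation.
# The CDM specification is recorded in CSV files; generating DDL is then
# a straightforward transformation of those rows into CREATE TABLE
# statements, without any intermediate markdown or R tooling.
cat > fields.csv <<'CSV'
cdmTableName,cdmFieldName,isRequired,cdmDatatype
person,person_id,Yes,integer
person,year_of_birth,Yes,integer
person,birth_datetime,No,datetime
CSV

awk -F, 'NR>1 {
  req = ($3=="Yes") ? " NOT NULL" : ""
  cols[$1] = cols[$1] sep[$1] "  " $2 " " $4 req
  sep[$1] = ",\n"
}
END { for (t in cols) printf "CREATE TABLE %s (\n%s\n);\n", t, cols[t] }' fields.csv
```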
`iort` is designed to be composable with a number of other healthcare-related libraries and tools:
These tools follow a similar pattern in that they provide:
I need to take healthcare data from multiple sources, transform and normalise those data, and aggregate them to support direct care and analytics. As the CDM creates a 'standard' schema for healthcare data, we can use the CDM as an intermediary data format. This would not work unless you also standardise the vocabularies in use; having ready access to advanced SNOMED CT tools such as `hermes`, in conjunction with other sources of reference data (e.g. the UK dictionary of medicines and devices, the UK's organisational data for healthcare sites/locations, as well as the CDM vocabularies), facilitates creating 'pluripotent data'. You can, of course, use `iort` without using `hermes` or `dmd`.
I supplement the CDM standard vocabulary with other tooling so that I can make sense of the latest data. For example, there will be SNOMED CT concepts in the UK extension that are not in the standard vocabulary; I define cohorts using an expressive mix of ICD-10, OPCS, ATC and SNOMED CT, and I need to make use of historical associations. As such, using only the OMOP CDM vocabularies available from Athena is insufficient. Composing different data-orientated tools is important and useful.
`iort` will provide both a library and a command-line tool to support interoperable outcomes research:
It will therefore be possible to build an `iort` pipeline that will initialise and populate a database with the OMOP CDM, executing your own custom logic to extract and transform data from potentially multiple source systems, perhaps making use of the tools above for that process of normalisation, and writing into a CDM. Likewise, one might instead use `iort` as part of a real-time analytics pipeline, taking a feed from, for example, Apache Kafka, to transform and insert into a CDM-based database.
`iort` is a new project and under active development. It is now partly functional and is being developed in the open.
Here are the items from the roadmap already completed:
Here are the items still pending:
`iort` is only in the early stages of development, but it is already usable. You will need to install Clojure. Once `iort` is ready for a more formal release, I will provide an executable 'uberjar' that will contain multiple database drivers.
e.g. to create CDM version 5.4 database tables, indexes and constraints in a SQLite database called `my-omop-cdm.db`:

```shell
clj -M:sqlite:run --cdm-version 5.4 --create --jdbc-url jdbc:sqlite:my-omop-cdm.db
```
e.g. to create CDM version 5.4 database tables, indexes and constraints in a PostgreSQL database, `omop_cdm`:

```shell
clj -M:postgresql:run --cdm-version 5.4 --create --jdbc-url jdbc:postgresql:omop_cdm
```
```shell
clj -M:run --create --dialect postgresql
clj -M:run --create --dialect sqlite
```
Databases such as SQLite cannot add foreign key constraints after database tables have been created, so you can give hints to `iort` so that it generates the correct statements for the database type you are using.
```shell
clj -M:run --create-tables --dialect sqlite --schema VOCAB
```

This is ideal if you are creating multiple SQLite databases and will join them only later during your analytics step.
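As a concrete illustration of the SQLite limitation mentioned above, here is a sketch using plain `sqlite3` (not `iort`), showing why foreign keys must be emitted at table-creation time for this dialect:

```shell
# SQLite lacks ALTER TABLE ... ADD CONSTRAINT, so foreign keys must be
# declared when the table is created - hence the need for a dialect hint.
sqlite3 fk-demo.db "CREATE TABLE person (person_id INTEGER PRIMARY KEY);"
sqlite3 fk-demo.db "ALTER TABLE person ADD CONSTRAINT fk_person
                    FOREIGN KEY (person_id) REFERENCES other(id);" \
  2>/dev/null || echo "ADD CONSTRAINT is not supported by SQLite"
```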
You can choose multiple schemas, either by using `--schema VOCAB --schema CDM` or by using comma-delimited values:

```shell
clj -M:run --create-tables --dialect sqlite --schema VOCAB,CDM
```
e.g. you have downloaded the latest CDM vocabulary from Athena, and want to initialise a new CDM database:

```shell
clj -M:postgresql:run -u jdbc:postgresql:omop_cdm --create --vocab ~/Downloads/vocabulary_download_v5
```
This will connect to the PostgreSQL database `omop_cdm`, create all of the tables, import the specified vocabulary files, and then add constraints and indexes.
If you want to use SQLite:

```shell
clj -M:sqlite:run -u jdbc:sqlite:omop_cdm.db --create --vocab ~/Downloads/vocabulary_download_v5
```
For example, if you want to create a SQLite database with only the VOCAB CDM tables and populate them from data downloaded from Athena:

```shell
clj -M:sqlite:run --create-tables --jdbc-url jdbc:sqlite:cdm54.db --schema VOCAB --vocab ~/Downloads/vocabulary_download_v5
```
```shell
% sqlite3 cdm54.db
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
sqlite> .tables
concept               concept_synonym       source_to_concept_map
concept_ancestor      domain                vocabulary
concept_class         drug_strength
concept_relationship  relationship
```
SQLite allows you to attach multiple databases and perform joins across them, so this is a useful way to combine the standard CDM vocabulary with your clinical data derived from one or more of your operational clinical data sources. With databases other than SQLite, you are more likely to store the CDM vocabulary within the same database as your CDM data.
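A cross-database join of that kind can be sketched as follows, using plain `sqlite3` rather than an `iort`-created database; the file names and single-column tables here are hypothetical miniatures for illustration.

```shell
# Sketch with hypothetical file names: vocab.db holds CDM vocabulary
# tables; clinical.db holds clinical data. ATTACH lets one SQLite
# connection join tables across both database files.
sqlite3 vocab.db "CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);"
sqlite3 clinical.db <<'SQL'
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
ATTACH DATABASE 'vocab.db' AS vocab;
-- queries can now join across the two database files:
SELECT o.person_id, c.concept_name
FROM condition_occurrence o
JOIN vocab.concept c ON c.concept_id = o.condition_concept_id;
SQL
```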
e.g. to drop constraints and indexes (for example, before a bulk import):

```shell
clj -M:postgresql:run -u jdbc:postgresql:omop_cdm --drop-constraints --drop-indexes
```

and to add them back afterwards:

```shell
clj -M:postgresql:run -u jdbc:postgresql:omop_cdm --add-constraints --add-indexes
```
## Why not use the `R`-based OHDSI toolchain?

The current OMOP toolchain has a variety of steps. For example, the initialisation of database tables, indexes and constraints is generated using `R` in the open-source repository https://github.com/OHDSI/CommonDataModel, but the SQL statements cannot be readily executed independently as they include placeholders for the `R` toolchain to complete. The specifications for the CDM are actually recorded in CSV files, but these are processed to generate markdown, and the markdown is processed into parameterised SQL DDL statements, which are processed by the `R` toolchain to execute database-specific DDLs. Some of the `R` toolchain actually uses rJava to consume OHDSI Java libraries such as SqlRender.
In my view, all of those steps make the process of database initialisation more complex, and more difficult to reproduce in data pipelines. I have a strong preference for automation, and simplicity. Many of my design decisions are based upon wishing to create potentially ephemeral OMOP CDM-based databases, such as file-based databases based on SQLite created on demand for end-users, as well as the more conventional approach of looking after a single carefully maintained observational analytics database. For that, I need to be able to initialise and populate a CDM database on demand from operational clinical systems, and that means needing to generate DDL SQL statements on the fly without depending on installing `R`.