A Clojure library to automate downloads from the UK TRUD (Technology Reference data Update Distribution) service.
TRUD is the source of reference data in the UK. Until 25 January 2021, users had to download distribution files manually from a web portal. NHS Digital used to provide an FTP server. NHS Digital have now released an API providing metadata on each reference data release, as well as links to the distribution files themselves.
You will need to register as a user of TRUD and get an API key.
Log in here: https://isd.digital.nhs.uk/trud3/user/guest/group/0/login/form
Choose your products and request a subscription using the portal.
There is no API for this part. Log in to TRUD and note the item identifiers for the distributions you want.
While this library is not primarily designed for command-line use, it can be used as a tool to automatically download multiple distributions from the NHS Digital service to a directory of your choosing.
In the example below, supply your own API key; distributions 101 and 105 will then be downloaded into the specified cache directory.
From source code:
clj -X com.eldrix.trud.core/download :api-key '"xxx"' :cache-dir '"/tmp/trud"' :items '[101 105]'
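If you run this command regularly, the fixed arguments can live in a deps.edn alias. This is only a sketch: the alias name :trud-download is hypothetical, and it assumes the alias is added to the deps.edn of the checked-out source, so that com.eldrix.trud.core is on the classpath.

```clojure
;; deps.edn (fragment); :trud-download is a made-up alias name
{:aliases
 {:trud-download
  {:exec-fn   com.eldrix.trud.core/download
   :exec-args {:cache-dir "/tmp/trud"
               :items     [101 105]}}}}
```

With that in place, clj -X:trud-download :api-key '"xxx"' should behave like the full command above, because key/value pairs given on the command line are merged over :exec-args. Keeping the API key out of deps.edn avoids committing it by accident.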
If there is interest, it would be straightforward to make a simple command-line tool. Raise an issue if you need this.
For example, when using deps.edn:
Make sure you use the latest commit hash from https://github.com/wardle/trud in place of "xxx".
com.eldrix/trud {:git/url "https://github.com/wardle/trud.git"
:sha "xxx"}
By default, archive files are stored in a cache directory. Here I use "/tmp/trud":
(require '[com.eldrix.trud.core :as trud])
(def api-key "xxx")
(def latest (trud/get-latest api-key "/tmp/trud" 341))
The result will be a map of data direct from the TRUD API for item 341.
The archive file will have been downloaded and will be available via :archiveFilePath.
It will have had some integrity checks made, including checks on file size and message digest (checksumming).
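For illustration only, here is a minimal sketch of what a size and digest check can look like in Clojure. It is not the library's implementation: the function names file-digest and archive-ok? are hypothetical, and SHA-256 is an assumed algorithm, since the digest algorithm used is not stated here.

```clojure
(require '[clojure.java.io :as io])
(import '(java.security MessageDigest))

(defn file-digest
  "Returns the lower-case hex digest of file `f` for `algorithm`,
  e.g. \"SHA-256\" (an assumption, not necessarily what TRUD uses)."
  [f algorithm]
  (let [md  (MessageDigest/getInstance algorithm)
        buf (byte-array 8192)]
    (with-open [in (io/input-stream f)]
      ;; stream the file in chunks so large archives are not held in memory
      (loop []
        (let [n (.read in buf)]
          (when (pos? n)
            (.update md buf 0 n)
            (recur)))))
    (apply str (map #(format "%02x" %) (.digest md)))))

(defn archive-ok?
  "True if `f` exists, has `expected-size` bytes and matches `expected-digest`."
  [f expected-size expected-digest algorithm]
  (let [file (io/file f)]
    (and (.exists file)
         (= expected-size (.length file))
         (.equalsIgnoreCase ^String expected-digest
                            (file-digest file algorithm)))))
```

These helpers take a java.io.File or a path string. The :archiveFilePath value is a java.nio.file.Path, so convert it with .toFile before passing it in.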
Result:
(note: this result includes URLs generated using one of my old API keys. Your URLs will be different as they will include your API key and should not be publicly shared.)
{:checksumFileLastModifiedTimestamp #object[java.time.Instant 0x72d19fd2 "2021-01-29T13:28:21Z"],
:publicKeyFileSizeBytes 1736,
:checksumFileSizeBytes 187,
:signatureFileName "trud_hscorgrefdataxml_data_1.0.0_20210129000001.sig",
:name "Release 1.0.0",
:signatureFileLastModifiedTimestamp #object[java.time.Instant 0x3b6f0a6f "2021-01-29T13:28:24Z"],
:itemIdentifier 341,
:releaseDate #object[java.time.LocalDate 0x6cdace1f "2021-01-29"],
:checksumFileName "trud_hscorgrefdataxml_data_1.0.0_20210129000001.xml",
:archiveFileLastModifiedTimestamp #object[java.time.Instant 0x738f5264 "2021-01-29T13:26:23Z"],
:publicKeyFileUrl "https://isd.digital.nhs.uk/api/v1/keys/7daa48e2a26f3afeef6f6c2a2feb00b62bcbe68b/files/public-keys/trud-public-key-2013-04-01.pgp",
:publicKeyFileName "trud-public-key-2013-04-01.pgp",
:archiveFileUrl "https://isd.digital.nhs.uk/api/v1/keys/7daa48e2a26f3afeef6f6c2a2feb00b62bcbe68b/files/ODS/1.0.0/HSCORGREFDATAXML_DATA/hscorgrefdataxml_data_1.0.0_20210129000001.zip",
:archiveFileSizeBytes 26688464,
:id "hscorgrefdataxml_data_1.0.0_20210129000001.zip",
:signatureFileUrl "https://isd.digital.nhs.uk/api/v1/keys/7daa48e2a26f3afeef6f6c2a2feb00b62bcbe68b/files/ODS/1.0.0/HSCORGREFDATAXML_DATA/trud_hscorgrefdataxml_data_1.0.0_20210129000001.xml.asc",
:checksumFileUrl "https://isd.digital.nhs.uk/api/v1/keys/7daa48e2a26f3afeef6f6c2a2feb00b62bcbe68b/files/ODS/1.0.0/HSCORGREFDATAXML_DATA/trud_hscorgrefdataxml_data_1.0.0_20210129000001.xml",
:publicKeyId 6,
:signatureFileSizeBytes 488,
:archiveFileName "hscorgrefdataxml_data_1.0.0_20210129000001.zip",
:needsUpdate? true,
:archiveFilePath #object[sun.nio.fs.UnixPath
0xdde6cc8
"/tmp/trud/341--2021-01-29--hscorgrefdataxml_data_1.0.0_20210129000001.zip"]}
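Because the URLs in this map embed your API key, you may want to redact them before logging or sharing a result. The following is a minimal sketch; redact-url and redact-release are hypothetical helpers, not part of the library, and they assume the key always appears as the path segment after /keys/, as it does in the output above.

```clojure
(require '[clojure.string :as str])

(defn redact-url
  "Replaces the path segment following /keys/ with REDACTED."
  [url]
  (str/replace url #"/keys/[^/]+/" "/keys/REDACTED/"))

(defn redact-release
  "Returns `release` with the API key removed from any string value
  that contains a /keys/ segment; other values are left untouched."
  [release]
  (reduce-kv (fn [m k v]
               (assoc m k (if (and (string? v) (str/includes? v "/keys/"))
                            (redact-url v)
                            v)))
             {}
             release))

;; Example with a hand-written map using made-up values:
(redact-release {:itemIdentifier 341
                 :archiveFileUrl "https://isd.digital.nhs.uk/api/v1/keys/abc123/files/x.zip"})
```

Non-string values such as :itemIdentifier and the java.time objects pass through unchanged.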
Once you have the zip file, you can unzip it to a temporary directory and process it as necessary. For convenience, you can use the utility functions in com.eldrix.trud.zip.
Here we are looking at the NHS ODS XML distribution, which always contains two nested zip files "archive.zip" and "fullfile.zip". Here we extract any .xml files using a regexp in our nested query:
(require '[com.eldrix.trud.zip :refer [unzip2 delete-paths]])
;; a nested query: the outer archive, then [nested-zip filename-pattern] pairs
(def ods-xml-files [(:archiveFilePath latest)
                    ["archive.zip" #"\w+\.xml"]
                    ["fullfile.zip" #"\w+\.xml"]])
(def results (unzip2 ods-xml-files))
(get-in results [1 1]) ;; sequence of any XML files in archive zip
(get-in results [2 1]) ;; sequence of any XML files in fullfile.zip
(delete-paths results)
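If processing throws, the call to delete-paths above is never reached. Assuming unzip2 leaves the extracted files on disk until delete-paths is called, one way to guarantee cleanup is a small try/finally wrapper. The helper with-cleanup below is hypothetical and not part of the library.

```clojure
(defn with-cleanup
  "Calls (process results) and returns its value, always calling
  (cleanup results) afterwards, even if `process` throws."
  [results process cleanup]
  (try
    (process results)
    (finally
      (cleanup results))))

;; Intended use with the functions shown above (not evaluated here):
(comment
  (with-cleanup (unzip2 ods-xml-files)
    (fn [results] (count (get-in results [1 1])))
    delete-paths))
```

Passing the processing and cleanup steps as functions keeps the wrapper independent of the library, so it can be tried at the REPL with plain data first.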
Identify outdated dependencies:
clj -M:outdated
Run compilation checks:
clj -M:check
Run linting:
clj -M:lint/eastwood
clj -M:lint/kondo
To build a library jar and publish to local maven repository:
clj -T:build install
To build a library jar and publish to Clojars:
clj -T:build deploy
The CircleCI badge indicates the results of automated tests, including a live test against the NHS Digital TRUD service. If these tests fail, it may be because the service is down or there has been a breaking version change.