H2 driven table vizualisation droid for the Clojure REPL.
H2D2 is a REPL utility that loads files, urls, core.matrix datasets/matrices and clojure data structures into an in-memory H2 database, for visualisation via your favourite SQL GUI.
H2D2 was developed with fully-featured IDEs in mind (e.g. IntelliJ IDEA) to provide a dataset viewing experience similar to Matlab, with added SQL benefits such as sorting and filtering/querying.
Once you've configured your SQL GUI to point to H2D2's database (which you just have to do once and for all), viewing a dataset is as straightforward as typing:
(h2d2 "mydata.csv")
(h2d2 "http://mydomain/mydata.csv")
(h2d2 mydataset)
(h2d2 mymatrix)
(h2d2 [[0 1 1] [0 1 0] [1 0 0]])
(h2d2 [{:name "Jill"} {:name "Jack"}])
H2D2 also provides utility functions to import CSV files and Clojure data structures into any H2 database.
H2D2 is ready to use and I'm inviting you to install the snapshot version and let me know how it goes. I'll use H2D2 a bit more and will write a few more tests before I give it a stable release version number.
Add the H2D2 library to your ~/lein/profiles.clj
or to your project.clj
:
[h2d2 "0.1.0-SNAPSHOT"]
Import the h2d2
function in your current namespace:
(use '[h2d2.main :only [h2d2]])
H2D2 will create its own in-memory database (called H2D2
), which will reside in /
be served from your REPL process; so closing the REPL will destroy the database.
Give a thought to what port your SQL GUI and H2D2's database will use to communicate. If you don't have an H2 server running on your machine already, H2D2 will start one on port 9092. However, if an H2 server is already running on port 9092 then H2D2 will select a new port automatically each time it starts. You can instruct H2D2 to use a port number of your choosing so that you don't have to reconfigure your GUI everytime; use a port in IANA's unregistered range 49152-65535, or perhaps in the 9094-9099 range which is part of IANA's registered range but is unassigned at the time of writing.
You may now setup a new datasource in your SQL GUI:
Database type: H2
Driver: download the latest H2 driver (e.g. h2-1.4.197.jar
)
via the binary link on the
H2 download page
or from Maven.
Connection URL: jdbc:h2:tcp://localhost:9092/mem:H2D2
.
This corresponds to:
localhost
9092
(default) or the port of your choosingmem:H2D2
In the clojure REPL, create the H2D2 in-memory database and start the H2 server by uploading a first dataset:
(h2d2 [{:name "Jill"} {:name "Jack"}])
Or, if you want to use a custom port:
(h2d2 [{:name "Jill"} {:name "Jack"}] :port 50000)
This will return a string that tells on which port number the database server was started. You can change this port at anytime:
(h2d2 :port 50001)
You can then view your data in your SQL GUI (it may need to be refreshed
so that the table appears in it), and you can add new datasets by
running the h2d2 function again. Once the data is in the H2D2 database,
you can run any SQL query on it and create new tables derived from it.
You can download the content of any table (e.g. sometable
) as a vector of maps
back in your REPL environment with the command:
(h2d2 :all :sometable)
H2D2 is all about database side-effects, and it is very much shell-like and imperative in style rather than functional.
Its arguments are slightly different for CSV files and for Clojure data structures.
H2D2 only needs the name or URL of the file to view (+ a list of
column names if and only if the file has no header).
All other arguments are optional (table name, separator,
comment marker, charset, port). If the table name is not given, the table
will be automatically named DATASET1
2
3
...
H2D2 uses H2's built-in CSV import facilities and therefore follows the way H2 does things.
(h2d2 "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv")
(h2d2 "testresources/artist_data.csv")
; specific table name
(h2d2 "testresources/artist_data.csv" :table :artists)
; specific charset to ensure proper character conversion
(h2d2 "testresources/artist_data.csv" :charset :utf-8)
; provide column names if (and only if) the CSV file doesn't have a header
(h2d2 "testresources/Traffic enforcement cameras noheader.csv"
:table :trafficcams
:columns [:locationcode :location :contraventioncode
:contravention :activesince :hours]
:charset :utf-8 )
; custom separator and comment line marker
(h2d2 "testresources/random-space.csv" :separator \space :comment \#)
; change the database server port in the same go
(h2d2 "testresources/random-tab.txt" :separator \tab :port 56789)
Column names should only be provided if the CSV file doesn't have a header. If column names are supplied then H2D2 will interpret the first row of the file as data; if not then H2D2 will interpret the first row of the file as header.
H2D2 only needs to be passed the data structure to view.
All other arguments are optional (table name, column names, data type,
charset, port). If the table name is not given, the table
will be automatically named DATASET1
2
3
...
H2D2 transforms matrices and data structures into core.matrix
datasets before
uploading them into H2. H2D2's behaviour (e.g. default column naming) will
therefore match the familiar behaviour of function
clojure.core.matrix.dataset/dataset
.
Unlike CSV files, it doesn't make a difference to the data if you supply column names or not.
; load a core.matrix dataset with default behaviour
(def ds (clojure.core.matrix.dataset/dataset [:a :b] (partition 2 (range 100))))
(h2d2 ds)
; load a core.matrix dataset with (all optional) table name, columns and type
(h2d2 ds
:table :mydataset
:columns [:id :value]
:type :int)
; load a core.matrix with (all optional) table name and type
(def m (clojure.core.matrix/diagonal-matrix (range 20)))
(h2d2 m
:table :mymatrix
:type :double)
; load a vector of vectors
(h2d2 [[1 "Jill"] [2 "Jack"] [3 "Joan"]]
:columns [:id :name]
:table :forenames)
; load a vector of maps
(h2d2 [{:id 1 :name "Jill"}
{:id 2 :name "Jack"}
{:id 3 :name "Joan"}])
By default, all data is uploaded as VARCHAR(MAX)
into the database.
you can convert the whole dataset to a different data type with the
:type
option, for example if your dataset is entirely numeric.
Type definitions follow
clojure.java.jdbc.
(h2d2 mydata :type :int)
(h2d2 mydata :type :double)
(h2d2 mydata :type "varchar(10)")
H2D2 favors ease of use over granularity, so it doesn't allow casting different columns to different types.
If the first argument passed to h2d2
is a keyword (or if it is called
without any argument) then h2d2
will execute
special actions.
(h2d2) ; return the status of the database and tcp server
(h2d2 :list) ; return the list of tables in the database
(h2d2 :all :dataset1) ; return the whole content of DATASET1
(h2d2 :top 10 :dataset1) ; return the top 10 rows in table DATASET1
(h2d2 :drop :dataset1) ; drop table DATASET1 from the database
(h2d2 :wipe) ; clear the database
(h2d2 :close) ; close the database and the tcp server
(h2d2 :port 56789) ; change the port of the H2 server
(h2d2 :tcp :stop) ; stop the H2 server
(h2d2 :tcp :start) ; start the H2 server
(h2d2 :db) ; return the spec of the h2d2 database
The h2d2 function will block your REPL until it has completed.
If you are loading a large file and you don't want to block
your REPL, you can use the function h2d2go
.
(use '[h2d2.main :only [h2d2 h2d2go]])
(h2d2go "very-large-file.csv")
This will run the upload in the background and print the result
message later. However, new uploads and deletions will block until
the background upload is over. So use h2d2go
again for new uploads
and deletions if the current background task hasn't completed yet.
; these will run synchronously, and will wait/block
; if any upload or deletion is currently running
(h2d2 "another-file.csv")
(h2d2 another-dataset)
(h2d2 :drop :some-table)
(h2d2 :wipe)
; these will run asynchronously, and will wait behind the scenes
; if any upload or deletion is currently running
(h2d2go "another-file.csv")
(h2d2go another-dataset)
(h2d2go :drop :some-table)
(h2d2go :wipe)
; these will run synchronously and execute at once
; whether any upload or deletion is currently or not
(h2d2)
(h2d2 :list)
(h2d2 :top 10 :some-table)
(h2d2 :close)
Note that background tasks started with h2d2go
will execute
in random order. h2d2go
just fires off a new thread that will
block until the current task has completed. h2d2go
is equivalent
to the following line of code, with a bit of exception catching on top:
(future (println (h2d2 some-data)))
Automated tests remain to be written for h2d2go
.
The h2d2
function strictly targets its own memory database. However, the namespace
h2d2.db
contains functions import-csv
and import-data
to import CSV files and
clojure data structures into any H2 database. Their arguments
are similar to the h2d2
function except for the following differences:
You need to provide the spec of your H2 database.
You need to provide an explicit table name as string (beware that the functions will attempt to delete any table that has this name already).
Whereas h2d2
returns status messages, these functions return the name of the table
as well as the number of inserted rows.
Calls are not synchronised, so two threads can run these functions simultaneously.
You'll need to write your own SQL to query your H2 database and the new tables.
The h2d2
accessory functions only work with the H2D2 in-memory database.
(def db {:classname "org.h2.Driver"
:subprotocol "h2"
:subname "~/my-data/my-h2-file"
:user ""
:password ""})
; Import a csv file. The presence of the :columns argument still dictates whether
; the first row of the CSV file should be interpreted as column names or not.
(h2d2.db/import-csv db
"testresources/Traffic enforcement cameras noheader.csv"
"TRAFFICCAMS"
:columns [:locationcode :location :contraventioncode
:contravention :activesince :hours]
:separator \,
:comment \#
:charset :utf-8 )
; Import a dataset or any other data structure.
(h2d2.db/import-data db
mydata
"MYDATA"
:columns [:id :value]
:type :int)
The Software includes copies of the following licensed datasets, which are
used to test the Software and are located in the testresources
directory:
Traffic enforcement cameras.csv Copyright © Crown Copyright, licensed under the Open Government Licence v3.0, copied 30 March 2018
artist_data.csv Copyright (c) Tate, licensed under the Creative Commons Zero 1.0 Public Domain Dedication, copied 30 March 2018
Copyright © 2018 Nicolas Duchenne, Belove Ltd, London, UK
Released under the MIT License.
Can you improve this documentation?Edit on GitHub
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close