DuckDB C-level bindings for tech.ml.dataset.
Current datatype support:

* boolean, all numeric types int8->int64, uint8->uint64, float32, float64
* string
* LocalDate, Instant column types

Example:

```clojure
user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tmducken.duckdb :as duckdb])
nil
user> (duckdb/initialize!)
10:04:14.814 [nREPL-session-635e9bc8-2923-442b-9fad-da547210617b] INFO tmducken.duckdb - Attempting to load duckdb from "/home/chrisn/dev/cnuernber/tmducken/binaries/libduckdb.so"
true
user> (def stocks
        (-> (ds/->dataset "https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv" {:key-fn keyword})
            (vary-meta assoc :name :stocks)))
#'user/stocks
user> (def db (duckdb/open-db))
#'user/db
user> (def conn (duckdb/connect db))
#'user/conn
user> (duckdb/create-table! conn stocks)
"stocks"
user> (duckdb/insert-dataset! conn stocks)
nil
user> (ds/head (duckdb/sql->dataset conn "select * from stocks"))
_unnamed [5 3]:
| symbol |       date | price |
|--------|------------|------:|
|   MSFT | 2000-01-01 | 39.81 |
|   MSFT | 2000-02-01 | 36.35 |
|   MSFT | 2000-03-01 | 43.22 |
|   MSFT | 2000-04-01 | 28.37 |
|   MSFT | 2000-05-01 | 25.45 |
```
(connect db)
Create a new database connection from an opened database. Users should call disconnect to close this connection.
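A minimal sketch of the connection lifecycle, assuming `duckdb/disconnect` releases the connection as described above (the query is illustrative):

```clojure
;; Sketch: open an in-memory database, connect, query, then disconnect.
(def db (duckdb/open-db))
(def conn (duckdb/connect db))
(try
  (duckdb/sql->dataset conn "select 42 as answer")
  (finally
    ;; Release the connection when finished with it.
    (duckdb/disconnect conn)))
```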
(create-table! conn dataset)
(create-table! conn dataset options)
Create an SQL table based off of the column datatypes of the dataset. Note that users can also call [[execute-query!]] with their own SQL create-table string. The fastest way to get data into the system is [[append-dataset!]].

Options:

* `:table-name` - Name of the table to create. If not supplied the dataset name will be used.
* `:primary-key` - Sequence of column names to be used as the primary key.
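The options above might be combined like this (a sketch; it reuses the `stocks` dataset from the example at the top of this page):

```clojure
;; Sketch: create a table with an explicit name and a composite primary key.
;; :symbol and :date are column names from the stocks example above.
(duckdb/create-table! conn stocks
                      {:table-name  "stocks"
                       :primary-key [:symbol :date]})
```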
(datasets->dataset results)
Given a sequence of results, return a single dataset. This pathway relies on `reduce-type` being anything other than `:zero-copy-imm`. It is designed for, and is most efficient when used with, `:zero-copy`.
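One way this might be used, as a sketch: pair it with [[sql->datasets]] and the `:zero-copy` reduce type documented under [[prepare]], inside a resource context so the batches registered with the resource system are released afterwards. The composition here is an assumption based on the option docs above:

```clojure
;; Sketch: concatenate streamed result chunks into one in-JVM dataset.
;; :zero-copy keeps batches alive until datasets->dataset has copied them.
(resource/stack-resource-context
 (duckdb/datasets->dataset
  (duckdb/sql->datasets conn "select * from stocks"
                        {:result-type :streaming
                         :reduce-type :zero-copy})))
```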
(drop-table! conn dataset)
(get-config-options)
Returns a sequence of maps of `{:name :desc}` describing the valid configuration options to the [[open-db]] function.
(initialize!)
(initialize! {:keys [duckdb-home]})
Initialize the duckdb ffi system. This must be called before any other functions and needs to be called only once; it is safe, however, to call it multiple times.

Options:

* `:duckdb-home` - Directory in which to find the duckdb shared library. Users can pass this in. If not passed in, then the environment variable `DUCKDB_HOME` is checked. If neither is provided, the library will be searched for in the normal system library paths.
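Passing the option explicitly might look like this sketch (the directory path is hypothetical; setting `DUCKDB_HOME` in the environment achieves the same thing):

```clojure
;; Sketch: point the loader at a specific shared-library directory.
(duckdb/initialize! {:duckdb-home "/opt/duckdb/lib"})
;; Repeated calls are safe once the library has been loaded.
(duckdb/initialized?)
```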
(initialized?)
(insert-dataset! conn dataset)
(insert-dataset! conn dataset options)
Append this dataset using the higher performance append API of duckdb. This is recommended as opposed to using SQL statements or prepared statements. That being said, the schema of this dataset must match *precisely* the schema of the target table.
(open-db)
(open-db path)
(open-db path config-options)
Open a database. `path` may be nil, in which case the database is opened in-memory. For valid config options call [[get-config-options]]. Options must be passed as a map of string->string. As duckdb is dynamically linked, configuration options may change, but with `linux-amd64-0.3.1` the current options are:

```clojure
tmducken.duckdb> (get-config-options)
[{:name "access_mode",
  :desc "Access mode of the database ([AUTOMATIC], READ_ONLY or READ_WRITE)"}
 {:name "default_order",
  :desc "The order type used when none is specified ([ASC] or DESC)"}
 {:name "default_null_order",
  :desc "Null ordering used when none is specified ([NULLS_FIRST] or NULLS_LAST)"}
 {:name "enable_external_access",
  :desc
  "Allow the database to access external state (through e.g. COPY TO/FROM, CSV readers, pandas replacement scans, etc)"}
 {:name "enable_object_cache",
  :desc "Whether or not object cache is used to cache e.g. Parquet metadata"}
 {:name "max_memory", :desc "The maximum memory of the system (e.g. 1GB)"}
 {:name "threads", :desc "The number of total threads used by the system"}]
```
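Passing config options might look like this sketch; the option names and values come from the listing above, while the file path is hypothetical:

```clojure
;; Sketch: open an on-disk database read-only with a memory cap.
(def db (duckdb/open-db "/tmp/stocks.duckdb"
                        {"access_mode" "READ_ONLY"
                         "max_memory"  "1GB"}))
```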
(prepare conn sql)
(prepare conn sql options)
Create a prepared statement, returning a clojure function you can call with the args specified in the prepared statement. This function is auto-closeable; closing it releases the prepared statement.

The function's return value can be either a sequence of datasets or a single dataset. For `:streaming`, the sequence is read from the result and has no count. For `:realized` the sequence is of known length and the result is completely realized before the first dataset is read. Finally, `:single` means the system will return a single dataset.

In the cases where a sequence is returned, the object returned is auto-closeable and the query result itself will be released when either the sequence is exhausted or the return value is closed.

In general datasets are copied into the JVM on a chunk-by-chunk basis. If the user simply desires to reduce over the return value, the datasets are zero-copied during the reduction with an option to immediately release each dataset.

The prepared statement is both an IFn and AutoCloseable. The return value of the IFn is AutoCloseable and does in fact need to be closed.

Options are passed through to dataset creation.

Options:

* `:result-type` - one of `#{:streaming :realized :single}`.
  - `:streaming` - uncountable supplier/sequence of datasets - auto-closeable.
  - `:realized` - all results realized, countable supplier/sequence of datasets - auto-closeable.
  - `:single` - results realized into a single dataset, with chunks and result being immediately released.
* `:reduce-type` - one of `#{:clone :zero-copy-imm :zero-copy}`, defaulting to `:clone`. When the result is reduced, the dataset is initially read via zero-copy directly from the result batch. Then one of three things happens:
  - `:clone` - dataset cloned and batch released just before rf - the safest option and the default.
  - `:zero-copy-imm` - rf called with zero-copy dataset and batch released just after. This is very memory and cpu efficient but you need to ensure that no part of the dataset escapes rf.
  - `:zero-copy` - datasets are merely passed to rf and the batch is not released but is registered with the resource system. This is used to efficiently concatenate the results into one dataset, after which all batches are released.

Examples:

```clojure
user> (with-open [stmt (duckdb/prepare conn "select * from stocks" {:result-type :single})]
        (stmt))
_unnamed [560 3]:
| symbol |       date |  price |
|--------|------------|-------:|
|   MSFT | 2000-01-01 |  39.81 |
|   MSFT | 2000-02-01 |  36.35 |
|   MSFT | 2000-03-01 |  43.22 |
|   MSFT | 2000-04-01 |  28.37 |
|   MSFT | 2000-05-01 |  25.45 |
|   MSFT | 2000-06-01 |  32.54 |
...
user> (with-open [stmt (duckdb/prepare conn "select * from stocks" {:result-type :streaming})]
        (stmt))
#object[tmducken.duckdb.StreamingResultChunks 0x41912865 "tmducken.duckdb.StreamingResultChunks@41912865"]
user> (seq *1)
(_unnamed [560 3]:
| symbol |       date |  price |
|--------|------------|-------:|
|   MSFT | 2000-01-01 |  39.81 |
|   MSFT | 2000-02-01 |  36.35 |
|   MSFT | 2000-03-01 |  43.22 |
|   MSFT | 2000-04-01 |  28.37 |
|   MSFT | 2000-05-01 |  25.45 |
user> (with-open [stmt (duckdb/prepare conn "select * from stocks" {:result-type :streaming
                                                                    :reduce-type :zero-copy})]
        (resource/stack-resource-context
         (reduce (fn [acc ds] (+ acc (ds/row-count ds))) 0 (stmt))))
560
user> (with-open [stmt (duckdb/prepare conn "select * from stocks" {:result-type :streaming
                                                                    :reduce-type :zero-copy-imm})]
        (reduce (fn [acc ds] (+ acc (ds/row-count ds))) 0 (stmt)))
560
user> ;;BAD IDEA - dataset backing store is released before result is returned.
user> (with-open [stmt (duckdb/prepare conn "select * from stocks" {:result-type :streaming
                                                                    :reduce-type :zero-copy-imm})]
        (reduce conj [] (stmt)))
```
(sql->dataset conn sql)
(sql->dataset conn sql options)
Execute a query returning a single dataset. This runs the query in a context that releases the memory used for the result set before the function returns, returning a dataset that has no native bindings.
(sql->datasets conn sql)
(sql->datasets conn sql options)
Execute a query returning either a sequence of datasets or a single dataset. See the documentation and options for [[prepare]].

Examples:

```clojure
tmducken.duckdb> (first (sql->datasets conn "select * from stocks"))
_unnamed [560 3]:
| symbol |       date | price |
|--------|------------|------:|
|   MSFT | 2000-01-01 | 39.81 |
|   MSFT | 2000-02-01 | 36.35 |
|   MSFT | 2000-03-01 | 43.22 |
|   MSFT | 2000-04-01 | 28.37 |
|   MSFT | 2000-05-01 | 25.45 |
|   MSFT | 2000-06-01 | 32.54 |
|   MSFT | 2000-07-01 | 28.40 |
tmducken.duckdb> (ds/head (sql->dataset conn "select * from stocks"))
_unnamed [5 3]:
| symbol |       date | price |
|--------|------------|------:|
|   MSFT | 2000-01-01 | 39.81 |
|   MSFT | 2000-02-01 | 36.35 |
|   MSFT | 2000-03-01 | 43.22 |
|   MSFT | 2000-04-01 | 28.37 |
|   MSFT | 2000-05-01 | 25.45 |
```