- `new-vector-index` function creates an on-disk index for equal-length dense numeric vectors to allow similarity search; related functions: `close-vector-index`, `clear-vector-index` and `vector-index-info`. Vector indexing is implemented with usearch.
- `add-vec`, `remove-vec`, and `search-vec` functions to work with the vector index (see the sketch after this list).
- Similar to full-text search, vector search also supports domain semantics to allow grouping of vectors into domains, see doc.
- `:db.type/vec` value type, for which a vector index is automatically created, so that the query function `vec-neighbors` can return the datoms with neighboring vector values. [#145]
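A minimal sketch of the KV-level vector API named above; the `:dimensions` option key and the exact argument shapes are assumptions, so verify against the API docs:

```clojure
(require '[datalevin.core :as d])

;; open a KV store and create an on-disk vector index
(def lmdb  (d/open-kv "/tmp/vec-demo"))
(def index (d/new-vector-index lmdb {:dimensions 3}))

;; index equal-length dense vectors under application keys
(d/add-vec index :doc-1 [0.1 0.2 0.3])
(d/add-vec index :doc-2 [0.9 0.8 0.7])

;; query for the nearest neighbors of a vector (assumed return shape)
(d/search-vec index [0.1 0.25 0.3])

(d/close-vector-index index)
```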
- Dynamic var `datalevin.constants/range-count-time-budget`, in milliseconds, default is 10; another dynamic var, `datalevin.constants/range-count-iteration-step`, determines after how many loop iterations to take a time measurement, default is 1000.
- Similarly, `sample-time-budget` and `sample-iteration-step`; defaults are 2 milliseconds and 20000, respectively.
- `:parsing-time` for query parsing and `:building-time` for query graph building added to `explain` results.
- Improved `re-index` to save memory.
- Dynamic var `datalevin.constants/lmdb-sync-interval`, in seconds, default is 300. This also cleans up dead readers.
- Avoid `pmap` and `future` for parallel read operations, as they use unbounded thread pools. Users can use a semaphore to limit the number of read threads in flight (see the sketch below), or use a bounded thread pool. If needed, the `:max-readers` KV option can also be set to increase the limit (default is now doubled to 256).
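A sketch of the semaphore approach, using plain `java.util.concurrent`; the read operation you wrap is your own:

```clojure
(import '(java.util.concurrent Semaphore))

;; cap in-flight read threads so LMDB reader slots are not exhausted;
;; 64 is an arbitrary illustrative limit
(def read-permits (Semaphore. 64))

(defn limited-read
  "Run read-fn (a no-arg read operation) while holding a permit."
  [read-fn]
  (.acquire read-permits)
  (try
    (read-fn)
    (finally (.release read-permits))))

;; usage: (limited-read #(d/get-value db "dbi" some-key))
```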
- Added `built-ins` namespace for query functions to cljdoc.
- `analyze` function to collect statistics that help the query planner.
- `max-eid` function to return the current maximal entity id.
- Fixed `and` in `or-join` exception. [#304]
- Fixed `and` join exception. [#305]
- Fixed `seek-datom` and `rseek-datom` broken for `:eav` index.
- `fill-db` no longer creates a new DB, to reduce the chance of user errors. [#306]
- `seek-datoms` and `rseek-datoms` take an extra argument to specify the number of datoms desired. [Thx @jeremy302] [#312]
- `datoms` function takes an extra `n` argument as well.
- `set-env-flags` and `get-env-flags`, so users may change the env flags after the DB is open. This is useful for adjusting transaction durability settings on the fly.
- No longer references `java.util.SequencedCollection` automatically when compiled on JDK 21 and above, as the class does not exist in earlier JDKs; the library is compiled with JDK 17 instead.
- `:bytes`.
- 0.05.
- `transact-kv-async` function to return a future and transact in batches to enhance write throughput (2.5X or more higher throughput under heavy write load compared with `transact-kv`). Batch size is automatically adjusted to the write workload: the higher the load, the larger the batch. However, manual batching of transaction data is still helpful, as the effects of automatic batching and manual batching add up. [#256]
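A sketch of the async call; the transaction data format follows `transact-kv`, while the DBI name and value are illustrative:

```clojure
(require '[datalevin.core :as d])

(def db (d/open-kv "/tmp/async-demo"))
(d/open-dbi db "events")

;; returns a future right away; the write is committed as part of an
;; adaptively sized batch
(def fut (d/transact-kv-async db [[:put "events" 1 {:type :click}]]))

@fut  ; block until the batch containing this write has committed
```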
- `transact` and `transact-async` are also changed to use the same adaptive batching transaction mechanism to enhance write throughput.
- Reduced `sample-processing-interval` to 10 seconds, so samples are more up to date. Each invocation will do less work, or no work, based on a change ratio controlled by a dynamic var `sample-change-ratio`, default is 0.1, i.e. resample if 10 percent of an attribute's values changed.
- `--add-opens` JVM options to open modules are needed on Java 11 and above. If these JVM options are not set, Datalevin will use a safer but slower default option instead of throwing exceptions. It is still recommended to add these JVM options to get optimal performance. For now, the native image uses only the safer option.
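For reference, the module-opening options typically recommended for Datalevin look like this in `deps.edn`; the exact list is an assumption here, so check the project README:

```clojure
;; deps.edn alias with the --add-opens JVM options (assumed from the
;; project README; verify against your Datalevin version)
{:aliases
 {:datalevin
  {:jvm-opts ["--add-opens=java.base/java.nio=ALL-UNNAMED"
              "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"]}}}
```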
- `get-first-n` to return the first `n` key-values in a key range.
- `list-range-first-n` to return the first `n` key-values in a key-value range of a list DBI. [#298]
- Sped up `init-db`, `fill-db`, `re-index` and `load` by writing in `:nosync` mode and syncing only at the end.
- Removed `c/plan-space-reduction-threshold`; always use P(n, 3) instead, as only the initial 2 joins have accurate size estimation, so a larger plan space for later steps is not really beneficial. This change improves JOB (Join Order Benchmark) results.
- Same `entity` behavior in pod as in JVM. [#283]
- `:offset 0`.
- Implemented `empty` on Datom so it can be walked. [#286]
- Removed `:aot` to avoid potential dependency conflict.
- `:offset` and `:limit` support. [#126, #117]
- `:order-by` support. [#116]
- `count-datoms` function to return the number of datoms of a pattern.
- `cardinality` function to return the number of unique values of an attribute.
- Dynamic var `q/*cache?*` to turn the query cache off.
- `sync` function to force a synchronous flush to disk, useful when non-default flags for writes are used.
- `clear` added to bb pod.
- `get-conn`. [Thx @aldebogdanov]
- Fixed: `like` function failed to match in certain cases.
- `clear` function also clears the meta DBI.
- `init-exec-size-threshold` (default 1000), above which the same number of samples is collected instead. This significantly improved subsequent join size estimation, as these initial steps hugely impact the final plan.
- When the plan space is larger than `plan-space-reduction-threshold` (default 990), greedy search is performed in later stages, as these later ones have less impact on performance. This provides a good balance between planning time and plan quality, while avoiding potential out-of-memory issues during planning.
- `sample-processing-interval` (default 3600 seconds).
- Reduced `*fill-db-batch-size*` to 1 million datoms.
- `nil`. [#267]
- Fixed `like` and `in` within complex logic expressions.
- Fixed `not`, `and` and `or` logic functions that involve only one variable.
- `:result` added to the `explain` result map.
- `like` function similar to the LIKE operator in SQL: `(like input pattern)` or `(like input pattern opts)`. The match pattern accepts wildcards `%` and `_`. The `opts` map has key `:escape` that takes an escape character, default `\!`. The pattern is compiled into a finite state machine that does non-greedy (lazy) matching, as opposed to the default in Clojure/Java regex. This function is further optimized by being rewritten into index scan range boundaries for patterns that have a non-wildcard prefix. Similarly, a `not-like` function is provided.
- `in` function that is similar to the IN operator in SQL: `(in input coll)`, which is optimized as index scan boundaries. Similarly, `not-in`. (A sketch of both follows.)
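A sketch of both predicates in a query; the schema and data are illustrative:

```clojure
(require '[datalevin.core :as d])

;; assumes `conn` is an open Datalog connection.
;; `like` with a non-wildcard prefix becomes an index range scan;
;; `in` is likewise turned into index scan boundaries
(d/q '[:find ?name
       :in $ ?names
       :where
       [?e :person/name ?name]
       [(like ?name "Ja%")]
       [(in ?name ?names)]]
     (d/db conn)
     ["Jane" "Jack" "Joe"])
```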
- `fill-db` function to bulk-load a collection of trusted datoms, with the `*fill-db-batch-size*` dynamic var to control the batch size (default 4 million datoms). The same var also controls `init-db` batch size.
- `read-csv` function, a drop-in replacement for `clojure.data.csv/read-csv`. This CSV parser is about 1.5X faster and is more robust in handling quoted content.
- `write-csv` for completeness.
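Since `read-csv` is a drop-in replacement, usage mirrors `clojure.data.csv` (file names are illustrative):

```clojure
(require '[datalevin.core :as d]
         '[clojure.java.io :as io])

;; parse lazily from a reader; realize rows before the reader closes
(with-open [r (io/reader "data.csv")]
  (doall (d/read-csv r)))

;; write rows back out
(with-open [w (io/writer "out.csv")]
  (d/write-csv w [["id" "name"] ["1" "Jane"]]))
```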
- `min` and `max` query predicates handle all comparable data.
- Fixed: `explain` throws when zero result is determined prior to actual planning. [Thx @aldebogdanov]
- `explain` result map.
- `explain` function to show the query plan.

DB Upgrade is required.

- `search-datoms` function to look up datoms without having to specify an index.
- Functions that work with list DBIs opened by `open-list-dbi` (a usage sketch follows the list):
  - `put-list-items`
  - `del-list-items`
  - `visit-list`
  - `get-list`
  - `list-count`
  - `key-range-list-count`
  - `in-list?`
  - `list-range`
  - `list-range-count`
  - `list-range-filter`
  - `list-range-first`
  - `list-range-some`
  - `list-range-keep`
  - `list-range-filter-count`
  - `visit-list-range`
  - `operate-list-val-range`
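A sketch of the list DBI workflow; the argument order shown (key/value types last) is an assumption to verify against the API docs:

```clojure
(require '[datalevin.core :as d])

;; assumes `db` is an open KV store, e.g. (d/open-kv dir)
(d/open-list-dbi db "tags")

;; a list DBI stores multiple sorted values under one key
(d/put-list-items db "tags" "article-1" ["clojure" "database"]
                  :string :string)

(d/get-list db "tags" "article-1" :string :string)
;;=> ["clojure" "database"]

(d/in-list? db "tags" "article-1" "clojure" :string :string)
;;=> true
```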
- `key-range` function that returns a range of keys only.
- `key-range-count` function that returns the number of keys in a range.
- `visit-key-range` function that visits keys in a range for side effects.
- `range-some` function that is similar to `some` for a given range.
- `range-keep` function that is similar to `keep` for a given range.
- `:eavt`, `:avet` and `:vaet` are no longer accepted as index names; use `:eav`, `:ave` and `:vae` instead. Otherwise it is misleading, as we don't store tx id.
- Changed default from `:mapasync` to `:nometasync`, so that the database is more crash resilient. In case of a system crash, only the last transaction might be lost, but the database will not be corrupted. [#228]
- `datalevin/kv-info` dbi to keep meta information about the databases, as well as information about each dbi, such as flags, key-size, etc. [#184]
- `raw-pred?` to indicate whether the predicate takes a raw KV object (default), or a pair of decoded values of k and v (more convenient).
- Functions in the `search-utils` namespace are now compiled instead of being interpreted, to improve performance.
- `:closed-schema?` option to allow declared attributes only, default is `false`. [Thx @andersmurphy]
- Fixed `:validate-data? true` not working for some data types. [Thx @andersmurphy]
- `:db.fulltext/autoDomain` boolean property in attribute schema, default is `false`. When `true`, a search domain specific for this attribute will be created, with a domain name the same as the attribute name, e.g. "my/attribute". This enables the same `fulltext` function syntax as Datomic, i.e. `(fulltext $ :my/attribute ?search)`.
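A sketch of the auto-domain syntax; the schema, data, and result binding form are illustrative assumptions:

```clojure
(require '[datalevin.core :as d])

(def conn
  (d/create-conn "/tmp/ft-demo"
                 {:article/text {:db/valueType :db.type/string
                                 :db/fulltext  true
                                 :db.fulltext/autoDomain true}}))

(d/transact! conn [{:article/text "a changelog of embedded databases"}])

;; Datomic-style fulltext call against one attribute's own domain
(d/q '[:find ?e ?v
       :in $ ?q
       :where [(fulltext $ :article/text ?q) [[?e _ ?v]]]]
     (d/db conn) "changelog")
```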
- `:search-opts` option in the `new-search-engine` option argument, specifying default options passed to the `search` function.
- `:db.fulltext/domains` property in attribute schema. [#176]
- `:search-domains` in the connection option map, a map from domain names to search engine option maps.
- `:domains` option in the `fulltext` built-in function option map.
- `<`, `>`, `<=`, `>=` built-in functions handle any comparable data, not just numbers.
- Fixed `:xform` in pull expression not called for `:cardinality/one` ref attributes. [#224] [Thx @dvingo]
- Fixed `:validate-data?` not recognizing homogeneous tuple data type. [#227]
- `--nippy` option to dump/load a database in nippy binary format, which handles some data anomalies, e.g. keywords with spaces in them, non-printable data, etc., and produces a smaller dump file. [#216]
- A `dtlv-re-index-<unix-timestamp>` directory inside the system temp directory is used during `re-index`. [#213]
- Search takes term proximity into account when the `:index-position?` search engine option is `true`. [#203]
- `:proximity-expansion` search option (default 2) can be used to adjust the search quality vs. time trade-off: the bigger the number, the higher the quality, but the longer the search time.
- `:proximity-max-dist` search option (default 45) can be used to control the maximal distance between terms that would still be considered as belonging to the same span.
- `create-stemming-token-filter` function to create stemmers, using the Snowball stemming library that supports many languages. [#209]
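A sketch of plugging a stemmer into a search engine via `datalevin.search-utils`; treat the exact option shapes as assumptions:

```clojure
(require '[datalevin.core :as d]
         '[datalevin.search-utils :as su])

;; assumes `lmdb` is an open KV store.
;; analyzer whose token filter stems English words with Snowball
(def analyzer
  (su/create-analyzer
   {:token-filters [(su/create-stemming-token-filter "english")]}))

(def engine
  (d/new-search-engine lmdb {:analyzer analyzer}))
```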
- `create-stop-words-token-filter` function that takes a customized stop-words predicate.
- `re-index` function that dumps and loads data with new settings. Should only be called when no other threads or programs are accessing the database. [#179]
- Removed `*datalevin-data-readers*` dynamic var; use Clojure's `*data-readers*` instead.
- Datoms are not included in `:tx-data` when unchanged. [#207]
- In `open-kv`, don't grow `:mapsize` when it is the same as the current size.
- `:include-text?` option to store original text. [#178]
- `:texts` and `:texts+offsets` keys for the `:display` option of the `search` function, to return original text in search results.
- Fixed `dump` and `load` of Datalog DB on Windows.
- `:max-readers` option to specify the maximal number of concurrent readers allowed for the db file. Default is 126.
- `:max-dbs` option to specify the maximal number of sub-databases (DBI) allowed for the db file. Default is 128. It may induce slowness if too big a number of DBIs are created, as a linear scan is used to look up a DBI.
- Fixed `clear` after db is resized.
- `:last-active`.
- `datalog-index-cache-limit` function to get/set the limit of the Datalog index cache. Helpful to disable the cache when bulk transacting data. [#195]
- `:idle-timeout` option when creating the server, in ms; default is 24 hours. [#122]
- `:db/tupleTypes` for heterogeneous tuples and `:db/tupleType` for homogeneous tuples. Unlike Datomic, the number of elements in a tuple is not limited to 8, as long as they fit inside a 496-byte buffer. In addition, instead of using `nil` to indicate a minimal value like in Datomic, one can use `:db.value/sysMin` or `:db.value/sysMax` to indicate minimal or maximal values, useful for range queries. [#167]
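A sketch of tuple schema and data, following the Datomic-style conventions referenced above; the details are assumptions to verify:

```clojure
(require '[datalevin.core :as d])

(def conn
  (d/create-conn "/tmp/tuple-demo"
                 {;; homogeneous tuple: all elements share one type
                  :rect/dim      {:db/valueType :db.type/tuple
                                  :db/tupleType :db.type/long}
                  ;; heterogeneous tuple: one type per element
                  :user/name+age {:db/valueType  :db.type/tuple
                                  :db/tupleTypes [:db.type/string :db.type/long]}}))

(d/transact! conn [{:rect/dim      [800 600]
                    :user/name+age ["Jane" 30]}])
```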
- `*datalevin-data-readers*` to support loading custom tag literals. (thx @respatialized)
- Fixed: `:db/fulltext` value is added then removed in the same transaction.
- `search-utils/create-ngram-token-filter` now works. [#164]
- Fixed `:db/fulltext` values transaction error. [#177]

DB Upgrade is required.

- Faster `remove-doc` and 10X disk space reduction; when term positions and offsets are indexed: 3X faster bulk load and 40 percent space reduction.
- `:index-position?` option to indicate whether to record term positions inside documents, default `false`.
- `:check-exist?` argument to `add-doc` to indicate whether to check the existence of the document in the index, default `true`. Set it to `false` when importing data to improve ingestion speed.
- `:db/fulltext` values. [#151]
- Removed `doc-refs` function.
- Removed `search-index-writer` as well as the related `write` and `commit` functions for client/server, as it makes little sense to bulk load documents across the network.
- `transact!` runs inside `with-transaction` to ensure ACID and improved performance.
- Fixed `get-range` regression when results are used in `sequence`. [#172]
- Databases are created by `create-database` instead of being created by opening a connection URI.
- `db-name` is unique on the server. (thx @dvingo)
- Avoid `(random-uuid)`, since not everyone is on Clojure 1.11 yet.

DB Upgrade is required.

- `range-seq` has a similar signature as `get-range`, but returns a `Seqable`, which lazily reads data items into memory in batches (controlled by the `:batch-size` option). It should be used inside `with-open` for proper cleanup. [#108]
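A sketch of `range-seq`; the DBI name, key-range form, and types are illustrative, and the exact signature should be checked against `get-range`:

```clojure
(require '[datalevin.core :as d])

;; assumes `db` is an open KV store.
;; lazily stream a whole key range in batches; consume the items
;; before with-open closes the underlying resources
(with-open [items (d/range-seq db "events" [:all] :long)]
  (doseq [[k v] items]
    (println k v)))
```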
- Results of `get-range` and `range-filter` now automatically spill to disk when memory pressure is high. The results, though mutable, still implement `IPersistentVector`, so there is no API-level change. The spill-to-disk behavior is controlled by the `spill-opts` option map when opening the db, allowing `:spill-threshold` and `:spill-root` options.
- `:client-opts` option map that is passed to the client when opening remote databases.
- Fixed: `with-transaction-kv` does not drop prior data when DB is resizing.
- Fixed: `with-transaction` does not drop prior data when DB is resizing.
- Fixed: `with-transaction-kv` does not crash when DB is resizing.
- `with-transaction-kv` macro to expose explicit transactions for the KV database. This allows arbitrary code within a transaction to achieve atomicity, e.g. to implement compare-and-swap semantics, etc. [#110]
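A compare-and-swap sketch inside the explicit KV transaction; the DBI name and values are illustrative:

```clojure
(require '[datalevin.core :as d])

;; assumes `db` is an open KV store with an "accounts" DBI.
;; swap :alice's balance from 100 to 200 atomically; the read and
;; the conditional write happen in one transaction
(d/with-transaction-kv [ktx db]
  (let [balance (d/get-value ktx "accounts" :alice)]
    (when (= balance 100)
      (d/transact-kv ktx [[:put "accounts" :alice 200]]))))
```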
- `with-transaction` macro, the same as the above for the Datalog database.
- `abort-transact-kv` function to roll back writes from within an explicit KV transaction.
- `abort-transact` function, the same for Datalog transactions.
- `visit` function.
- `fulltext-datoms` function that returns datoms found by a full-text search query. [#157]
- `nil` instead. [#158]
- `entity` and `touch` functions for the babashka pod; these return regular maps, as the `Entity` type does not exist in a babashka script. [#148] (thx @ngrunwald)
- `:timeout` option to terminate on deadline for query/pull. [#150] (thx @cgrand)
- `max-tx`. [#142]
- `tx-data->simulated-report` to obtain a transaction report without actually persisting the changes. (thx @TheExGenesis)
- `:bigint` and `:bigdec` data types, corresponding to `java.math.BigInteger` and `java.math.BigDecimal`, respectively.
- `:db.type/bigdec` and `:db.type/bigint`, correspondingly. [#138]
- `update-schema` allows renaming attributes. [#131]
- `clear-docs` function to wipe out the search index, as it might sometimes be faster to rebuild the search index than to update individual documents. [#132]
- `datalevin.constants/*data-serializable-classes*` dynamic var, which can be used with `binding` if additional Java classes are to be serialized as part of the default `:data` data type. [#134]
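A sketch of extending the default `:data` serialization; whether the var holds a set of classes is an assumption here:

```clojure
(require '[datalevin.core :as d]
         '[datalevin.constants :as c])

;; assumed: the var holds a collection of serializable Java classes,
;; and `db` is an open KV store with a "misc" DBI
(binding [c/*data-serializable-classes*
          (conj c/*data-serializable-classes* java.time.Instant)]
  (d/transact-kv db [[:put "misc" :now (java.time.Instant/now)]]))
```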
- `:kv-opts` passed to the underlying KV store by `create-conn`.
- Fixed `clear` function on server. [#133]
- Renamed `:search-engine` option map key to `:search-opts` for consistency. [Breaking]
- `visit` can return a special value `:datalevin/terminate-visit` to stop the visit.
- Changed `open-dbi` signature to take an option map instead.
- `:validate-data?` option for `open-dbi`, `create-conn`, etc. [#121]
- `:domain` option for `new-search-engine`, so multiple search engines can coexist in the same `dir`, each with its own domain, a string. [#112]
- `update-schema` allows removal of attributes if they are not associated with any datoms. [#99]
- `:db/updated-at` datom. [#113]
- `datalevin.search-utils` namespace with some utility functions to customize search. [#105] (thx @ngrunwald)
- `visit` KV function in the `core` namespace.
- `add-doc` for the same doc ref.
- `inter-fn`.
- `:auto-entity-time?` Datalog DB creation option, so entities can optionally have `:db/created-at` and `:db/updated-at` values added and maintained automatically by the system during transaction. [#86]
- `doc-count` function returns the number of documents in the search index.
- `doc-refs` function returns a seq of `doc-ref` in the search index.
- `datalevin.core/copy` function can copy a Datalog database directly.
- `doc-indexed?` function.
- `add-doc` can update an existing doc.
- `open-kv` function allows LMDB flags. [#100]

DB Upgrade is required.
- `visit` function to do arbitrary things upon seeing a value in a range.
- Fixed: `:instant` handles dates before 1970 correctly. [#94] The storage format of the `:instant` type has been changed. For an existing Datalog DB containing `:db.type/instant`, dumping as a Datalog DB using the old version of dtlv, then loading the data is required; for an existing key-value DB containing the `:instant` type, specify `:instant-pre-06` instead to read the data back in, then write it out as `:instant` to upgrade to the current format.
- Fixed `defpodfn` so it works in non-JVM environments.
- `load-edn` for dtlv, useful e.g. for loading schema from a file. [#101]
- `defpodfn` macro to define a query function that can be used in a babashka pod. [#85]
- Fixed `max-aid` after schema update. (thx @den1k)
- When a `disconnect` message is received, clean up resources afterwards, so a logically correct number of clients can be obtained in the next API call on slow machines.
- Schema passed to `create-conn` should override the old, fix [#65].
- `DTLV_LIB_EXTRACT_DIR` environment variable to allow customization of the native libraries extraction location.
- `org.clojars.huahaiy/datalevin-native` on Clojars, for depending on Datalevin while compiling a GraalVM native image. Users no longer need to manually compile Datalevin C libraries.
- Replaced `db/-search` and `db/-datoms` with cheaper calls to improve remote store access speed.

DB Upgrade is required.

- `dtlv exec` takes input from stdin when no argument is given.
- `clear` function to clear a Datalog db.
- `datom-eav`, `datom-e`, etc.
- `close-db` convenience function to close a Datalog db.
- All public functions are now in `datalevin.core`, so users don't have to understand and require different namespaces in order to use all features.

DB Upgrade is required.
- Renamed `open-lmdb` and `close-lmdb` to `open-kv` and `close-kv`, and `lmdb/transact` to `lmdb/transact-kv`, so they are consistent, easier to remember, and distinct from functions in `datalevin.core`.
- `dtlv`.
- `[(.getTime ?date) ?timestamp]`, `[(.after ?date1 ?date2)]`, where the date variables are `:db.type/instant`. [#32]
- `:db.type/instant` value as `java.util.Date`, not as `long`. [#30]
- `lmdb/open-lmdb`.
- `core/get-conn` schema update.
- `core/get-conn` and `core/with-conn`.
- `init-max-eid` for large values as well.
- `:data` type. [#23]
- `core/empty-db`.
- `:db/ident` in implicit schema.
- Changed the argument order of `core/create-conn`, `db/empty-db` etc. to put `dir` in front, since it is more likely to be specified than `schema` in real use, so users don't have to put `nil` for `schema`.
- `core/update-schema`.
- `false` value as `:data`.
- `core/schema` and `core/update-schema`.
- `core/closed?`.
- `db/entid` allows 0 as eid.
- Use `db/-first` instead of `(first (db/-datoms ..))`, and `db/-populated?` instead of `(not-empty (db/-datoms ..))`, as they do not realize the results and hence are faster.
- `bits/read-buffer` and `bits/put-buffer`.
- `lmdb/closed?`, `lmdb/clear-dbi`, and `lmdb/drop-dbi`.
- `core/close`.