- Added VERSION file, in preparation for auto migration.
- New `new-vector-index` function creates an on-disk index for equal-length dense numeric vectors to allow similarity search, along with related functions `close-vector-index`, `clear-vector-index` and `vector-index-info`. Vector indexing is implemented with usearch.
- New `add-vec`, `remove-vec`, and `search-vec` functions to work with the vector index. Similar to full-text search, vector search also supports domain semantics to allow grouping of vectors into domains.
- New `:db.type/vec` value type, for which a vector index is automatically created, allowing the query function `vec-neighbors` to return the datoms with neighboring vector values.
#145
- New dynamic var `datalevin.constants/range-count-time-budget`, in milliseconds, default is 10; another dynamic var, `datalevin.constants/range-count-iteration-step`, determines after how many loop iterations to take a time measurement, default is 1000.
- New dynamic vars `sample-time-budget` and `sample-iteration-step`; defaults are 2 milliseconds and 20000, respectively.
- Added `:parsing-time` for query parsing and `:building-time` for query graph building to `explain` results.
- Improved `re-index` to save memory.
- New dynamic var `datalevin.constants/lmdb-sync-interval`, in seconds, default is 300. This also cleans up dead readers.
- Avoid `pmap` and `future` for parallel read operations, as they use unbounded thread pools. Users can use a semaphore to limit the number of read threads in flight, or use a bounded thread pool. If needed, the `:max-readers` KV option can also be set to increase the limit (default is now doubled to 256).
- Added `built-ins` namespace for query functions to cljdoc.
- New `analyze` function to collect statistics that help the query planner.
- New `max-eid` function to return the current maximal entity id.
- Fixed `and` in `or-join` exception. #304
- Fixed `and` join exception.
#305
- Fixed `seek-datom` and `rseek-datom` broken for `:eav` index.
- `fill-db` no longer creates a new DB, to reduce the chance of user errors. #306
- Added an argument to `seek-datoms` and `rseek-datoms` to specify the number of datoms desired. [Thx @jeremy302] #312
- The `datoms` function takes an extra `n` argument as well.
- New `set-env-flags` and `get-env-flags`, so users may change the env flags after the DB is open. This is useful for adjusting transaction durability settings on the fly.
- Avoid referencing `java.util.SequencedCollection` automatically when compiled on JDK 21 and above, as it does not exist in earlier JDKs. Compile the library with JDK 17 instead.
- `:bytes`.
- 0.05.
- New `transact-kv-async` function to return a future and transact in batches to enhance write throughput (2.5X or more higher throughput under heavy write load compared with `transact-kv`). Batch size is automatically adjusted to the write workload: the higher the load, the larger the batch. However, manual batching of transaction data is still helpful, as the effects of automatic batching and manual batching add up.
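A sketch of the async KV write path described above. The transaction-data shape is assumed to follow `transact-kv`; the path and DBI name are made up for illustration, and dereferencing the future is assumed to wait for the batch containing these writes to commit:

```clojure
(require '[datalevin.core :as d])

(def db (d/open-kv "/tmp/async-kv-demo"))
(d/open-dbi db "counters")

;; returns a future immediately; Datalevin batches the actual commit
(def fut (d/transact-kv-async db [[:put "counters" :hits 1]
                                  [:put "counters" :misses 0]]))

@fut ; block until this write's batch is durably committed

;; read the value back
(d/get-value db "counters" :hits)
(d/close-kv db)
```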
#256
- `transact` and `transact-async` are also changed to use the same adaptive batching transaction mechanism to enhance write throughput.
- Reduced `sample-processing-interval` to 10 seconds, so samples are more up to date. Each invocation will do less work, or no work, based on a change ratio controlled by a dynamic var `sample-change-ratio`, default 0.1, i.e. resample if 10 percent of an attribute's values changed.
- `--add-opens` JVM options to open modules for Java 11 and above are no longer mandatory. If these JVM options are not set, Datalevin will use a safer but slower default option instead of throwing exceptions. It is still recommended to add these JVM options to get optimal performance. For now, native image uses only the safer option.
- New `get-first-n` to return the first n key-values in a key range.
- New `list-range-first-n` to return the first n key-values in a key-value range of a list DBI. #298
- Improved `init-db`, `fill-db`, `re-index` and load, by writing in `:nosync` mode and syncing only at the end.
- Removed `c/plan-space-reduction-threshold`; always use P(n, 3) instead, as only the initial 2 joins have accurate size estimation, so a larger plan space for later steps is not really beneficial. This change improves JOB.
- Same `entity` behavior in pod as in JVM. #283
- `:offset` 0.
- Implemented `empty` on `Datom` so it can be walked.
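The `get-first-n` function mentioned above might be used as sketched below, under the assumption that it mirrors `get-range`'s argument order with an extra count argument; the DBI name, data, and the position of `n` are assumptions:

```clojure
(require '[datalevin.core :as d])

(def db (d/open-kv "/tmp/first-n-demo"))
(d/open-dbi db "events")
(d/transact-kv db (for [i (range 100)]
                    [:put "events" i (str "event-" i)]))

;; first 10 key-value pairs of the full key range
;; (argument order mirrors get-range; where n goes is an assumption)
(d/get-first-n db "events" 10 [:all] :long :string)

(d/close-kv db)
```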
#286
- Removed `:aot` to avoid potential dependency conflict.
- `:offset` and `:limit` support, #126, #117
- `:order-by` support, #116
- New `count-datoms` function to return the number of datoms of a pattern.
- New `cardinality` function to return the number of unique values of an attribute.
- New dynamic var `q/*cache?*` to turn the query cache off.
- New `sync` function to force a synchronous flush to disk, useful when non-default flags for writes are used.
- Added `clear` to bb pod.
- `get-conn` [Thx @aldebogdanov]
- Fixed `like` function failing to match in certain cases.
- The `clear` function also clears the meta DBI.
- `init-exec-size-threshold` (default 1000): above it, the same number of samples are collected instead. These significantly improved subsequent join size estimation, as these initial steps hugely impact the final plan.
- When the plan space exceeds `plan-space-reduction-threshold` (default 990), greedy search is performed in later stages, as these later ones have less impact on performance. This provides a good balance between planning time and plan quality, while avoiding potential out-of-memory issues during planning.
- `sample-processing-interval` (default 3600 seconds).
- `*fill-db-batch-size*` to 1 million datoms.
- `nil`, #267
- Fixed `like` and `in` within complex logic expressions.
- Fixed `not`, `and` and `or` logic functions that involve only one variable.
- Added `:result` to `explain` result map.
- New `like` function, similar to the LIKE operator in SQL: `(like input pattern)` or `(like input pattern opts)`. The match pattern accepts wildcards `%` and `_`. The `opts` map has a key `:escape` that takes an escape character, default `\!`. The pattern is compiled into a finite state machine that does non-greedy (lazy) matching, as opposed to the default in Clojure/Java regex. The function is further optimized by rewriting it into index scan range boundaries for patterns that have a non-wildcard prefix. Similarly, a `not-like` function is provided.
- New `in` function, similar to the IN operator in SQL: `(in input coll)`, which is optimized as index scan boundaries. Similarly, `not-in`.
- New `fill-db` function to bulk-load a collection of trusted datoms, with a `*fill-db-batch-size*` dynamic var to control the batch size (default 4 million datoms). The same var also controls `init-db` batch size.
- New `read-csv` function, a drop-in replacement for `clojure.data.csv/read-csv`. This CSV parser is about 1.5X faster and more robust in handling quoted content.
- Added `write-csv` for completeness.
- `min` and `max` query predicates handle all comparable data.
- Fixed: `explain` throws when zero result is determined prior to actual planning. [Thx @aldebogdanov]
- `explain` result map.
- New `explain` function to show the query plan.
- DB Upgrade is required.
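The `like` and `in` predicates described above can be sketched inside a query as follows. The call shapes `(like input pattern)` and `(in input coll)` are from the text; the schema, data, and directory are illustrative:

```clojure
(require '[datalevin.core :as d])

(def conn (d/get-conn "/tmp/like-demo"
                      {:person/name {:db/valueType :db.type/string}}))
(d/transact! conn [{:person/name "Dana"} {:person/name "Bob"}])

;; names starting with "Da"; % matches any run, _ one character
(d/q '[:find ?name
       :where [?e :person/name ?name]
              [(like ?name "Da%")]]
     (d/db conn))

;; names contained in a given collection
(d/q '[:find ?name
       :where [?e :person/name ?name]
              [(in ?name ["Bob" "Carol"])]]
     (d/db conn))
```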
- New `search-datoms` function to look up datoms without having to specify an index.
- New functions to work with list DBIs opened by `open-list-dbi`:
  `put-list-items`, `del-list-items`, `visit-list`, `get-list`, `list-count`, `key-range-list-count`, `in-list?`, `list-range`, `list-range-count`, `list-range-filter`, `list-range-first`, `list-range-some`, `list-range-keep`, `list-range-filter-count`, `visit-list-range`, `operate-list-val-range`.
- New `key-range` function that returns a range of keys only.
- New `key-range-count` function that returns the number of keys in a range.
- New `visit-key-range` function that visits keys in a range for side effects.
- New `range-some` function, similar to `some`, for a given range.
- New `range-keep` function, similar to `keep`, for a given range.
- `:eavt`, `:avet` and `:vaet` are no longer accepted as index names; use `:eav`, `:ave` and `:vae` instead. Otherwise it's misleading, as we don't store tx id.
- Changed `:mapasync` to `:nometasync`, so that the database is more crash resilient. In case of a system crash, only the last transaction might be lost, but the database will not be corrupted. #228
- New `datalevin/kv-info` dbi to keep meta information about the databases, as well as information about each dbi, such as flags, key-size, etc. #184
- `raw-pred?` to indicate whether the predicate takes a raw KV object (default), or a pair of decoded values of k and v (more convenient).
- Functions in the `search-utils` namespace are now compiled instead of being interpreted, to improve performance.
- New `:closed-schema?` option to allow declared attributes only, default is false. [Thx @andersmurphy]
- Fixed `:validate-data? true` not working for some data types. [Thx @andersmurphy]
- New `:db.fulltext/autoDomain` boolean property in attribute schema, default is false. When true, a search domain specific to this attribute will be created, with a domain name the same as the attribute name, e.g. "my/attribute". This enables the same `fulltext` function syntax as Datomic, i.e. `(fulltext $ :my/attribute ?search)`.
- New `:search-opts` option in the `new-search-engine` option argument, specifying default options passed to the `search` function.
- New `:db.fulltext/domains` property in attribute schema, #176
- New `:search-domains` key in the connection option map, a map from domain names to search engine option maps.
- New `:domains` option in the `fulltext` built-in function option map.
- `<`, `>`, `<=`, `>=` built-in functions handle any comparable data, not just numbers.
- Fixed `:xform` in pull expression not being called for `:cardinality/one` ref attributes, #224. [Thx @dvingo]
- Fixed `:validate-data?` not recognizing homogeneous tuple data type, #227.
- New `--nippy` option to dump/load a database in nippy binary format, which handles some data anomalies, e.g. keywords with spaces in them, non-printable data, etc., and produces a smaller dump file, #216
- Data is backed up to `dtlv-re-index-<unix-timestamp>` inside the system temp directory during `re-index`, #213
- Fixed error when the `:index-position?` search engine option is true. #203
- The `:proximity-expansion` search option (default 2) can be used to adjust the search quality vs. time trade-off: the bigger the number, the higher the quality, but the longer the search time.
- The `:proximity-max-dist` search option (default 45) can be used to control the maximal distance between terms that are still considered as belonging to the same span.
- New `create-stemming-token-filter` function to create stemmers, which uses the Snowball stemming library that supports many languages. #209
- New `create-stop-words-token-filter` function to take a customized stop-words predicate.
- New `re-index` function that dumps and loads data with new settings. Should only be called when no other threads or programs are accessing the database. #179
- Removed `*datalevin-data-readers*` dynamic var; use Clojure's `*data-readers*` instead.
- `:tx-data` when unchanged. #207
- In `open-kv`, don't grow `:mapsize` when it is the same as the current size.
- New `:include-text?` option to store original text. #178.
- Added `:texts` and `:texts+offsets` keys to the `:display` option of the `search` function, to return original text in search results.
- Fixed dump and load of Datalog DB on Windows.
- New `:max-readers` option to specify the maximal number of concurrent readers allowed for the db file. Default is 126.
- New `max-dbs` option to specify the maximal number of sub-databases (DBI) allowed for the db file. Default is 128. It may induce slowness if too big a number of DBIs are created, as a linear scan is used to look up a DBI.
- Fixed `clear` after db is resized.
- `:last-active`.
- New `datalog-index-cache-limit` function to get/set the limit of the Datalog index cache. Helpful to disable the cache when bulk transacting data. #195
- New `:idle-timeout` option when creating the server, in ms, default is 24 hours. #122
- Support heterogeneous tuples `:db/tupleTypes` and homogeneous tuples `:db/tupleType`. Unlike Datomic, the number of elements in a tuple is not limited to 8, as long as they fit inside a 496-byte buffer. In addition, instead of using nil to indicate a minimal value as in Datomic, one can use `:db.value/sysMin` or `:db.value/sysMax` to indicate minimal or maximal values, useful for range queries. #167
- Added `*datalevin-data-readers*` to support loading custom tag literals. (thx @respatialized)
- Fixed: a `:db/fulltext` value is added then removed in the same transaction.
- `search-utils/create-ngram-token-filter` now works. #164
- Fixed `:db/fulltext` values transaction error. #177
- DB Upgrade is required.
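The Datomic-style `fulltext` call enabled by `:db.fulltext/autoDomain` might look like the sketch below. The call shape `(fulltext $ :my/attribute ?search)` is from the text; the destructuring shape of the returned datoms, the schema, and the directory are assumptions:

```clojure
(require '[datalevin.core :as d])

(def schema {:my/attribute {:db/valueType :db.type/string
                            :db/fulltext  true
                            :db.fulltext/autoDomain true}})

(def conn (d/get-conn "/tmp/ft-demo" schema))
(d/transact! conn [{:my/attribute "hello world"}])

;; search only within this attribute's auto-created domain
(d/q '[:find ?v
       :in $ ?search
       :where [(fulltext $ :my/attribute ?search) [[?e _ ?v]]]]
     (d/db conn) "hello")
```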
- Faster `remove-doc` and 10X disk space reduction; when term positions and offsets are indexed: 3X faster bulk load and 40 percent space reduction.
- New `:index-position?` option to indicate whether to record term positions inside documents, default false.
- New `:check-exist?` argument to `add-doc`, to indicate whether to check the existence of the document in the index, default true. Set it to false when importing data to improve ingestion speed.
- `:db/fulltext` values. #151
- Removed `doc-refs` function.
- Removed `search-index-writer`, as well as the related `write` and `commit` functions for client/server, as it makes little sense to bulk load documents across the network.
- `transact!` inside `with-transaction` to ensure ACID and improved performance.
- Fixed `get-range` regression when results are used in `sequence`. #172
- Databases are created by `create-database` instead of being created by opening a connection URI.
- `db-name` is unique on the server. (thx @dvingo)
- Avoid `(random-uuid)`, since not everyone is on Clojure 1.11 yet.
- DB Upgrade is required.
- New `range-seq`, which has a similar signature to `get-range`, but returns a `Seqable` that lazily reads data items into memory in batches (controlled by the `:batch-size` option). It should be used inside `with-open` for proper cleanup. #108
- Results of `get-range` and `range-filter` now automatically spill to disk when memory pressure is high. The results, though mutable, still implement `IPersistentVector`, so there is no API-level change. The spill-to-disk behavior is controlled by the `spill-opts` option map when opening the db, allowing `:spill-threshold` and `:spill-root` options.
- New `:client-opts` option map that is passed to the client when opening remote databases.
- Fixed: `with-transaction-kv` does not drop prior data when DB is resizing.
- Fixed: `with-transaction` does not drop prior data when DB is resizing.
- Fixed: `with-transaction-kv` does not crash when DB is resizing.
- New `with-transaction-kv` macro to expose explicit transactions for the KV database. This allows arbitrary code within a transaction to achieve atomicity, e.g. to implement compare-and-swap semantics, etc. #110
- New `with-transaction` macro, the same as the above for the Datalog database.
- New `abort-transact-kv` function to rollback writes from within an explicit KV transaction.
- New `abort-transact` function, same for Datalog transaction.
- `visit` function.
- New `fulltext-datoms` function that returns datoms found by a full-text search query, #157
- Returns `nil` instead, #158
- Added `entity` and `touch` functions to the babashka pod; these return regular maps, as the `Entity` type does not exist in a babashka script. #148 (thx @ngrunwald)
- New `:timeout` option to terminate on deadline for query/pull. #150 (thx @cgrand).
- `max-tx`, #142
- New `tx-data->simulated-report` to obtain a transaction report without actually persisting the changes. (thx @TheExGenesis)
- Support `:bigint` and `:bigdec` data types, corresponding to `java.math.BigInteger` and `java.math.BigDecimal`, respectively.
- Support `:db.type/bigdec` and `:db.type/bigint`, correspondingly, #138.
- `update-schema` allows renaming attributes. #131
- New `clear-docs` function to wipe out the search index, as it might sometimes be faster to rebuild the search index than to update individual documents. #132
- New `datalevin.constants/*data-serializable-classes*` dynamic var, which can be used for binding if additional Java classes are to be serialized as part of the default `:data` data type. #134
- Pass `:kv-opts` to the underlying KV store in `create-conn`.
- Fixed `clear` function on server. #133
- Renamed the `:search-engine` option map key to `:search-opts` for consistency. [Breaking]
- `visit` can return a special value `:datalevin/terminate-visit` to stop the visit.
- Changed `open-dbi` signature to take an option map instead.
- New `:validate-data?` option for `open-dbi`, `create-conn` etc., #121
- New `:domain` option for `new-search-engine`, so multiple search engines can coexist in the same dir, each with its own domain, a string. #112
- Improved `update-schema` to allow removal of attributes if they are not associated with any datoms, #99
- `:db/updated-at` datom, #113
- New `datalevin.search-utils` namespace with some utility functions to customize search, #105 (thx @ngrunwald)
- Added `visit` KV function to the core namespace.
- `add-doc` for the same doc ref.
- `inter-fn`.
- New `:auto-entity-time?` Datalog DB creation option, so entities can optionally have `:db/created-at` and `:db/updated-at` values added and maintained automatically by the system during transaction, #86
- `doc-count` function returns the number of documents in the search index.
- `doc-refs` function returns a seq of doc-refs in the search index.
- The `datalevin.core/copy` function can copy a Datalog database directly.
- `doc-indexed?` function.
- `add-doc` can update an existing doc.
- The `open-kv` function allows LMDB flags, #100
- DB Upgrade is required.
- New `visit` function to do arbitrary things upon seeing a value in a range.
- `:instant` handles dates before 1970 correctly, #94. The storage format of the `:instant` type has been changed. For an existing Datalog DB containing `:db.type/instant`, dumping as a Datalog DB using the old version of dtlv, then loading the data, is required; for an existing key-value DB containing the `:instant` type, specify `:instant-pre-06` instead to read the data back in, then write it out as `:instant` to upgrade to the current format.
- Fixed `defpodfn` so it works in non-JVM environments.
- New `load-edn` for dtlv, useful for e.g. loading schema from a file, #101
- New `defpodfn` macro to define a query function that can be used in a babashka pod, #85
- Fixed `max-aid` after schema update (thx @den1k)
- When a `disconnect` message is received, clean up resources afterwards, so a logically correct number of clients can be obtained in the next API call on slow machines.
- Schema passed to `create-conn` should override the old, fix #65
- New `DTLV_LIB_EXTRACT_DIR` environment variable to allow customization of the native libraries extraction location.
- `org.clojars.huahaiy/datalevin-native` on Clojars, for depending on Datalevin while compiling a GraalVM native image. Users no longer need to manually compile Datalevin C libraries.
- Replaced `db/-search` or `db/-datoms` with cheaper calls to improve remote store access speed.
- DB Upgrade is required.
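The `visit` function above can be sketched as below. The argument order is assumed to mirror `get-range` with the visitor function inserted; the visitor here only counts items to avoid assuming anything about the raw KV object it receives:

```clojure
(require '[datalevin.core :as d])

(def db (d/open-kv "/tmp/visit-demo"))
(d/open-dbi db "a")
(d/transact-kv db [[:put "a" 1 10] [:put "a" 2 20]])

;; count items in the range as a side effect
(def n (volatile! 0))
(d/visit db "a"
         (fn [_kv] (vswap! n inc))
         [:all] :long)
@n

(d/close-kv db)
```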
- `dtlv exec` takes input from stdin when no argument is given.
- New `clear` function to clear a Datalog db.
- `datom-eav`, `datom-e`, etc.
- New `close-db` convenience function to close a Datalog db.
- Functions are aggregated in `datalevin.core`, so users don't have to understand and require different namespaces in order to use all features.
- DB Upgrade is required.
- Renamed `open-lmdb` and `close-lmdb` to `open-kv` and `close-kv`, and `lmdb/transact` to `lmdb/transact-kv`, so they are consistent, easier to remember, and distinct from functions in `datalevin.core`.
- dtlv
- Support `[(.getTime ?date) ?timestamp]` and `[(.after ?date1 ?date2)]`, where the date variables are `:db.type/instant`. [#32]
- Handle `:db.type/instant` values as `java.util.Date`, not as long. [#30]
- `lmdb/open-lmdb`
- `core/get-conn` schema update
- `core/get-conn` and `core/with-conn`
- Fixed `init-max-eid` for large values as well.
- `:data` type. [#23]
- `core/empty-db`
- `:db/ident` in implicit schema
- Changed the argument order of `core/create-conn`, `db/empty-db` etc. to put dir in front, since it is more likely to be specified than schema in real use, so users don't have to put nil for schema.
- `core/update-schema`
- `false` value as `:data`
- `core/schema` and `core/update-schema`
- `core/closed?`
- `db/entid` allows 0 as eid
- Use `db/-first` instead of `(first (db/-datom ..))`, and `db/-populated?` instead of `(not-empty (db/-datoms ..))`, as they do not realize the results and hence are faster.
- `bits/read-buffer` and `bits/put-buffer`
- `lmdb/closed?`, `lmdb/clear-dbi`, and `lmdb/drop-dbi`
- `core/close`
Documentation contributors: Huahai Yang & Jeroen van Dijk