[Datalog] Query optimizer to improve query performance, particularly for
complex queries. See details. #11
[Datalog] More space efficient storage format, leveraging LMDB's
dupsort feature, resulting in about 20% space reduction and faster counting of
data entries.
[Datalog] search-datoms function to look up datoms without having to specify
an index.
[KV] Expose LMDB dupsort feature, i.e. B+ trees of B+ trees, #181, as the
following functions that work only for dbi opened with open-list-dbi:
put-list-items
del-list-items
visit-list
get-list
list-count
key-range-list-count
in-list?
list-range
list-range-count
list-range-filter
list-range-first
list-range-some
list-range-keep
list-range-filter-count
visit-list-range
operate-list-val-range
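The list functions above can be sketched as follows. This is a minimal illustration, not a definitive reference: the scratch directory and the "tags" DBI name are made up for the example, and the argument order (key type, then value type) is assumed from the usual Datalevin KV conventions.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def list-dir (str (System/getProperty "java.io.tmpdir")
                   "/dtlv-list-" (java.util.UUID/randomUUID)))

(def list-db (d/open-kv list-dir))

;; A list DBI stores a sorted list of values under each key.
(d/open-list-dbi list-db "tags")

(d/put-list-items list-db "tags" "a" [1 2 3 4] :string :long)
(d/put-list-items list-db "tags" "b" [5 6 7] :string :long)

;; Count, membership test and full retrieval for a single key.
(def a-count (d/list-count list-db "tags" "a" :string))
(def b-has-7 (d/in-list? list-db "tags" "b" 7 :string :long))
(def a-items (d/get-list list-db "tags" "a" :string :long))

(d/close-kv list-db)
```

These functions are expected to work only on a DBI opened with open-list-dbi, as stated in the entry above.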
[KV] key-range function that returns a range of keys only.
[KV] key-range-count function that returns the number of keys in a range.
[KV] visit-key-range function that visits keys in a range for side effects.
[KV] range-some function that is similar to some for a given range.
[KV] range-keep function that is similar to keep for a given range.
[Datalog] :eavt, :avet and :vaet are no longer accepted as index names,
use :eav, :ave and :vae instead. Otherwise, it's misleading, as we don't
store tx id.
[KV] Change default write setting from :mapasync to :nometasync, so
that the database is more crash resilient. In case of system crash, only the
last transaction might be lost, but the database will not be corrupted. #228
[KV] datalevin/kv-info dbi to keep meta information about the databases, as
well as information about each dbi, such as flags, key-size, etc. #184
[KV] Functions that take a predicate have a new argument raw-pred? to
indicate whether the predicate takes a raw KV object (default), or a pair of
decoded values of k and v (more convenient).
[Datalog] Add :db.fulltext/autoDomain boolean property to attribute schema,
default is false. When true, a search domain specific to this attribute will
be created, with the same domain name as the attribute name, e.g.
"my/attribute". This enables the same fulltext function syntax as
Datomic, i.e. (fulltext $ :my/attribute ?search).
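A minimal sketch of the attribute-specific fulltext syntax, assuming a throwaway database directory and sample documents invented for the example. The `[[?e ?a ?v]]` result binding is an assumption based on the general fulltext function; the exact result shape may differ between versions.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def ft-dir (str (System/getProperty "java.io.tmpdir")
                 "/dtlv-ft-" (java.util.UUID/randomUUID)))

(def ft-conn
  (d/create-conn ft-dir
                 {:my/attribute {:db/valueType           :db.type/string
                                 :db/fulltext            true
                                 ;; creates a search domain named "my/attribute"
                                 :db.fulltext/autoDomain true}}))

(d/transact! ft-conn [{:my/attribute "The quick red fox"}
                      {:my/attribute "A lazy brown dog"}])

;; Search only within the domain of :my/attribute.
(def ft-hits
  (d/q '[:find ?e ?v
         :in $ ?search
         :where [(fulltext $ :my/attribute ?search) [[?e ?a ?v]]]]
       (d/db ft-conn) "fox"))

(d/close ft-conn)
```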
[Search] Add :search-opts option to new-search-engine option argument,
specifying default options passed to search function.
[Main] Added a --nippy option to dump/load database in nippy binary
format, which handles some data anomalies, e.g. keywords with spaces in
them, non-printable data, etc., and produces a smaller dump file. #216
[Search] Consider term proximity in relevance when :index-position? search
engine option is true. #203
[Search] :proximity-expansion search option (default 2) can be used to
adjust the search quality vs. time trade-off: the bigger the number, the
higher the quality, but the longer the search time.
[Search] :proximity-max-dist search option (default 45) can be used to
control the maximal distance between terms that would still be considered as
belonging to the same span.
[Search] create-stemming-token-filter function to create stemmers, which
uses Snowball stemming library that supports many languages. #209
[Search] create-stop-words-token-filter function to take a customized stop
words predicate.
[KV, Datalog, Search] re-index function that dumps and loads data with new
settings. Should only be called when no other threads or programs are
accessing the database. #179
[KV] :max-readers option to specify the maximal number of concurrent readers
allowed for the db file. Default is 126.
[KV] :max-dbs option to specify the maximal number of sub-databases (DBI)
allowed for the db file. Default is 128. It may induce slowness if too big a
number of DBIs are created, as a linear scan is used to look up a DBI.
[KV] added tuple data type that accepts a vector of scalar values. This
supports range queries, i.e. having expected ordering by first element, then
second element, and so on. This is useful, for example, as path keys for
indexing content inside documents. When used in keys, the same 511-byte
limit applies.
[Datalog] added heterogeneous tuple :db/tupleTypes and homogeneous tuple
:db/tupleType types. Unlike Datomic, the number of elements in a tuple is
not limited to 8, as long as they fit inside a 496-byte buffer. In addition,
instead of using nil to indicate minimal value like in Datomic, one can use
:db.value/sysMin or :db.value/sysMax to indicate minimal or maximal
values, useful for range queries. #167
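A minimal sketch of the two tuple schema forms, assuming the same schema shape as Datomic (:db/valueType :db.type/tuple plus :db/tupleTypes or :db/tupleType). The attribute names and the scratch directory are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def tuple-dir (str (System/getProperty "java.io.tmpdir")
                    "/dtlv-tuple-" (java.util.UUID/randomUUID)))

(def tuple-conn
  (d/create-conn tuple-dir
                 {;; heterogeneous: one declared type per position
                  :reg/key    {:db/valueType  :db.type/tuple
                               :db/tupleTypes [:db.type/keyword :db.type/long]}
                  ;; homogeneous: every element has the same type
                  :point/xyzw {:db/valueType :db.type/tuple
                               :db/tupleType :db.type/long}}))

(d/transact! tuple-conn [{:reg/key    [:a 1]
                          :point/xyzw [1 2 3 4]}])

(def reg-key
  (d/q '[:find ?v .
         :where [?e :reg/key ?v]]
       (d/db tuple-conn)))

(d/close tuple-conn)
```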
[Main] dynamic var *datalevin-data-readers* to support loading custom tag
literals. (thx @respatialized)
[Search] significant indexing speed and space usage improvement: with default
settings, 5X faster bulk load speed, 2 orders of magnitude faster
remove-doc and 10X disk space reduction; when term positions and offsets are
indexed, 3X faster bulk load and 40 percent space reduction.
[Search] added caching for term and document index access, resulting in 5
percent query speed improvement on average, 35 percent improvement at median.
[Search] :index-position? option to indicate whether to record term
positions inside documents, default false.
[Search] :check-exist? argument to add-doc to indicate whether to check the
existence of the document in the index, default true. Set it to false when
importing data to improve ingestion speed.
[Search] search-index-writer as well as related write and
commit functions for client/server, as it makes little sense to bulk load
documents across the network.
[Datalog] Ported all applicable Datascript improvements since 0.8.13 up to now
(1.4.0). Notably, added composite tuples feature, new pull implementation,
many bug fixes and performance improvements. #3, #57, #168
[Platform] embedded library support for Apple Silicon.
[KV] A new range function range-seq that has a similar signature to
get-range, but returns a Seqable, which lazily reads data items into
memory in batches (controlled by :batch-size option). It should be used
inside with-open for proper cleanup. #108
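A minimal sketch of range-seq inside with-open, assuming it accepts the same (db, dbi-name, key-range, key-type, value-type) arguments as get-range. The "nums" DBI and the scratch directory are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def seq-dir (str (System/getProperty "java.io.tmpdir")
                  "/dtlv-seq-" (java.util.UUID/randomUUID)))

(def seq-db (d/open-kv seq-dir))
(d/open-dbi seq-db "nums")

(d/transact-kv seq-db [[:put "nums" 1 10 :long :long]
                       [:put "nums" 2 20 :long :long]
                       [:put "nums" 3 30 :long :long]])

;; Items are [k v] pairs read lazily in batches; consume them fully
;; before with-open closes the underlying resources.
(def total
  (with-open [^java.lang.AutoCloseable items
              (d/range-seq seq-db "nums" [:all] :long :long)]
    (reduce + (map second items))))

(d/close-kv seq-db)
```

The result is reduced to a plain value inside with-open, so nothing lazy escapes after the resources are released.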
[KV] The existing eager range functions, get-range and range-filter, now
automatically spill to disk when memory pressure is high. The results, though
mutable, still implement IPersistentVector, so there is no API level
change. The spill-to-disk behavior is controlled by the :spill-opts option map
when opening the db, allowing :spill-threshold and :spill-root options.
[KV] with-transaction-kv macro to expose explicit transactions for KV
database. This allows arbitrary code within a transaction to achieve
atomicity, e.g. to implement compare-and-swap semantics, etc. #110
[Datalog] with-transaction macro, the same as the above for Datalog database.
[KV] abort-transact-kv function to roll back writes from within an explicit KV transaction.
[Datalog] abort-transact function, same for Datalog transaction.
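A minimal sketch of an explicit KV transaction, covering an atomic read-modify-write and an aborted write. The "counters" DBI, the :hits key and the scratch directory are hypothetical names chosen for the example.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def tx-dir (str (System/getProperty "java.io.tmpdir")
                 "/dtlv-tx-" (java.util.UUID/randomUUID)))

(def lmdb (d/open-kv tx-dir))
(d/open-dbi lmdb "counters")
(d/transact-kv lmdb [[:put "counters" :hits 0]])

;; Read and write inside one transaction, so the increment is atomic.
;; `db` is the transaction-bound handle and is used for all calls inside.
(d/with-transaction-kv [db lmdb]
  (let [n (d/get-value db "counters" :hits)]
    (d/transact-kv db [[:put "counters" :hits (inc n)]])))

;; Writes made before abort-transact-kv are rolled back.
(d/with-transaction-kv [db lmdb]
  (d/transact-kv db [[:put "counters" :hits 100]])
  (d/abort-transact-kv db))

(def hits (d/get-value lmdb "counters" :hits))

(d/close-kv lmdb)
```

The Datalog with-transaction and abort-transact are described as analogous, taking a connection in place of the KV store.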
[Pod] entity and touch functions to babashka pod, these return regular
maps, as the Entity type does not exist in a babashka script. #148 (thx
@ngrunwald)
[Datalog] :timeout option to terminate on deadline for query/pull. #150 (thx
@cgrand).
[Datalog] Additional arity to update-schema to allow renaming attributes. #131
[Search] clear-docs function to wipe out search index, as it might be faster
to rebuild search index than updating individual documents sometimes. #132
datalevin.constants/*data-serializable-classes* dynamic var, which can be
used for binding if additional Java classes are to be serialized as part of
the default :data data type. #134
:auto-entity-time? Datalog DB creation option, so entities can optionally have
:db/created-at and :db/updated-at values added and maintained
automatically by the system during transaction, #86
[Breaking] :instant handles dates before 1970 correctly, #94. The storage
format of :instant type has been changed. For existing Datalog DB containing
:db.type/instant, dumping as a Datalog DB using the old version of dtlv, then
loading the data is required; for existing key-value DB containing :instant
type, specify :instant-pre-06 instead to read the data back in, then write
them out as :instant to upgrade to the current format.
Remove client immediately when disconnect message is received, clean up
resources afterwards, so a logically correct number of clients can be obtained
in the next API call on slow machines.
Release artifact org.clojars.huahaiy/datalevin-native on clojars, for
depending on Datalevin while compiling GraalVM native image. Users
no longer need to manually compile Datalevin C libraries.
Consolidated all user-facing functions to datalevin.core, so users don't have to understand and require different namespaces in order to use all features.
[Breaking] Removed AEV index, as it is not used in query. This reduces storage
and improves write speed.
[Breaking] Change VAE index to VEA, in preparation for new query engine. Now
all indices have the same order, just rotated, so merge join is more likely.
[Breaking] Change open-lmdb and close-lmdb to open-kv and close-kv,
lmdb/transact to lmdb/transact-kv, so they are consistent, easier to
remember, and distinct from functions in datalevin.core.
GraalVM native image specific LMDB wrapper. This wrapper allocates buffer
memory in C and uses our own C comparator instead of doing this work in Java,
so it is faster.
Allow Java interop calls in where clauses, e.g. [(.getTime ?date) ?timestamp], [(.after ?date1 ?date2)], where the date variables are :db.type/instant. #32
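A minimal sketch of a Java interop call in a where clause, assuming a hypothetical :event/at attribute of type :db.type/instant and a throwaway database directory.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this sketch.
(def interop-dir (str (System/getProperty "java.io.tmpdir")
                      "/dtlv-interop-" (java.util.UUID/randomUUID)))

(def interop-conn
  (d/create-conn interop-dir
                 {:event/at {:db/valueType :db.type/instant}}))

(d/transact! interop-conn [{:event/at (java.util.Date. 1000)}])

;; .getTime is invoked on the java.util.Date bound to ?date.
(def millis
  (d/q '[:find ?timestamp .
         :where
         [?e :event/at ?date]
         [(.getTime ?date) ?timestamp]]
       (d/db interop-conn)))

(d/close interop-conn)
```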
Changed default LMDB write behavior to use writable memory map and
asynchronous msync, significantly improved write speed for small transactions
(240X improvement for writing one datom at a time).
[Breaking] Change argument order of core/create-conn, db/empty-db
etc., and put dir in front, since it is more likely to be specified than
schema in real use, so users don't have to put nil for schema.
Use array get wherever we can in query, saw significant improvement in some queries.
Use db/-first instead of (first (db/-datoms ..)), db/-populated? instead of (not-empty (db/-datoms ..)), as they do not realize the results and are hence faster.