[Search] Consider term proximity in relevance when :index-position? search
engine option is true. #203
[Search] :proximity-expansion search option (default 2) can be used to
adjust the search quality vs. time trade-off: the larger the number, the
higher the quality, but the longer the search time.
[Search] :proximity-max-dist search option (default 45) can be used to
control the maximal distance between terms that would still be considered as
belonging to the same span.
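As a rough sketch of how the proximity options might be combined, assuming the current datalevin.core API (open-kv, new-search-engine, add-doc, search, close-kv). The directory path, document references and texts are made up for illustration, and the option values shown are simply the defaults stated above.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory, used only for this illustration.
(def lmdb (d/open-kv "/tmp/dtlv-proximity-demo"))

;; Term positions must be indexed for proximity to affect relevance.
(def engine (d/new-search-engine lmdb {:index-position? true}))

;; Two made-up documents: the query terms are close together in one
;; and far apart in the other.
(d/add-doc engine :near "a red fox met a brown dog by the river")
(d/add-doc engine :far
           "a fox lives in the forest while far away in town sleeps a dog")

;; Realize the results with vec before closing the database.
(def results
  (vec (d/search engine "fox dog"
                 {:proximity-expansion 2     ; default, per the entry above
                  :proximity-max-dist  45}))) ; default, per the entry above

(d/close-kv lmdb)
```

Raising :proximity-expansion should trade search time for quality, as described above; the ranking of the two documents is not asserted here.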
[Search] create-stemming-token-filter function to create stemmers, which
uses the Snowball stemming library that supports many languages. #209
[Search] create-stop-words-token-filter function that takes a customized
stop words predicate.
[KV, Datalog, Search] re-index function that dumps and loads data with new
settings. Should only be called when no other threads or programs are
accessing the database. #179
[KV] :max-readers option to specify the maximal number of concurrent readers
allowed for the db file. Default is 126.
[KV] :max-dbs option to specify the maximal number of sub-databases (DBI)
allowed for the db file. Default is 128. Creating too many DBIs may slow
things down, as a linear scan is used to look up a DBI.
[KV] added tuple data type that accepts a vector of scalar values. This
supports range queries, i.e. having expected ordering by first element, then
second element, and so on. This is useful, for example, as path keys for
indexing content inside documents. When used in keys, the same 511-byte
limitation applies.
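A minimal sketch of tuple keys used for a range query. It assumes that a tuple type is written as a vector of scalar types (here [:string :long]) wherever a key type is expected, and that transact-kv and get-range accept it in that position. The directory, DBI name and data are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory and DBI name.
(def db (d/open-kv "/tmp/dtlv-tuple-demo"))
(d/open-dbi db "paths")

;; Keys are [document-name position] tuples; the key type [:string :long]
;; is an assumed spelling of a heterogeneous tuple type.
(d/transact-kv db
  [[:put "paths" ["doc-a" 2] "second" [:string :long] :string]
   [:put "paths" ["doc-b" 1] "other"  [:string :long] :string]
   [:put "paths" ["doc-a" 1] "first"  [:string :long] :string]])

;; Range over every key whose first element is "doc-a". Ordering is by
;; first element, then second, so the doc-b entry falls outside the range.
(def doc-a-values
  (mapv second
        (d/get-range db "paths"
                     [:closed ["doc-a" 0] ["doc-a" 100]]
                     [:string :long] :string)))

(d/close-kv db)
```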
[Datalog] added heterogeneous tuple :db/tupleTypes and homogeneous tuple
:db/tupleType types. Unlike Datomic, the number of elements in a tuple is
not limited to 8, as long as they fit inside a 496-byte buffer. In addition,
instead of using nil to indicate minimal value like in Datomic, one can use
:db.value/sysMin or :db.value/sysMax to indicate minimal or maximal
values, useful for range queries. #167
[Main] dynamic var *datalevin-data-readers* to support loading custom tag
literals. (thx @respatialized)
[Search] significant indexing speed and space usage improvement: for default
setting, 5X faster bulk load speed; 2 orders of magnitude faster
remove-doc and 10X disk space reduction; when term positions and offsets are
indexed: 3X faster bulk load and 40 percent space reduction.
[Search] added caching for term and document index access, resulting in 5
percent query speed improvement on average, 35 percent improvement at median.
[Search] :index-position? option to indicate whether to record term
positions inside documents, default false.
[Search] :check-exist? argument to add-doc to indicate whether to check the
existence of the document in the index, default true. Set it to false when
importing data to improve ingestion speed.
[Search] search-index-writer as well as related write and
commit functions for client/server, as it makes little sense to bulk load
documents across network.
[Datalog] Ported all applicable Datascript improvements since 0.8.13 up to now
(1.4.0). Notably, added composite tuples feature, new pull implementation,
many bug fixes and performance improvements. #3, #57, #168
[Platform] embedded library support for Apple Silicon.
[KV] A new range function range-seq that has a signature similar to
get-range, but returns a Seqable, which lazily reads data items into
memory in batches (controlled by :batch-size option). It should be used
inside with-open for proper cleanup. #108
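A short sketch of range-seq inside with-open. The seven-argument arity used here (db, DBI name, key range, key type, value type, ignore-key?, options map) is an assumption based on the description above; the directory, DBI name and data are made up.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory and DBI name.
(def db (d/open-kv "/tmp/dtlv-range-seq-demo"))
(d/open-dbi db "numbers")

;; Store the squares of 0..999 under :long keys.
(d/transact-kv db
  (mapv (fn [i] [:put "numbers" i (* i i) :long :long]) (range 1000)))

;; Items are read in batches of 100; with-open releases the underlying
;; resources once the eager reduce has finished.
(def total
  (with-open [^java.lang.AutoCloseable items
              (d/range-seq db "numbers" [:all] :long :long false
                           {:batch-size 100})]
    (reduce (fn [acc [_ v]] (+ acc v)) 0 items)))

(d/close-kv db)
```

The reduce is deliberately eager, so no lazy sequence escapes the with-open scope.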
[KV] The existing eager range functions, get-range and range-filter, now
automatically spill to disk when memory pressure is high. The results, though
mutable, still implement IPersistentVector, so there is no API level
change. The spill-to-disk behavior is controlled by spill-opts option map
when opening the db, allowing :spill-threshold and :spill-root options.
[KV] with-transaction-kv macro to expose explicit transactions for KV
database. This allows arbitrary code within a transaction to achieve
atomicity, e.g. to implement compare-and-swap semantics, etc. #110
[Datalog] with-transaction macro, the same as the above for Datalog database.
[KV] abort-transact-kv function to roll back writes from within an explicit KV transaction.
[Datalog] abort-transact function, same for Datalog transaction.
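The compare-and-swap use case mentioned for with-transaction-kv could be sketched roughly as follows. compare-and-swap! is a hypothetical helper, not part of Datalevin; the directory and DBI name are also made up. The sketch assumes the macro binds a transaction-scoped handle that the usual KV functions accept.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical scratch directory and DBI name.
(def lmdb (d/open-kv "/tmp/dtlv-cas-demo"))
(d/open-dbi lmdb "state")
(d/transact-kv lmdb [[:put "state" :counter 0]])

(defn compare-and-swap!
  "Hypothetical helper: writes new-val under k only if the current value
  equals expected. The read and the write share one explicit transaction."
  [lmdb k expected new-val]
  (d/with-transaction-kv [tx lmdb]
    (if (= expected (d/get-value tx "state" k))
      (do (d/transact-kv tx [[:put "state" k new-val]])
          true)
      false)))

(compare-and-swap! lmdb :counter 0 1)  ; current value is 0, so 1 is written
(compare-and-swap! lmdb :counter 0 99) ; current value is now 1, so no write

(def final-value (d/get-value lmdb "state" :counter))

(d/close-kv lmdb)
```

A Datalog equivalent would presumably use with-transaction with a connection in the same shape.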
[Pod] entity and touch functions to babashka pod; these return regular
maps, as the Entity type does not exist in a babashka script. #148 (thx
@ngrunwald)
[Datalog] :timeout option to terminate on deadline for query/pull. #150 (thx
@cgrand).
[Datalog] Additional arity to update-schema to allow renaming attributes. #131
[Search] clear-docs function to wipe out search index, as it might be faster
to rebuild search index than updating individual documents sometimes. #132
datalevin.constants/*data-serializable-classes* dynamic var, which can be
used for binding if additional Java classes are to be serialized as part of
the default :data data type. #134
:auto-entity-time? Datalog DB creation option, so entities can optionally have
:db/created-at and :db/updated-at values added and maintained
automatically by the system during transaction. #86
[Breaking] :instant handles dates before 1970 correctly, #94. The storage
format of :instant type has been changed. For an existing Datalog DB
containing :db.type/instant, dumping it as a Datalog DB using the old version
of dtlv, then loading the data, is required. For an existing key-value DB
containing :instant type, specify :instant-pre-06 instead to read the data
back in, then write them out as :instant to upgrade to the current format.
Remove client immediately when disconnect message is received, clean up
resources afterwards, so a logically correct number of clients can be obtained
in the next API call on slow machines.
Release artifact org.clojars.huahaiy/datalevin-native on clojars, for
depending on Datalevin while compiling GraalVM native image. Users
no longer need to manually compile Datalevin C libraries.
Consolidated all user facing functions to datalevin.core, so users don't have to understand and require different namespaces in order to use all features.
[Breaking] Removed AEV index, as it is not used in query. This reduces storage
and improves write speed.
[Breaking] Change VAE index to VEA, in preparation for new query engine. Now
all indices have the same order, just rotated, so merge join is more likely.
[Breaking] Change open-lmdb and close-lmdb to open-kv and close-kv,
lmdb/transact to lmdb/transact-kv, so they are consistent, easier to
remember, and distinct from functions in datalevin.core.
GraalVM native image specific LMDB wrapper. This wrapper allocates buffer
memory in C and uses our own C comparator instead of doing this work in Java,
so it is faster.
Allow Java interop calls in where clauses, e.g. [(.getTime ?date) ?timestamp], [(.after ?date1 ?date2)], where the date variables are :db.type/instant. [#32]
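A small sketch of an interop call in a where clause, written against the current datalevin.core API (get-conn, transact!, q, db, close), which may differ from the API at the time of this entry. The directory, attribute names and data are hypothetical.

```clojure
(require '[datalevin.core :as d])
(import 'java.util.Date)

;; Hypothetical scratch directory and schema.
(def conn (d/get-conn "/tmp/dtlv-interop-demo"
                      {:event/at {:db/valueType :db.type/instant}}))

(d/transact! conn [{:event/name "launch"  :event/at (Date. 1000)}
                   {:event/name "landing" :event/at (Date. 5000)}])

;; .getTime is called on each bound java.util.Date value and its result
;; is bound to ?ts.
(def timestamps
  (set (d/q '[:find [?ts ...]
              :where
              [_ :event/at ?date]
              [(.getTime ?date) ?ts]]
            (d/db conn))))

(d/close conn)
```

A predicate form such as [(.after ?date1 ?date2)] would filter rather than bind, following the second example in the entry above.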
Changed default LMDB write behavior to use writable memory map and
asynchronous msync, significantly improved write speed for small transactions
(240X improvement for writing one datom at a time).
[Breaking] Change argument order of core/create-conn, db/empty-db
etc., and put dir in front, since it is more likely to be specified than
schema in real use, so users don't have to put nil for schema.
Use array get wherever we can in query; saw significant improvement in some queries.
Use db/-first instead of (first (db/-datoms ..)), db/-populated? instead of (not-empty (db/-datoms ..)), as they do not realize the results and hence are faster.