All notable changes to this project will be documented in this file. This change log follows the conventions of keepachangelog.com.
Support nippy-based files in a zip-based database - called zippy
This could be the new go-to nippy database format, much more space efficient than "classic" ndnippy: ndnippy is usually as large as the uncompressed EDNs
Update clarch to enhance decompression of zip entries
Fix reopening zip databases: For now need explicit :doc-parser!
Upgrade clarch dependency
.zip files as databases
Does ndfile-md5 work for zip files? Otherwise tweak serialized-db-filepath
:as-of isn't set in the parse-db'ed zip database
compress ns to new library: com.luposlip/clarch
or-q now queries a seq of databases (by single ID), until a non-nil result is returned
NB: [batch size] is currently set to 128.
nd-db might contain historical versions
nddbmeta using same line (name of index reflecting lines)
nddbmeta doesn't exist, stop indexing after passed line number.
nddbmeta file with a hash and metadata reflecting
Multiple documents are automatically written to db (and index) in batches of 128.
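The or-q behaviour noted above (try each database in turn for a single ID and stop at the first non-nil result) can be sketched with plain Clojure. This is only an illustration: the "databases" here are ordinary maps and first-hit is a hypothetical helper, not the library's actual or-q implementation or signature.

```clojure
;; Hypothetical stand-ins: each "database" is a plain map from ID to document.
;; Real nd-db database values and the real or-q signature may differ.
(defn first-hit
  "Looks up id in each db in order and returns the first non-nil document,
   or nil when no db contains the id."
  [dbs id]
  (some #(get % id) dbs))

(def newest {1 {:id 1 :v "new"}})
(def oldest {1 {:id 1 :v "old"}
             2 {:id 2 :v "old"}})
```

With these definitions, (first-hit [newest oldest] 1) is answered by the first map, while (first-hit [newest oldest] 2) falls through to the second one.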
Minor refactoring
Minor refactoring
nd-db.compress namespace containing input- and output-stream convenience fns
Utility function nd-db.convert/upgrade-nddbmeta! converts your old pre-v0.9.0 nddbmeta files to the new format, and keeps the old under the same name with _old appended to the file name.
Internally the database is now no longer a future. Instead the :index is a delay. This means immediate initialization of the db value, and that the :index doesn't get realized until you start querying.
This also means that lazy-docs and lazy-ids make even more sense if you just want to traverse the database sequentially, because in that case you're not using the realized index at all.
The external API for the library is unchanged. You initialize the database value in the same way, and you query it the same way too.
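The difference between a future and a delay can be illustrated with core Clojure alone. The sketch below is not the library's internals: build-index, the example map and lookup are hypothetical stand-ins that only show the db value being available immediately while the :index is built on first use.

```clojure
;; Counts how many times the (pretend) index has been built.
(def realized-count (atom 0))

(defn build-index
  "Hypothetical stand-in for an expensive index build (ID -> offset)."
  []
  (swap! realized-count inc)
  {1 0, 2 120, 3 245})

;; The db value exists right away; the index is wrapped in a delay,
;; so build-index has not run yet at this point.
(def db {:filename "example.ndnippy"
         :index    (delay (build-index))})

(defn lookup
  "Derefs the delayed index. The first call builds it, later calls reuse it."
  [db id]
  (get @(:index db) id))
```

Before any lookup, (realized? (:index db)) is false. After the first (lookup db 2) the index is cached, so repeated queries do not rebuild it.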
lazy-ids failed in some cases when moving index
lazy-ids now works when moving nddbmeta file around (i.e. with the db file)
lazy-ids has internal BufferedReader. Should be passed from with-open.
.nddbmeta files.
lazy-docs now works with eager indexes:
(lazy-docs nd-db)
Or with lazy indexes:
(with-open [r (nd-db.index/reader nd-db)]
  (->> nd-db
       (lazy-docs r)
       (drop 1000000)
       (filter (comp pos? :amount))
       (sort-by :priority)
       (take 10)))
NB: For convenience this also works, without any penalty:
(with-open [r (nd-db.index/reader nd-db)]
  (->> r
       (lazy-docs nd-db)
       ...
       (take 10)))
Still need to make the conversion function for pre-v0.9.0 .nddbmeta files.
WIP! lazy-docs might change signature when using the new index-reader!
.nddbmeta file
The new format makes it much faster to initialize and sequentially read through the whole database. The change will make the most impact for humongous databases with millions of huge documents.
Old indexes will not be readable anymore. The good news is that there will be a new nd-db.convert/upgrade-nddbmeta! utility function, which can convert your old file to the new format, and overwrite it.
The downside to the support for laziness is the size of the meta+index files, which in my tested scenarios have grown by 100%. This means the meta+index file for a database containing ~300k huge documents (of 200-300KB each in raw JSON/EDN form) has grown from ~5MB to ~10MB.
This is not a problem at all in real life, since when you need the realized in-memory index (for ad-hoc querying by ID), it still consumes the same amount of memory as before (in the above example ~3MB).
And compared to the database it describes it's nothing - the above mentioned index is for a .ndnippy database of 16.8GB.
lazy-docs introduced with v0.8.0 is now much more efficient. Again this is most noticeable when you need to read sequentially through parts of a huge database.
Dependency buddy/buddy-core not needed anymore. Instead using built-in similar functionality from com.taoensso/nippy.
nd-db.io/lazy-docs
Using projects couldn't compile nd-db with nippy version 3.2.0
Fix issue when creating index for ndjson/ndedn
Eliminate a reflective call when serializing the database.
0.6.0 - introducing .ndnippy!
Now you can use .ndnippy as database format. It's MUCH faster to load than .ndjson and .ndedn, meaning better query times, particularly when querying multiple documents at once.
Also a new util namespace lets you convert from .ndjson and .ndedn to .ndnippy.
.ndnippy - like .ndedn - isn't really a standard. But it probably should be. I implemented the encoding for .ndnippy myself; it's somewhat naive, but really fast anyhow. If you have ideas on how to make it even faster, let me know. Because version 0.6.0 introduces the .ndnippy format, it may change several times in the future, possibly making old .ndnippy files incompatible with new versions. Now you're warned. Thankfully the generation of new .ndnippy files is quite fast.
NB: .ndnippy isn't widely used (this is, as far as I know, the first and only use), and probably isn't a good distribution format, unless you can distribute the nd-db library with it.
NB: For small documents (like the ones in the test samples), .ndnippy files are actually bigger than their json/edn alternatives. Even the Twitter sample .ndjson file mentioned in the README becomes bigger as .ndnippy. With the serialization mechanism used right now, the biggest benefits are when the individual documents are huge (i.e. 10s of KBs). We've done experiments with methods that actually make the resulting size the same as the input, even for small documents. But there's a huge performance impact to using that, which is counterproductive.
0.4.0 - simpler and smaller!
:doc-type from db file extension (*.ndedn -> :doc-type :edn)
.ndedn|.ndjson or :doc-type :json|:edn
:json if extension is unknown and :doc-type isn't set
nd-db.core, and the library from luposlip/ndjson-db to com.luposlip/nd-db!
.ndedn file format, where all lines are well formed EDN documents.
clear-all-indices!! -> clear-all-indexes!!
clear-all-indices!! and clear-index!
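The :doc-type rules listed for 0.4.0 (derive the type from the file extension, fall back to :json when the extension is unknown and :doc-type isn't set) might be sketched as follows. infer-doc-type is a hypothetical helper written for illustration, not the library's own function, and letting an explicit :doc-type take precedence over the extension is an assumption.

```clojure
(require '[clojure.string :as str])

(defn infer-doc-type
  "Hypothetical helper: picks a doc type from explicit params or the filename.
   Assumes an explicit :doc-type wins over the file extension."
  [{:keys [filename doc-type]}]
  (let [fname (str filename)]
    (cond
      doc-type                         doc-type
      (str/ends-with? fname ".ndedn")  :edn
      (str/ends-with? fname ".ndjson") :json
      ;; unknown extension and no :doc-type set: default to :json
      :else                            :json)))
```

For example, (infer-doc-type {:filename "data.ndedn"}) yields :edn, while a file with an unrecognised extension and no explicit :doc-type is treated as :json.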