Dendrite is a library for querying large datasets on a single host at near-interactive speeds.
It attempts to be:
The current implementation is in Java but only exposes a Clojure API. In the future, I would like to expose a clean Java interface and build a C implementation for non-JVM code.
This code has been used in production for over a year, both as a building block in large ETL systems and for ad-hoc data-science studies. However, prior to the 1.0 release, no effort will be made to preserve backwards compatibility of APIs or binary compatibility of files.
Dendrite implements the record shredding and assembly ideas from Google's Dremel paper [1]. Querying for only small parts of the stored records can be up to several orders of magnitude faster than fully deserializing each record and pulling out the desired information. Dendrite also borrows many ideas from the Parquet project, an implementation of the Dremel file format for Hadoop. Unlike Parquet, Dendrite is not tied to any particular ecosystem and is designed to be a small library with no external dependencies.
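To give a rough intuition for why reading only a few fields can be so much cheaper, here is a minimal sketch of column striping for flat records. It is not Dendrite's API or file format: the class and method names (`ColumnStripingSketch`, `shred`, `assemble`) are hypothetical, and the sketch deliberately omits the repetition and definition levels that the Dremel paper uses to handle nested and repeated fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Dendrite.
class ColumnStripingSketch {

    // "Shred" flat records into one list of values per field.
    static Map<String, List<Object>> shred(List<Map<String, Object>> records,
                                           List<String> fields) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String field : fields) {
            columns.put(field, new ArrayList<>());
        }
        for (Map<String, Object> record : records) {
            for (String field : fields) {
                // A missing field is stored as null so row positions stay aligned.
                columns.get(field).add(record.get(field));
            }
        }
        return columns;
    }

    // "Assemble" partial records by touching only the requested columns.
    static List<Map<String, Object>> assemble(Map<String, List<Object>> columns,
                                              List<String> wanted,
                                              int rowCount) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            Map<String, Object> partial = new LinkedHashMap<>();
            for (String field : wanted) {
                partial.put(field, columns.get(field).get(row));
            }
            out.add(partial);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "alice");
        first.put("payload", "large blob A");

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2);
        second.put("name", "bob");
        second.put("payload", "large blob B");

        List<Map<String, Object>> records = Arrays.asList(first, second);
        Map<String, List<Object>> columns =
            shred(records, Arrays.asList("id", "name", "payload"));

        // Only the "id" column is read; "name" and "payload" are never visited.
        System.out.println(assemble(columns, Arrays.asList("id"), records.size()));
    }
}
```

In a real columnar file each column would live in its own region on disk, so a query for `id` alone could skip the bytes of every other column entirely instead of deserializing whole records.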
Status update (March 23, 2018): For personal reasons, I haven't been able to work on this project for the past two years. However, I have been accumulating ideas for the next iteration.
Work-in-progress documentation and benchmarks are available at dendrite.tech.
Copyright © 2013-2017 John Whitbeck
Distributed under the Eclipse Public License, the same as Clojure.