> Lord Helmet: Found anything yet?
> Mook 1: Nothing yet, sir!
> Lord Helmet: How 'bout you?
> Mook 2: Not a thing, sir!
> Lord Helmet: How 'bout you guys?
> Mook 3: We ain't found shit!
>
> - Spaceballs
A Clojure(script) BitTorrent DHT plumbing library.
```clojure
[com.jamesleonis/dht-cljc "0.1.0-SNAPSHOT"]
```
The BitTorrent DHT is an important component of the torrent ecosystem, and a growing number of other technologies are beginning to use the network as well. The aim is to keep this module small and embeddable in different Clojure applications. This lets the consuming app define and control the network transport, and lets the library expand to new Clojure ecosystems.
```clojure
(ns my-cool-project.core
  (:require
    [dht-cljc.core :as dht]
    [dht-cljc.infohash :as infohash]
    [dht-cljc.utils :as utils]))

; First you need a table. Here's an atom as an example.
(def table (atom (dht/generate-table (infohash/generate!))))

; Now we need to insert some nodes
(doseq [[remote-infohash ip port] (seq list-of-dht-nodes-from-network)]
  (swap! table dht/insert (utils/now!) remote-infohash ip port))

; Find nodes by a given depth
(dht/get-by-depth @table 2)
; => [ ...list of nodes... ]

; Find the nodes nearest to a given infohash
(dht/get-nearest-peers @table (infohash/generate!))
; => [ ...sorted list of nodes, by distance... ]

; Refresh the timestamps of recently contacted peers
(swap! table dht/refresh [infohash1 (utils/now!)] [infohash2 (utils/now!)])

; Find bad nodes with get-by-overdue...
(let [bad-nodes (dht/get-by-overdue @table (dht/fifteen-minutes-overdue!))]
  ; ...so we can prune them!
  (swap! table dht/prune (map :infohash bad-nodes)))
```
`dht-cljc` is a DHT swiss army knife to build, maintain, and query a BitTorrent DHT routing table. Operations are centered around updating a Routing Table with pure functions suitable for Atom or Agent storage by the consumer.
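Because the table functions are pure, an Agent works just as well as the Atom shown above. A minimal sketch, where `remote-infohash`, `ip`, and `port` stand in for values received off the wire:

```clojure
; The same pure function drives an Agent: `send` applies dht/insert
; to the agent's current table value, just as swap! does for an atom.
(def table-agent (agent (dht/generate-table (infohash/generate!))))

(send table-agent dht/insert (utils/now!) remote-infohash ip port)
```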
- `insert` automatically handles splitting, rejecting-if-full, and adding nodes. It does not ping questionable nodes, as that is the responsibility of the consumer. See `get-by-overdue` below.
- `prune` is the opposite (obviously...) of `insert`. It removes arbitrary nodes, by infohash, from the routing table, recombining buckets as necessary.
- `refresh` takes a list of `[infohash timestamp]` tuples and applies them to their respective nodes. Unlike the BEP, we do not track questionable nodes explicitly, preferring to keep that under the control of the consumer (see `get-by-overdue`).

While the list of nodes is available for all to see, sometimes it's helpful for common slices to be formalized. The primary query functions combine `distance` and `depth` in ways that produce the operations the BEP recommends.
- `get-by-depth` gets all the nodes of a given bucket, defined by `depth` (see below): we simply look for all the items matching that depth. As noted below, any depth at or deeper than `:splits` returns all the nodes in the Client infohash bucket.
- `get-nearest-peers` takes an infohash, finds the bucket of nodes closest to it, and sorts them by distance (ascending) from that infohash.
- `get-by-overdue` returns a list of nodes last seen before the provided timestamp. The BEP describes a 15-minute window for refreshing clients, so the helper function `fifteen-minutes-overdue!` returns a timestamp, in milliseconds, of 15 minutes before its invocation.

I've included several other helpers for managing DHT clients.
- `dht-cljc.core/bootstrap-nodes` is a sequence of `[URL port]` tuples describing public DHT bootstrap nodes that can be queried initially (see the sketch after this list).
- The `dht-cljc.infohash` namespace has several functions for managing infohashes:
  - `depth` and `distance` fns
  - `sha1`, which operates on bytes
  - `generate!`, which creates a random infohash for your client
- The `dht-cljc.utils` namespace has several general helpers:
  - a `now!` fn that gets the Unix epoch time
  - a `string->bytes` fn
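To make those helpers concrete, here is a hedged sketch of startup-time use. `resolve-and-ping!` is hypothetical (the transport is yours to provide), and the two-infohash arity of `distance` is an assumption based on the descriptions above:

```clojure
; Query the public bootstrap nodes on startup. `resolve-and-ping!` is
; NOT part of dht-cljc; it stands in for the consumer's own transport.
(doseq [[url port] dht/bootstrap-nodes]
  (resolve-and-ping! url port))

; Derive an infohash from bytes, as a torrent client would:
(def some-hash (infohash/sha1 (utils/string->bytes "some payload")))

; Compare two infohashes (assumed arity):
(infohash/distance some-hash (infohash/generate!))
```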
As it stands, the BEP defines both the protocol and the transport medium, UDP. This project aims to implement the protocol while pointedly ignoring the UDP requirement. This keeps the library lightweight and composable, but it sacrifices automatic pinging and refreshing, which must be handled by the consumer. But never fear! All the tools you need to keep your nodes fresh are open and simple.
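For instance, a consumer might keep its nodes fresh with a periodic sweep along these lines. This is a minimal sketch: `ping!` is a hypothetical stand-in for your own transport, and `table` is the atom from the usage example above:

```clojure
(declare ping!) ; hypothetical: your transport-level ping, truthy on response

(defn maintain!
  "Ping overdue nodes; refresh the ones that answer and prune the rest."
  [table]
  (let [overdue (dht/get-by-overdue @table (dht/fifteen-minutes-overdue!))
        {alive true dead false} (group-by #(boolean (ping! %)) overdue)]
    ; Nodes that answered get fresh timestamps...
    (when (seq alive)
      (apply swap! table dht/refresh
             (map (fn [{:keys [infohash]}] [infohash (utils/now!)]) alive)))
    ; ...and the rest are pruned from the table.
    (when (seq dead)
      (swap! table dht/prune (map :infohash dead)))))
```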
The Router Table contains both the nodes of the DHT and some configuration of the table itself. A table can be initialized with the `generate-table` function. The table itself is made of these keys:
- `:router` - A vector of the nodes in the entire table. Queries slice and dice this in various ways, but it can be extended by YOU!
- `:client-infohash` - The infohash of the consuming app. This is used to determine the distance and depth described below.
- `:splits` - How many times the router has split buckets. This is modified by the `insert` and `prune` functions to dynamically adjust the number of buckets.
- `:max-bucket-count` - The maximum number of nodes in a given bucket. The default is 8, as recommended by the BEP, but it can be configured to any number. DHT bootstrapping nodes are an example of clients that use such large buckets.

Nodes are represented as maps. The keys are `#{:infohash :depth :last-seen :ip :port}`. "Buckets" are represented by querying against the `:depth` key.
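Put together, a freshly generated table looks roughly like this (a sketch for orientation; the initial values are assumptions, not literal printed output):

```clojure
(dht/generate-table (infohash/generate!))
; => {:router           []       ; no nodes yet
;     :client-infohash  <bytes>  ; the 20-byte infohash passed in
;     :splits           0        ; the client bucket has never split
;     :max-bucket-count 8}       ; BEP-recommended default
```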
In a nod towards IPv6, several keys are left to the consumer. This accommodates different implementations or use cases. The following describes how `dht-cljc` consumes a node.
- `:infohash` - A byte representation of the infohash. This should not be modified, but it can be transformed using the `dht-cljc.infohash` namespace.
- `:depth` - An integer representing the depth of the node. This should not be modified. See Depth below.
- `:last-seen` - A representation of when the node was last seen. It should be acceptable to the `<` compare function.
- `:ip` - Not used.
- `:port` - Not used.

In the BEP, buckets are described as ranges between powers of 2. A careful reading reveals that the organization is really a single descending bit, from 160 down to 0. This library refers to this as `depth`: a measure of how far a node's first differing bit lies from the highest-order bit, which is mathematically equivalent to a smaller integer distance in the original BEP. As such, any nodes that share a depth are considered to be in the same bucket. This is stored in each node's `:depth` key.
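To make depth and distance concrete, here is an illustrative implementation of the two ideas over seqs of 20 bytes. This is a sketch of the concept only; `dht-cljc.infohash` ships its own `depth` and `distance` fns:

```clojure
(defn xor-distance
  "XOR the bytes of two infohashes; a smaller result means closer."
  [a b]
  (map bit-xor a b))

(defn shared-depth
  "Count the leading bits two infohashes share. Measured against the
  client's infohash, this is a node's :depth - deeper means closer."
  [a b]
  (->> (xor-distance a b)
       (mapcat (fn [b'] (map #(bit-test b' %) (range 7 -1 -1))))
       (take-while false?)
       count))
```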
There is one exception to the depth-bucket rule: the client infohash bucket. As defined in the BEP, buckets are only split when a new node is inserted into the Client infohash's bucket while that bucket is full (if the bucket is not full, the node is simply added, but ignore that for now). As such, we need to know which bucket is the Client Bucket and how many nodes are in it.
The Router Table tracks the number of bucket splits in `:splits`. Every time the client bucket is split, this number increments. This also means that any `:depth` that is `>= :splits` is considered in the Client Bucket. Convenient, eh?
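In other words, pulling the Client Bucket out of a table is just a filter over `:router`. A sketch of the idea (not a function the library necessarily exposes):

```clojure
; Every node at or beyond the current split count is in the client bucket.
(defn client-bucket
  [{:keys [router splits]}]
  (filter #(>= (:depth %) splits) router))

(client-bucket @table)
; => [ ...nodes sharing the client's bucket... ]
```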
But why? Because the nodes whose `:depth` is `>= :splits` always number less than the maximum for a bucket, the client bucket doesn't need to split.

The test suite is built in CLJC and runs under both Clojure and Clojurescript.
The Clojurescript unit tests require NodeJS to run. Otherwise only Leiningen is required.
- `lein test` runs the tests under Clojure.
- `lein cljs-test` runs the tests under Clojurescript and NodeJS.
- `lein test-all` runs both the Clojure and Clojurescript test suites.
- `lein cljs-auto-test` automatically compiles and runs the Clojurescript tests on every change.

Copyright © 2018 James Leonis
Distributed under the EPLv2 license. See LICENSE file.