Couplet is a small library that provides support for working with Unicode characters or ‘code points’ in Clojure.
The distinguishing feature of this library is the type that represents a sequence of code points: that type is efficiently seqable and reducible, and also supports parallel fold via fork/join.
This library targets Clojure on the JVM.
Clojure CLI tools:
ch.gluet/couplet {:mvn/version "0.1.2"}
Leiningen/Boot:
[ch.gluet/couplet "0.1.2"]
Require the core namespace as cp, then use cp/codepoints to obtain a seqable/reducible of code points.
(require '[couplet.core :as cp])
(cp/codepoints "b🐝e🌻e")
;; => #couplet.core.CodePointSeq["b🐝e🌻e"]
(seq (cp/codepoints "b🐝e🌻e"))
;; => (98 128029 101 127803 101)
(->> "b🐝e🌻e" cp/codepoints (take-nth 2) cp/to-str)
;; => "bee"
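Because the value returned by cp/codepoints is described above as reducible and foldable, it should also work with reduce and with clojure.core.reducers/fold directly. The following is a minimal sketch under that assumption; the names bees, n-reduced and n-folded are made up for illustration, and counting is only a stand-in for real work.

```clojure
(require '[clojure.core.reducers :as r]
         '[couplet.core :as cp])

;; Hypothetical name: a code point view of the example string.
(def bees (cp/codepoints "b🐝e🌻e"))

;; Sequential reduce: count code points without realising a seq.
(def n-reduced (reduce (fn [n _] (inc n)) 0 bees))
;; => 5

;; Fold: map every code point to 1 and sum with +.
;; (+) returns 0, which serves as the identity for each partition.
(def n-folded (r/fold + (r/map (constantly 1) bees)))
;; => 5

;; For comparison, the string holds 7 chars, because each of the
;; two emoji is encoded as a surrogate pair.
(count "b🐝e🌻e")
;; => 7
```

On an input this short, fold will most likely fall back to a plain reduce; any benefit from fork/join is only to be expected on strings large enough to be split into several partitions.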
There are other solutions for the same problem, though perhaps written with different goals in mind.
Check out ICU for an extensive, mature Java library for Unicode.
Run the benchmarks with
lein jmh '{:type :quick, :format :table}'
The following is a short summary of the findings.
Broadly speaking, processing strings using code points instead of chars has no
negative impact on performance. On the contrary, the performance achieved here
compares favourably with that of Clojure’s own char-based string processing.
… (cp/to-str versus apply str) to faster by a factor of 5 (lazy seq of
code points versus lazy seq of chars).
Strings support fast random access – code point seqs do not. For efficient
lookup of code points by index consider a vector-of :int or Java array of int.
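The suggestion about indexed lookup can be sketched as follows. This is an illustration rather than a recommended recipe: s, cp-vec and cp-arr are hypothetical names, and the code assumes the code points are delivered as integral numbers that fit into an int, which holds for any valid Unicode code point.

```clojure
(require '[couplet.core :as cp])

(def s "b🐝e🌻e")

;; Copy the code points into a primitive-backed vector once ...
(def cp-vec (into (vector-of :int) (cp/codepoints s)))

;; ... then look them up by code point index.
(nth cp-vec 1)
;; => 128029

;; Alternatively, copy them into a Java int array.
(def cp-arr (int-array (seq (cp/codepoints s))))

(aget ^ints cp-arr 3)
;; => 127803
```

Both variants pay a one-off linear cost for the copy. Note that indexing into the string itself, as in (nth s 1), yields a char, which here is only one half of the surrogate pair that encodes 🐝.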
Copyright © 2017 David Bürgin
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.