Fast Clojure JSON parser
The parser's focus is on speed, not on validating JSON.
All data returned are Clojure persistent data structures that can be used with assoc, merge, map, reduce, etc.
(require '[pjson.core :refer [read-str write-str get-charset]])
(def m (read-str "{\"a\": 1}"))
;;converts a String to a JSON object {} converts to Maps and [] converts to Vectors
(type m)
;;pjson.JSONAssociative
[(instance? java.util.Map m) (instance? clojure.lang.IPersistentMap m) (map? m)]
;;[true true true]
(def v (read-str "[1, 2, 3]"))
;;converts a String to a JSON object; [] converts to a persistent Vector
(type v)
;;pjson.JSONAssociative$JSONVector
[(instance? java.util.Collection v) (instance? clojure.lang.IPersistentVector v) (vector? v)]
;;[true true true]
(prn (write-str m))
;;"{\"a\":1}"
(prn (write-str {:a 2, :b 2}))
;;"{\"b\":2,\"a\":2}"
;; read-str with different charsets
(read-str "{\"a\": 1}" (get-charset "UTF-8"))
;;{"a" 1}
;; read-str with different default charset at a different offset
(def s "myprop={\"a\": 1}")
(read-str s (get-charset "UTF-8") 8 (count s))
;;{"a" 1}
This library is about parsing data as fast as possible; validation is not done extensively.
Exceptions are thrown only where data can be deemed invalid in a performant way, e.g.
(require '[pjson.core :as pjson])
(pjson/read-str "{\"abc\": bla}")
;; {}
(pjson/read-str "{\"abc\": bla, \"edf\":122}")
;;{"abc" "edf"}
Data validation should be a separate step performed after parsing as required; it can normally be done much faster in an application-specific way than generically by the parser.
(require '[pjson.core :as pjson])
(def user (pjson/read-str "{\"abc\": bla}"))
(defn valid-user-token? [msg] (get msg "user"))
(valid-user-token? user)
;; nil
(valid-user-token? (pjson/read-str "{\"user\": \"abc\"}"))
;; "abc"
;; is much more performant and usable than
(def user (pjson/read-str "{\"abc\": bla}"))
;;{}
So my point of view is: invalid-document -> exception == invalid-document -> validation.
As of version 0.3.6, escaped characters are supported and correctly read. If a String with an escape
character is detected, the parser drops into a slower parsing implementation that reads all escapes correctly.
This slower path is taken only for Strings that contain escaped characters, so the performance penalty is paid only where required.
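For example (a small sketch using read-str as shown above), a String containing an escape character is still read correctly via the slower path:

```clojure
(require '[pjson.core :refer [read-str]])

;; the embedded \n escape triggers the slower, escape-aware parser
;; for this String only; the rest of the document uses the fast path
(get (read-str "{\"a\": \"line1\\nline2\"}") "a")
;;=> "line1\nline2"
```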
The full set of character sets listed at http://www.w3schools.com/charsets/ is supported:
{"basic-latin" (unicode-seq 0 127)
"basic-latin-supplement" (unicode-seq 128 255)
"latin-extended-a" (unicode-seq 256 383)
"latin-extended-b" (unicode-seq 384 591)
"modified-letters" (unicode-seq 688 767)
"diacritical-marks" (unicode-seq 768 879)
"greek-and-coptic" (unicode-seq 880 1023)
"cycrillic-basic" (unicode-seq 1024 1279)
"cycrillic-supplement" (unicode-seq 1280 1327)}
Although the code compiles with JRE 1.5 compatibility, you're encouraged to run with at least 1.7 b53 or later, because 1.7 b53 introduced major performance improvements to String creation, which is central to this library.
see:
https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs http://java-performance.info/charset-encoding-decoding-java-78/
Always use the DEFAULT_CHARSET (from this library), which uses the "ISO-8859-1" encoding.
This library concentrates on maximum performance and thus manipulates data as little as possible. If you are using this library in a backend to send data to JavaScript, you need to escape your strings as described at http://timelessrepo.com/json-isnt-a-javascript-subset.
The fastest Charset, "ISO-8859-1" (optimized in Java 1.7), is used by default.
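As a minimal sketch of that escaping step (escape-for-js is a hypothetical helper, not part of pjson): the two characters U+2028 and U+2029 are valid inside JSON strings but illegal inside JavaScript string literals, so they must be escaped before the serialized JSON is embedded in a script.

```clojure
(require '[clojure.string :as str])

(defn escape-for-js
  "Hypothetical helper: escapes U+2028 (LINE SEPARATOR) and U+2029
  (PARAGRAPH SEPARATOR) in an already-serialized JSON string so it is
  safe to embed in JavaScript."
  [^String json]
  (-> json
      (str/replace "\u2028" "\\u2028")
      (str/replace "\u2029" "\\u2029")))

(escape-for-js "{\"a\":\"x\u2028y\"}")
;;=> "{\"a\":\"x\\u2028y\"}"
```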
The pjson.core
namespace defines single-arity functions (such as read-str and write-str) to allow easy swapping between this and other JSON libraries.
byte[] msg_bts = "{\"id\" \"0.0.1.71.105.212.33.205.21.86.123.210.94.161.24.1.3\", \"uuid\" 5090315992110618240, \"id2\" 11, \"ts\" 1406229815757, \"type\" 1, \"anothertype\" \"null\", \"events\" {\"info\" {\"id\" 770128}, \"id2\" 3, \"events\" {\"id\" 930415460}, \"timestamp\" 1406229815757}, \"size\" 5, \"slots\" 1, \"geo\" {\"city\" 9065607, \"country\" 59, \"state\" 162, \"zip\" 121397}}".getBytes();
//if the message starts with an object we can cast to a Map, arrays are of type List.
Map<Object, Object> obj = (Map<Object, Object>) PJSON.defaultLazyParse(StringUtil.DEFAULT_CHAR_SET, msg_bts);
System.out.println(obj);
To run your own benchmarks use lein perforate.
The run takes some time; you might want to comment out the slower ones to speed up the process.
Using criterium and JVM 1.7 b60 and Charset "ISO-8859-1"
See https://github.com/gerritjvv/pjson/blob/master/benchmarks.md
The aim is to see how fast a message can be passed to a library,
with the bare minimum of parsing done, and a result returned.
This benchmark is not fair to the non-lazy libraries, but it does reflect practice, where you almost never access 100% of a message.
A summary of the benchmark results is below (in order of fastest to slowest).
| Library | Mean in ms (lower is better) |
|---|---|
| pjson | 348 |
| boon | 806 |
| cheshire | 2700 |
| clj-json | 2900 |
| data.json | 7000 |
Goal: JSON Parse Benchmark
-----
Case: :pjson
Evaluation count : 180 in 60 samples of 3 calls.
Execution time mean : 348.629604 ms
Execution time std-deviation : 6.353164 ms
Execution time lower quantile : 337.189665 ms ( 2.5%)
Execution time upper quantile : 355.916665 ms (97.5%)
Overhead used : 1.788077 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 7.7888 % Variance is slightly inflated by outliers
Case: :boon
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 806.579640 ms
Execution time std-deviation : 7.381952 ms
Execution time lower quantile : 794.885499 ms ( 2.5%)
Execution time upper quantile : 825.284011 ms (97.5%)
Overhead used : 1.788077 ns
Found 3 outliers in 60 samples (5.0000 %)
low-severe 1 (1.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Case: :data.json
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 7.652088 sec
Execution time std-deviation : 83.905373 ms
Execution time lower quantile : 7.549722 sec ( 2.5%)
Execution time upper quantile : 7.837267 sec (97.5%)
Overhead used : 1.788077 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Case: :clj-json
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 2.923051 sec
Execution time std-deviation : 30.183434 ms
Execution time lower quantile : 2.890043 sec ( 2.5%)
Execution time upper quantile : 3.001162 sec (97.5%)
Overhead used : 1.788077 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Case: :cheshire
Evaluation count : 60 in 60 samples of 1 calls.
Execution time mean : 2.729360 sec
Execution time std-deviation : 30.997395 ms
Execution time lower quantile : 2.692323 sec ( 2.5%)
Execution time upper quantile : 2.804218 sec (97.5%)
Overhead used : 1.788077 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
This library could not have been written without the wonderful work already done on the boon JSON library.
It showed me how to use Unsafe to create Strings really fast, and where appropriate I've shamelessly copied.
It's impossible to create a perfect, bug-free library, so if you have an improvement, bug fix or idea,
feel free to open a GitHub Issue and/or send me a Pull Request.
GitHub is sometimes not great at notifying when a new Issue has been created, so if I do not respond, please ping me
at my email below :)
Copyright © 2014 gerritjvv@gmail.com
Distributed under the Eclipse Public License, version 1.0.