Decisions made during meme's design and implementation, with rationale.
M-expressions were proposed by McCarthy (1960) as the intended surface syntax for Lisp. S-expressions were the internal representation — not meant for humans to write directly. The human-friendly syntax was never built; S-expressions stuck by accident.
meme picks up that thread for Clojure. One rule:
The head of a list is written outside the parens: f(x y) → (f x y).
Everything else is Clojure.
The reader pipeline is split into three explicit stages:
1. Tokenizer (meme.alpha.scan.tokenizer): character scanning → flat token vector. Compound forms (reader conditionals, namespaced maps, syntax-quote) emit marker tokens.
2. Grouper (meme.alpha.scan.grouper): a pass-through stage. All forms are now parsed natively; the grouper is retained for pipeline symmetry.
3. Reader (meme.alpha.parse.reader): recursive-descent parser → Clojure forms. There is no read-string delegation; all values are resolved natively.
The three-stage split makes each stage independently testable and the pipeline extensible. The grouper operates on already-tokenized input where bracket matching is trivial: strings, chars, and comments are individual tokens, so \) inside a string is just a :string token, not a closing paren.
meme.alpha.pipeline composes the stages as ctx → ctx functions, threading a
context map with :source, :raw-tokens, :tokens, :forms. This makes
intermediate state visible to tooling via meme.alpha.core/run-pipeline.
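The ctx → ctx shape can be made concrete with a minimal Python sketch. The stage bodies are trivial stand-ins, not meme's tokenizer, grouper, or reader. run_pipeline is a hypothetical Python analogue of meme.alpha.core/run-pipeline. The sketch only shows how threading one context map leaves every intermediate key available afterward.

```python
from functools import reduce


# Hypothetical stand-in stages: each takes the context dict and returns a
# new dict with one more key, mirroring the ctx -> ctx shape described above.
def scan(ctx):
    # Stand-in tokenizer: splits on whitespace only.
    return {**ctx, "raw_tokens": ctx["source"].split()}


def group(ctx):
    # Pass-through stage, like the grouper: tokens are forwarded unchanged.
    return {**ctx, "tokens": ctx["raw_tokens"]}


def parse(ctx):
    # Stand-in parser: every token becomes its own "form".
    return {**ctx, "forms": list(ctx["tokens"])}


def run_pipeline(source, stages=(scan, group, parse)):
    """Thread a context map through each stage, left to right."""
    return reduce(lambda ctx, stage: stage(ctx), stages, {"source": source})


ctx = run_pipeline("a b c")
print(sorted(ctx))  # ['forms', 'raw_tokens', 'source', 'tokens']
```

Because each stage only adds keys, a tool can run a prefix of the stages, or each stage on its own, and inspect the partial context.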
All value resolution — numbers, strings, chars, regex, auto-resolve
keywords, tagged literals — is centralized in meme.alpha.parse.resolve.
The parser deals only with structural parsing; value interpretation is
delegated to resolve.
The goal is zero read-string delegation: meme parses everything
natively, including forms that were previously opaque (syntax-quote,
reader conditionals, namespaced maps). Platform asymmetries (JVM vs
ClojureScript) are isolated in resolve.
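Clojure's reader does not read f(x y) as a call. The ( terminates the symbol f, so the input is read as two forms: the symbol f and the list (x y). meme fundamentally changes what constitutes a valid token sequence, so a custom tokenizer is unavoidable.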
meme is a thin syntactic transform. The output is Clojure forms — lists, vectors, maps, symbols, keywords. These are the same data structures Clojure's own reader produces. An intermediate AST would add complexity without benefit.
The parser is recursive-descent with many mutually recursive functions.
Threading position state through every function signature and return value
adds noise. A volatile! position counter is the lightest shared-state
mechanism in .cljc and works on both JVM and ClojureScript. The same
pattern is used for the scanner's line/col tracking and for the portable
string builder (make-sb/sb-append!/sb-str) which wraps StringBuilder
on JVM and a JS array on ClojureScript.
All special forms use call syntax: def(x 42), defn(f [x] body),
if(cond then else), try(body catch(Exception e handler)). Every
(...) with content must have a head. () is the empty list.
This dramatically simplifies both the reader and the printer:
- The reader has a single maybe-call path.
- The printer has a single head(args...) format.
- do, catch, finally are regular symbols, not grammar keywords.
The rule applies uniformly to reader dispatch forms too: #?(...) has #? as its head, and #(...) has # as its head. These are not exceptions to the rule; they are instances of it.
A call is formed when a symbol, keyword, or vector immediately
precedes ( with no whitespace. foo(x) produces (foo x).
The rule is: the head of a list is written outside the parens, adjacent
to (. This applies to symbols (f(x)), keywords (:require([bar])),
and vectors ([x](body) for multi-arity clauses like ([x] body)).
Bare (...) with content but no preceding head is an error — the reader
rejects it with "Bare parentheses not allowed." () (empty parens) is
the empty list — it is unambiguous and needs no head.
Spacing between a head and its opening ( is significant — f(x) is
a call producing (f x), but f (x) is two forms: the symbol f
followed by bare (x) which is an error. This makes () unambiguous
in all positions: {:value ()} is a map with two entries (:value and
the empty list), not :value calling (). Similarly, [x ()] is a
two-element vector, not [(x)].
Previously, spacing was irrelevant (f (x) was also a call). This was
changed because it made () (the empty list) impossible to place after
any callable form in a container — the reader would always consume it
as a zero-arg call.
# dispatch forms follow the rule
All #-prefixed forms are parsed natively by meme, with no delegation to read-string. The # dispatch character combines with the next character(s) to form the head of an M-expression:
- #?(...): reader conditional. #? is the head.
- #?@(...): splicing reader conditional. #?@ is the head.
- #(...): anonymous fn. # is the head. The body is a single meme expression; %, %1, %2, %& are collected and used to build the fn parameter vector. #(inc(%)) → (fn [%1] (inc %1)).
- #{...}: set literal. # dispatches with {.
- #"...": regex literal.
- #'x: var quote. Prefix operator.
- #_x: discard. Prefix operator.
- #inst "...", #uuid "...": tagged literals.
- #:ns{...}: namespaced maps.
Reader conditionals parse only the matching platform's branch. Non-matching branches are skipped without parsing, because they may contain platform-specific syntax that is invalid on the current platform.
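The %-parameter collection for #(...) can be sketched in Python, assuming forms are nested lists of symbol strings. build_anon_fn is a hypothetical helper, not meme's function. The sketch fills gaps below the highest %N, so %1 and %3 yield three params. That matches Clojure's own #() behavior; whether meme does exactly the same is an assumption here.

```python
import re

# Matches %, %N, and %& (the anonymous-fn parameter symbols).
PCT = re.compile(r"%(\d+|&)?")


def build_anon_fn(body):
    """Turn the body of #(...) into ["fn", params, new_body]; bare % becomes %1."""
    max_n, has_rest = 0, False

    def walk(form):
        nonlocal max_n, has_rest
        if isinstance(form, list):
            return [walk(f) for f in form]
        m = PCT.fullmatch(form) if isinstance(form, str) else None
        if not m:
            return form
        suffix = m.group(1)
        if suffix == "&":
            has_rest = True
            return form
        n = int(suffix) if suffix else 1
        max_n = max(max_n, n)
        return f"%{n}"

    new_body = walk(body)
    params = [f"%{i}" for i in range(1, max_n + 1)]
    if has_rest:
        params += ["&", "%&"]
    return ["fn", params, new_body]


print(build_anon_fn(["inc", "%"]))  # the #(inc(%)) example
```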
Syntax-quote (`) is also parsed natively — its interior uses meme
syntax with ~ (unquote) and ~@ (unquote-splicing). Macro templates
are written in meme syntax: `if(~test do(~@body)).
' is a prefix operator that quotes the next meme form. There is no
S-expression escape hatch — ' does not switch parser modes:
- 'foo → (quote foo): quoted symbol
- '[1 2 3] → (quote [1 2 3]): quoted vector
- 'f(x y) → (quote (f x y)): quoted call
- '() → (quote ()): quoted empty list
Lists are constructed by calls: list(1 2 3) → (list 1 2 3). For
quoting code, use quote: quote(+(1 2)) → (quote (+ 1 2)).
Commas are whitespace, same as in Clojure: f(a, b, c) and f(a b c) are identical. Use whichever style is clearer for the context.
Syntax-quote (`) is parsed natively — its interior uses meme
call syntax, not S-expressions. ~ (unquote) and ~@ (unquote-splicing)
work as prefix operators inside syntax-quote.
defmacro(unless [test & body] `if(~test nil do(~@body)))
The meme reader handles symbol resolution, gensym expansion, and
unquote splicing — the same transformations Clojure's reader performs,
but applied to meme-parsed forms. No read-string delegation.
-1 is a negative number. -(1 2) is a call to - with args 1 2.
The rule: if a sign character (- or +) is immediately followed by a
digit, it is part of a number token. If followed by (, whitespace, or
anything else, it is a symbol. This is a one-character lookahead in the
tokenizer. No ambiguity — but the decision affects user expectations, so
it has a scar tissue test.
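The one-character lookahead can be sketched in Python. classify_sign is a hypothetical helper, and treating only ASCII 0-9 as digits is an assumption of this sketch.

```python
def classify_sign(src, i):
    """For a '+' or '-' at src[i], look one character ahead: a digit makes it
    the start of a number token; anything else (including '(', whitespace, or
    end of input) makes it a symbol."""
    nxt = src[i + 1] if i + 1 < len(src) else ""
    return "number" if nxt != "" and "0" <= nxt <= "9" else "symbol"


for text in ["-1", "-(1 2)", "- 1", "+5", "-x"]:
    print(f"{text!r}: {classify_sign(text, 0)}")
```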
::foo is resolved natively by the meme reader:
- With the :resolve-keyword option (REPL, file runner): resolved at read time to :actual.ns/foo, matching Clojure's semantics. The caller provides the resolver function.
- Without it (for example, a plain read-meme-string call): deferred to eval time. The printer detects the deferred form and emits ::foo for roundtripping.
Numbers, strings, character literals, and regex patterns are tokenized as raw text by meme's tokenizer, then resolved natively by meme.alpha.parse.resolve. The goal is zero delegation to read-string. meme parses numeric formats (hex, octal, ratios, BigDecimal), string escape sequences, and character names itself, aiming for behavior identical to the host platform's reader without depending on it.
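Native number resolution can be sketched in Python for the numeric formats listed above. The sketch follows Clojure's literal syntax: N and M suffixes, NrDIGITS radix, and leading-0 octal. Python's Fraction and Decimal stand in for Clojure ratios and BigDecimal. resolve_number is hypothetical, not meme.alpha.parse.resolve, and strings, chars, and regex are left out.

```python
import re
from decimal import Decimal
from fractions import Fraction

RATIO = re.compile(r"[+-]?\d+/\d+")
RADIX = re.compile(r"([1-9]\d?)[rR]([0-9a-zA-Z]+)")


def resolve_number(text):
    """Resolve the raw text of a number token into a Python value."""
    if text.endswith("M"):
        return Decimal(text[:-1])              # BigDecimal stand-in
    if text.endswith("N"):
        text = text[:-1]                       # BigInt: Python ints are unbounded
    if RATIO.fullmatch(text):
        num, den = text.split("/")
        f = Fraction(int(num), int(den))
        return f.numerator if f.denominator == 1 else f  # 4/2 reads as 2
    sign = -1 if text[:1] == "-" else 1
    body = text[1:] if text[:1] in ("-", "+") else text
    m = RADIX.fullmatch(body)
    if m:
        return sign * int(m.group(2), int(m.group(1)))   # e.g. 2r1010
    if body[:2] in ("0x", "0X"):
        return sign * int(body[2:], 16)                  # hex
    if any(c in body for c in ".eE"):
        return float(text)                               # double
    if len(body) > 1 and body.startswith("0"):
        return sign * int(body, 8)                       # leading 0: octal
    return sign * int(body)
```

The check order matters. The radix and hex checks run before the float check, so digits such as E in 0xE or 16rE are not mistaken for an exponent.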
The codebase is split into three platform tiers:
- Fully portable: .cljc, runs on JVM, Babashka, and ClojureScript. These are pure functions with no eval or I/O dependency.
- Portable with injected dependencies: .cljc, but requires eval and read-line/slurp, which are JVM/Babashka by default. ClojureScript callers can inject these via options.
- JVM-only: .clj. These use java.io, PushbackReader, System/exit.
This separation is honest about what's portable. The .clj extension prevents the ClojureScript compiler from attempting to compile JVM-only code.
#() printer shorthand: zero-param and %&-only forms
The printer emits #(body) for (fn [] body) (zero params) and for (fn [%1 %2] body) where all numbered %N params are used in the body. Forms with surplus % params (declared but unused in the body) fall through to fn(...) syntax to avoid silently changing arity on roundtrip.
(fn [& %&] body) (rest-only, no numbered params) also falls through to fn([& %&] body). The & symbol in the param vector is not a %-param, so the #() shorthand heuristic does not match. This is intentional: the printer cannot distinguish fn([& %&] ...) from a user-written named form.
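The roundtrip check behind this heuristic can be sketched in Python. Forms are assumed to be nested lists of symbol strings, and params is the fn parameter vector as a list. can_use_shorthand is a hypothetical helper. Requiring the params to be exactly %1..%n in order is an assumption that goes slightly beyond the text.

```python
def symbols_in(form):
    """Collect every symbol string appearing anywhere in a nested-list form."""
    if isinstance(form, list):
        return set().union(*(symbols_in(f) for f in form))
    return {form}


def can_use_shorthand(params, body):
    """Decide whether (fn params body) can be printed as #(body) and read back
    with the same parameter vector."""
    if not params:
        return True  # (fn [] body) -> #(body)
    expected = [f"%{i}" for i in range(1, len(params) + 1)]
    if params != expected:
        return False  # e.g. [& %&]: '&' is not a %-param, so print fn(...)
    used = symbols_in(body)
    # A declared but unused %N would be lost on re-read, changing the arity.
    return all(p in used for p in params)
```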
maybe-call on all forms
The reader applies maybe-call uniformly: any form immediately followed by ( (no whitespace) is a call. This means `expr(args), #:ns{...}(args), and #?(...)(args) are valid call syntax. In practice these are rarely meaningful, but the uniform behavior avoids special-casing.
The parser enforces max-depth of 512, checked in parse-form with a
volatile counter that increments on entry and decrements in finally.
This prevents stack overflow from deeply nested or malicious input.
512 is generous for any real program while staying well within JVM/CLJS
default stack sizes.
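The depth guard can be sketched in Python with a toy grammar where only [ ... ] nests. An instance attribute plays the role of the volatile counter. The decrement sits in finally, so the count stays correct when an error unwinds the stack. The class and error message are hypothetical.

```python
MAX_DEPTH = 512


class DepthGuardedParser:
    """Toy recursive-descent parser where only [ ... ] nests."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0    # shared position, like the volatile! counter
        self.depth = 0  # current nesting depth

    def parse_form(self):
        self.depth += 1
        try:
            if self.depth > MAX_DEPTH:
                raise ValueError(f"Maximum nesting depth of {MAX_DEPTH} exceeded")
            tok = self.tokens[self.pos]
            self.pos += 1
            if tok != "[":
                return tok
            items = []
            while self.tokens[self.pos] != "]":
                items.append(self.parse_form())
            self.pos += 1
            return items
        finally:
            self.depth -= 1  # runs on normal return and when an error unwinds
```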
The tokenizer records (line, col) on each token. The tokenizer also
needs to map those positions back to character offsets in the source
string for whitespace attachment. If position tracking is inconsistent,
whitespace metadata is wrong: off-by-one truncation, stray characters,
or outright garbled output.
meme.alpha.scan.source/line-col->offset is the single definition that
ensures the mapping is consistent. The tokenizer uses it in
attach-whitespace. Because it's one function in one namespace, the
mapping can't diverge. The alternative — each stage carrying its own
offset logic — was the source of a previous bug where whitespace
attachment disagreed after a newline.
Note: the grouper previously used extract-source-range to capture raw
text for opaque regions, but all forms are now parsed natively and the
grouper is a pass-through. The shared contract remains important for
the tokenizer's whitespace attachment.
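The single shared mapping can be sketched in Python under three assumptions: line and column are 1-indexed (as in error metadata), \n is the only line terminator, and columns count characters. line_col_to_offset here is a stand-in, not meme.alpha.scan.source itself.

```python
def line_col_to_offset(source, line, col):
    """Map a 1-indexed (line, col) position to a 0-indexed offset into source."""
    offset = 0
    for _ in range(line - 1):
        nl = source.find("\n", offset)
        if nl == -1:
            raise ValueError(f"line {line} is past the end of the source")
        offset = nl + 1  # first character of the next line
    return offset + (col - 1)


src = "f(x)\n  g(y)"
prev_end = line_col_to_offset(src, 1, 5)  # just past ")" on line 1
start = line_col_to_offset(src, 2, 3)     # where "g" begins
print(repr(src[prev_end:start]))          # whitespace to attach to "g"
```

If every consumer slices the source through this one function, the whitespace between tokens always comes from the same arithmetic, including across newlines.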
All error throw sites go through meme-error, which constructs ex-info
with a consistent structure: :line, :col (1-indexed), optional
:cause, and optional :source-context. This gives every error —
whether from the tokenizer, grouper, reader, or resolver — a uniform
shape that callers can rely on.
The :incomplete flag in ex-data is the REPL continuation protocol.
When a tokenizer or reader error is caused by premature EOF (unclosed
delimiters, lone backtick, unterminated string), the error is thrown
with {:incomplete true}. The REPL's input-state function catches
these errors and returns :incomplete to signal that more input may
complete the form. This lets the same error infrastructure that reports
parse failures also power multi-line input handling.
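The uniform error shape and the :incomplete protocol can be sketched in Python with hypothetical stand-ins:

- MemeError plays the role of ex-info, and meme_error builds it with line/col-style data.
- check_delimiters is a toy reader that only balances ( ) and [ ] and ignores strings.
- input_state returns strings rather than keywords.

```python
class MemeError(Exception):
    """Exception carrying an ex-data-like dict, in the spirit of ex-info."""

    def __init__(self, message, data):
        super().__init__(message)
        self.data = data


def meme_error(message, line, col, **extra):
    """Build an error with the uniform shape: line and col (1-indexed) plus any
    optional keys such as cause, source_context, or incomplete."""
    return MemeError(message, {"line": line, "col": col, **extra})


def check_delimiters(src):
    """Toy reader: checks ( ) and [ ] balance only (strings are not handled)."""
    stack, line, col = [], 1, 1
    for ch in src:
        if ch in "([":
            stack.append((ch, line, col))
        elif ch in ")]":
            if not stack or "([".index(stack[-1][0]) != ")]".index(ch):
                raise meme_error(f"Unexpected {ch}", line, col)
            stack.pop()
        line, col = (line + 1, 1) if ch == "\n" else (line, col + 1)
    if stack:
        opener, line, col = stack[-1]
        # Premature EOF: more input could still complete the form.
        raise meme_error(f"Unclosed {opener}", line, col, incomplete=True)


def input_state(src):
    """Classify REPL input as complete, incomplete, or invalid."""
    try:
        check_delimiters(src)
        return "complete"
    except MemeError as e:
        return "incomplete" if e.data.get("incomplete") else "invalid"
```

A REPL loop built on this would keep reading lines while input_state returns "incomplete" and report the error otherwise.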
format-error produces IDE-quality display: line-number gutter, span
underlines (^ for single-column, ~~~ for multi-column via
:end-col), secondary source locations with labels, and hint text.
The secondary locations and hints are extension points for richer
diagnostics as the error system grows.
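The gutter and underline layout can be sketched in Python. format_error here is a hypothetical stand-in that renders a single location, with no secondary locations. Treating end_col as exclusive is an assumption.

```python
def format_error(source, message, line, col, end_col=None, hint=None):
    """Render one error location with a line-number gutter and an underline."""
    text = source.split("\n")[line - 1]
    gutter = f"{line} | "
    blank = " " * len(str(line)) + " | "   # same width as gutter, no number
    width = 1 if end_col is None else max(1, end_col - col)  # end_col exclusive
    mark = "^" if width == 1 else "~" * width
    out = [f"error: {message}", gutter + text, blank + " " * (col - 1) + mark]
    if hint:
        out.append(f"hint: {hint}")
    return "\n".join(out)


print(format_error("a\n  f(x", "Unclosed (", 2, 4, hint="add a closing )"))
```

Padding the underline row with a blank gutter of the same width keeps the marker aligned with the source column regardless of how many digits the line number has.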