Decisions made during meme's design and implementation, with rationale.
M-expressions were proposed by McCarthy (1960) as the intended surface syntax for Lisp. S-expressions were the internal representation — not meant for humans to write directly. The human-friendly syntax was never built; S-expressions stuck by accident.
meme picks up that thread for Clojure. One rule:
The head of a list is written outside the parens: f(x y) → (f x y).
Everything else is Clojure.
The reader has two core stages and one optional stage:
1. meme.tools.parser (with meme-lang.grammar): unified scanlet-parselet Pratt parser. Reads directly from a source string. Scanning (character dispatch), trivia classification, and structural parsing are all defined in the grammar spec. Produces a lossless CST. Grammar is a map of characters to scanlet/parselet functions.
2. meme-lang.cst-reader: lowers CST to Clojure forms. No read-string delegation; all values are resolved natively via meme-lang.resolve.
3. meme-lang.expander (optional): syntax-quote AST nodes → plain Clojure forms. Only needed before eval, not for tooling.
The core stages/run calls stages 1–2, returning AST nodes for
tooling. run-string chains all three stages before eval.
The split makes each stage independently testable and the composition extensible.
The grammar's lexical scanlets handle all character-level concerns (strings, chars, comments
are individual tokens, so \) inside a string is just a :string token,
not a closing paren). The Pratt parser engine handles all structural concerns.
meme-lang.stages composes the stages as ctx → ctx functions, threading a
context map with :source, :cst, :forms.
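
A minimal sketch of the ctx → ctx shape, using toy stand-in stages rather than meme's real ones. The names toy-parse, toy-read, and run-stages are hypothetical; only the context keys :source, :cst, and :forms come from the description above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the parser stage: :source -> :cst.
(defn toy-parse [ctx]
  (assoc ctx :cst (str/split (:source ctx) #"\s+")))

;; Hypothetical stand-in for the CST reader stage: :cst -> :forms.
(defn toy-read [ctx]
  (assoc ctx :forms (mapv symbol (:cst ctx))))

;; Each stage takes a context map and returns an extended one,
;; so a pipeline is a left-to-right reduction over the stages.
(defn run-stages [source stages]
  (reduce (fn [ctx stage] (stage ctx)) {:source source} stages))

(run-stages "a b c" [toy-parse toy-read])
;; => {:source "a b c", :cst ["a" "b" "c"], :forms [a b c]}
```

Because every stage has the same signature, tooling can stop after any prefix of the stage vector and inspect the intermediate keys.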
All value resolution — numbers, strings, chars, regex, auto-resolve
keywords, tagged literals — is centralized in meme-lang.resolve.
The parser deals only with structural parsing; value interpretation is
delegated to resolve.
The goal is zero read-string delegation: meme parses everything
natively, including forms that were previously opaque (syntax-quote,
reader conditionals, namespaced maps). Platform asymmetries (JVM vs
ClojureScript) are isolated in resolve.
Clojure's reader rejects f(x y) as invalid syntax. meme fundamentally
changes what constitutes a valid token sequence. A custom tokenizer is
unavoidable.
meme is a thin syntactic transform. The output is Clojure forms — lists, vectors, maps, symbols, keywords. These are the same data structures Clojure's own reader produces. An intermediate AST would add complexity without benefit.
The parser is recursive-descent with many mutually recursive functions.
Threading position state through every function signature and return value
adds noise. A volatile! position counter is the lightest shared-state
mechanism in .cljc, works on both JVM and ClojureScript. The same
pattern is used for the scanner's line/col tracking and for the portable
string builder (make-sb/sb-append!/sb-str) which wraps StringBuilder
on JVM and a JS array on ClojureScript.
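
The shared-position pattern can be sketched as follows. This is an illustration under assumptions, not meme's code: make-cursor, peek-char, advance!, and read-while are hypothetical names.

```clojure
;; Hypothetical cursor: the source string plus a volatile! position.
(defn make-cursor [s]
  {:src s :pos (volatile! 0)})

(defn peek-char [{:keys [src pos]}]
  (when (< @pos (count src))
    (nth src @pos)))

(defn advance! [{:keys [pos]}]
  (vswap! pos inc))

;; Consumes characters while pred holds. Callers share the cursor,
;; so no position has to be passed in or returned.
(defn read-while [cursor pred]
  (loop [acc []]
    (let [c (peek-char cursor)]
      (if (and c (pred c))
        (do (advance! cursor)
            (recur (conj acc c)))
        (apply str acc)))))

(let [cursor (make-cursor "abc 123")]
  [(read-while cursor #(not= % \space)) @(:pos cursor)])
;; => ["abc" 3]
```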
All special forms use call syntax: def(x 42), defn(f [x] body),
if(cond then else), try(body catch(Exception e handler)). Every
(...) with content must have a head. () is the empty list.
This dramatically simplifies both the reader and the printer:
- Reader: everything goes through the same maybe-call path (only nil, true, false are
  special-cased as literals before maybe-call).
- Printer: every list uses the generic head(args...) format.
- do, catch, finally are regular symbols, not grammar keywords.
The rule applies uniformly to reader dispatch forms too: #?(...) has
#? as its head. #(...) has # as its head. These are not exceptions
to the rule — they are instances of it.
A call is formed when a symbol, keyword, or vector immediately
precedes ( with no whitespace. foo(x) produces (foo x).
The rule is: the head of a list is written outside the parens, adjacent
to (. This applies to symbols (f(x)), keywords (:require([bar])),
and vectors ([x](body) for multi-arity clauses like ([x] body)).
Bare (...) with content but no preceding head is an error — the reader
rejects it with "Bare parentheses not allowed." () (empty parens) is
the empty list — it is unambiguous and needs no head.
Spacing between a head and its opening ( is significant — f(x) is
a call producing (f x), but f (x) is two forms: the symbol f
followed by bare (x) which is an error. This makes () unambiguous
in all positions: {:value ()} is a map with two entries (:value and
the empty list), not :value calling (). Similarly, [x ()] is a
two-element vector, not [(x)].
Previously, spacing was irrelevant (f (x) was also a call). This was
changed because it made () (the empty list) impossible to place after
any callable form in a container — the reader would always consume it
as a zero-arg call.
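
The head-adjacency rule can be illustrated with a toy reader. This is a sketch under heavy simplifying assumptions (only symbols, non-negative integers, and parens; no vectors, keywords, or strings), and tokenize, read-atom, read-args, read-form, and read-toy are hypothetical names, not meme's API.

```clojure
(require '[clojure.string :as str])

;; Whitespace is kept as a token because adjacency to "(" is significant.
(defn tokenize [s]
  (re-seq #"\s+|[()]|[^\s()]+" s))

(defn read-atom [t]
  (if (re-matches #"\d+" t)
    (Long/parseLong t)
    (symbol t)))

(declare read-form)

;; Reads forms up to the matching ")"; returns [forms remaining-tokens].
(defn read-args [tokens]
  (loop [tokens tokens, acc []]
    (let [[t & more] tokens]
      (cond
        (nil? t)       (throw (ex-info "Unexpected end of input" {}))
        (= t ")")      [acc more]
        (str/blank? t) (recur more acc)
        :else          (let [[form remaining] (read-form tokens)]
                         (recur remaining (conj acc form)))))))

;; Reads one form; returns [form remaining-tokens].
(defn read-form [[t & more]]
  (cond
    (= t "(")
    (if (= (first more) ")")
      [() (rest more)]                       ; () is the empty list
      (throw (ex-info "Bare parentheses not allowed." {})))

    (= (first more) "(")                     ; head immediately followed by "("
    (let [[args remaining] (read-args (rest more))]
      [(apply list (read-atom t) args) remaining])

    :else
    [(read-atom t) more]))

(defn read-toy [s]
  (first (read-form (tokenize (str/trim s)))))

(read-toy "f(x y)")        ;; => (f x y)
(read-toy "+(1 *(2 3))")   ;; => (+ 1 (* 2 3))
(read-toy "g(x ())")       ;; => (g x ())
;; (read-toy "g(f (x))") throws "Bare parentheses not allowed."
```

In this sketch the space in `f (x)` is what turns a call into a symbol followed by bare parentheses, which is why `()` after a space can safely mean the empty list.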
# dispatch forms follow the rule
All #-prefixed forms are parsed natively by meme, with no delegation to
read-string. The # dispatch character combines with the next
character(s) to form the head of an M-expression:
- #?(...): reader conditional. #? is the head.
- #?@(...): splicing reader conditional. #?@ is the head.
- #(...): anonymous fn. # is the head. Body is a single meme
  expression; %, %1, %2, %& are collected and used to build
  the fn parameter vector. #(inc(%)) → (fn [%1] (inc %1)).
- #{...}: set literal. # dispatches with {.
- #"...": regex literal.
- #'x: var quote. Prefix operator.
- #_x: discard. Prefix operator.
- #inst "...", #uuid "...": tagged literals.
- #:ns{...}: namespaced maps.
Reader conditionals parse all branches but only return the matching platform's value; non-matching branches are fully parsed, then discarded.
Syntax-quote (`) is also parsed natively — its interior uses meme
syntax with ~ (unquote) and ~@ (unquote-splicing). Macro templates
are written in meme syntax: `if(~test do(~@body)).
' is a prefix operator that quotes the next meme form. There is no
S-expression escape hatch — ' does not switch parser modes:
- 'foo → (quote foo): quoted symbol
- '[1 2 3] → (quote [1 2 3]): quoted vector
- 'f(x y) → (quote (f x y)): quoted call
- '() → (quote ()): quoted empty list
Lists are constructed by calls: list(1 2 3) → (list 1 2 3). For
quoting code, use quote: quote(+(1 2)) → (quote (+ 1 2)).
Same as Clojure. f(a, b, c) and f(a b c) are identical. Use
whichever style is clearer for the context.
Syntax-quote (`) is parsed natively — its interior uses meme
call syntax, not S-expressions. ~ (unquote) and ~@ (unquote-splicing)
work as prefix operators inside syntax-quote.
    defmacro(unless [test & body] `if(~test nil do(~@body)))
The meme reader handles symbol resolution, gensym expansion, and
unquote splicing — the same transformations Clojure's reader performs,
but applied to meme-parsed forms. No read-string delegation.
-1 is a negative number. -(1 2) is a call to - with args 1 2.
The rule: if a sign character (- or +) is immediately followed by a
digit, it is part of a number token. If followed by (, whitespace, or
anything else, it is a symbol. This is a one-character lookahead in the
tokenizer. No ambiguity — but the decision affects user expectations, so
it has a scar tissue test.
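
The one-character lookahead can be sketched as a small predicate. The names digit? and number-start? are hypothetical, and the sketch only considers ASCII digits.

```clojure
(defn digit? [c]
  (boolean (and c (<= (int \0) (int c) (int \9)))))

;; True when the token starting at index i of s should be scanned as a number.
(defn number-start? [s i]
  (let [c   (get s i)
        nxt (get s (inc i))]          ; nil at end of input
    (or (digit? c)
        (and (contains? #{\- \+} c) (digit? nxt)))))

(number-start? "-1" 0)      ;; => true
(number-start? "-(1 2)" 0)  ;; => false
(number-start? "- 1" 0)     ;; => false
(number-start? "+5" 0)      ;; => true
```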
::foo is resolved natively by the meme reader:
- With the :resolve-keyword option (REPL, file runner): resolved at
  read time to :actual.ns/foo, matching Clojure's semantics. The
  caller provides the resolver function.
- Without it (read-meme-string): deferred to eval time. The printer
  detects the deferred form and emits ::foo for roundtripping.
Numbers, strings, character literals, and regex patterns are tokenized as
raw text by meme's scanner, then resolved natively by
meme-lang.resolve. The goal is zero delegation to read-string —
meme parses numeric formats (hex, octal, ratios, BigDecimal), string
escape sequences, and character names itself, guaranteeing identical
behavior to the host platform without depending on its reader.
The codebase is split into three platform tiers:
- meme.tools.{parser, lexer, render}, meme-lang.{api, grammar, lexlets, parselets, stages, cst-reader, forms, errors, resolve, expander, printer, values, formatter.flat, formatter.canon}: portable .cljc, runs on JVM, Babashka, and ClojureScript. Pure functions with no eval or I/O dependency.
- meme.tools.{repl, run}, meme-lang.{repl, run}, meme.{registry, cli}: .clj, JVM/Babashka only. Require eval and read-line/slurp.
- The third tier is .clj, JVM only. These namespaces use java.io, PushbackReader, System/exit.
This separation is honest about what's portable. The .clj extension
prevents the ClojureScript compiler from attempting to compile JVM-only
code.
#() printer shorthand
The printer emits #(body) when the form has :meme-lang/sugar true metadata,
set by the reader when it parses #(...) source syntax. A user-written
fn([%1] body) lacks this metadata and prints back as fn(...).
This is an instance of the syntactic transparency principle: the reader
tags the notation, the printer reconstructs it. No body inspection or
surplus-param heuristic is needed — the reader's build-anon-fn-params
already builds the correct parameter vector at read time from the %
params found in the body.
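
A rough sketch of how a parameter vector could be derived from the % symbols in a body. This is not meme's build-anon-fn-params; percent-index and anon-fn-params are hypothetical names, and the sketch ignores nested #() forms and does not rewrite % to %1 in the body.

```clojure
;; Returns the positional index of a % symbol, or nil for anything else.
(defn percent-index [sym]
  (let [n (name sym)]
    (cond
      (= n "%")              1
      (re-matches #"%\d+" n) (Long/parseLong (subs n 1)))))

;; Walks the body, finds the highest positional param and any %&,
;; and builds [%1 ... %n & %&] accordingly.
(defn anon-fn-params [body]
  (let [syms  (filter symbol? (tree-seq coll? seq body))
        max-n (reduce max 0 (keep percent-index syms))
        rest? (some #{'%&} syms)]
    (cond-> (mapv (fn [i] (symbol (str "%" i))) (range 1 (inc max-n)))
      rest? (into ['& '%&]))))

(anon-fn-params '(inc %))          ;; => [%1]
(anon-fn-params '(+ %1 %3))        ;; => [%1 %2 %3]
(anon-fn-params '(apply f % %&))   ;; => [%1 & %&]
```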
maybe-call on all forms
The reader applies maybe-call uniformly: any form followed by ( is
a call. This means `expr(args), #:ns{...}(args), and
#?(...)(args) are valid call syntax. In practice these are rarely
meaningful, but the uniform behavior avoids special-casing.
The parser enforces max-depth of 512, checked in parse-form with a
volatile counter that increments on entry and decrements in finally.
This prevents stack overflow from deeply nested or malicious input.
512 is generous for any real program while staying well within JVM/CLJS
default stack sizes.
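
The increment-on-entry, decrement-in-finally guard can be sketched like this. The helper names with-depth and nesting are hypothetical, and nesting is only a toy walk standing in for parse-form.

```clojure
(def max-depth 512)

;; Runs thunk one nesting level deeper. The counter goes up on entry and
;; comes back down in finally, even when the thunk throws.
(defn with-depth [depth thunk]
  (try
    (when (> (vswap! depth inc) max-depth)
      (throw (ex-info "Maximum nesting depth exceeded" {:max-depth max-depth})))
    (thunk)
    (finally
      (vswap! depth dec))))

;; Toy recursive walk: returns how deeply the form is nested.
(defn nesting [depth form]
  (with-depth depth
    (fn []
      (if (sequential? form)
        (inc (reduce (fn [m child] (max m (nesting depth child))) 0 form))
        0))))

(def depth (volatile! 0))
(nesting depth '(a (b (c))))   ;; => 3
@depth                         ;; => 0, the counter is restored on exit
```

The finally clause matters for REPL use: after a depth error the counter is back at zero, so the next read starts clean.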
The parser engine records (line, col) on each token. Position tracking uses the scanner line model (only \n is a line break, \r occupies a column). This is handled internally within meme.tools.parser. The display line model (str/split-lines in meme-lang.errors) may diverge for CRLF sources — format-error bridges the gap by clamping carets.
All error throw sites go through meme-error, which constructs ex-info
with a consistent structure: :line, :col (1-indexed), optional
:cause, and optional :source-context. This gives every error —
whether from the tokenizer, reader, or resolver — a uniform
shape that callers can rely on.
The :incomplete flag in ex-data is the REPL continuation protocol.
When a tokenizer or reader error is caused by premature EOF (unclosed
delimiters, lone backtick, unterminated string), the error is thrown
with {:incomplete true}. The REPL's input-state function catches
these errors and returns :incomplete to signal that more input may
complete the form. This lets the same error infrastructure that reports
parse failures also power multi-line input handling.
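
A sketch of the continuation protocol on the JVM, using a toy reader that only counts parentheses. toy-read is hypothetical, and the :complete and :invalid return values are assumed names; only :incomplete is taken from the text above.

```clojure
;; Toy reader: throws {:incomplete true} when parens are left open.
(defn toy-read [s]
  (let [opens  (count (filter #{\(} s))
        closes (count (filter #{\)} s))]
    (cond
      (> opens closes) (throw (ex-info "Unclosed delimiter" {:incomplete true}))
      (< opens closes) (throw (ex-info "Unmatched )" {}))
      :else            s)))

;; Maps reader errors to a REPL input state via the :incomplete flag.
(defn input-state [s]
  (try
    (toy-read s)
    :complete
    (catch clojure.lang.ExceptionInfo e
      (if (:incomplete (ex-data e)) :incomplete :invalid))))

(input-state "f(x")    ;; => :incomplete
(input-state "f(x)")   ;; => :complete
(input-state "f(x))")  ;; => :invalid
```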
format-error produces IDE-quality display: line-number gutter, span
underlines (^ for single-column, ~~~ for multi-column via
:end-col), secondary source locations with labels, and hint text.
The secondary locations and hints are extension points for richer
diagnostics as the error system grows.
meme is a syntactic lens, not a compiler. The read→print path
must be transparent: if the user writes 'x (sugar), it prints back as
'x; if they write quote(x) (explicit call), it prints back as
quote(x). The same applies to @/clojure.core/deref and
#'/var.
Principle: Every piece of user syntax that has more than one representation must be preserved through the stages. When the reader collapses two notations into the same Clojure form, it must tag the form with metadata recording which notation was used. The printer checks that metadata to reconstruct the original syntax.
Implementation: The reader attaches :meme-lang/sugar true metadata to
forms produced by sugar syntax (', @, #'). The printer checks
this: sugar-tagged forms emit sugar; untagged forms emit the explicit
call. The :meme-lang/sugar key is stripped from display metadata (alongside
:line, :column, :file, :meme-lang/leading-trivia) so it never appears in output.
Why this matters: Without this, the stages silently normalize
user code. var(x) becomes #'x. quote(list) becomes 'list.
A syntactic lens that rewrites your code is not a lens — it's a
formatter with opinions. Every new syntax feature should be checked:
can two notations produce the same form? If yes, metadata must
distinguish them.
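
The tag-then-reconstruct idea can be sketched for quote alone. read-quote-sugar and print-form are hypothetical names; only the :meme-lang/sugar key comes from the text above.

```clojure
(require '[clojure.string :as str])

;; What a reader could do for 'x: build (quote x) and tag the notation.
(defn read-quote-sugar [form]
  (with-meta (list 'quote form) {:meme-lang/sugar true}))

(defn print-form [form]
  (cond
    ;; A sugar-tagged quote prints back as 'x.
    (and (seq? form) (= 'quote (first form)) (:meme-lang/sugar (meta form)))
    (str "'" (print-form (second form)))

    ;; Any other non-empty list prints as head(args...).
    (and (seq? form) (seq form))
    (str (print-form (first form))
         "(" (str/join " " (map print-form (rest form))) ")")

    :else (pr-str form)))

(print-form (read-quote-sugar 'x))            ;; => "'x"
(print-form (list 'quote 'x))                 ;; => "quote(x)"
(= (read-quote-sugar 'x) (list 'quote 'x))    ;; => true
```

The last line shows why metadata is a good carrier: the two forms are equal as Clojure data, so evaluation is unaffected, yet the printer can still tell them apart.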
Known remaining losses:
- #_ discarded forms: gone by design.
Previously fixed (these were losses in earlier versions, now preserved via metadata):
- :meme-lang/namespace-prefix metadata on the map.
- :meme-lang/meta-chain.
- :meme-lang/insertion-order (insertion order).
- #() vs fn(): preserved via :meme-lang/sugar (see above section).
meme uses a single implementation of the meme↔Clojure translation, registered as :meme in the lang registry.
The pipeline combines a unified scanlet-parselet Pratt parser (meme.tools.parser with meme-lang.grammar) and a Wadler-Lindig document printer (meme-lang.printer). It preserves all metadata, sugar flags (:meme-lang/sugar), whitespace annotations, and comment positions through roundtrips.
Use for: formatting, tooling integration, roundtrip-sensitive workflows.
(.5, .5M, .5e1)
Clojure reads .5 as the float 0.5. Meme tokenizes .5 as a symbol because . is the Java interop prefix (.method). The tokenizer enters number mode only when the current character is a digit. This is an intentional divergence: in meme, .foo is always an interop method call, and leading-dot floats must be written with an explicit zero: 0.5.
Clojure accepts "\uD800\uDC00" (a valid UTF-16 surrogate pair encoding U+10000) and produces the supplementary character. Meme rejects each \uXXXX escape individually — if the code point falls in the surrogate range (U+D800..U+DFFF), it errors regardless of whether the next escape forms a valid pair. This is a defensive choice that prevents isolated surrogates from entering the output. Users can include supplementary characters directly as literal UTF-8 in source text.
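
A per-escape check of this kind can be sketched on the JVM as follows. surrogate? and resolve-unicode-escape are hypothetical names, and the argument is assumed to be the four hex digits of a single escape.

```clojure
(defn surrogate? [code-point]
  (<= 0xD800 code-point 0xDFFF))

;; Resolves the four hex digits of one \uXXXX escape, looking at that
;; escape alone and never at the one that follows it.
(defn resolve-unicode-escape [hex]
  (let [code-point (Integer/parseInt hex 16)]
    (if (surrogate? code-point)
      (throw (ex-info "Surrogate code point in \\u escape"
                      {:code-point code-point}))
      (char code-point))))

(resolve-unicode-escape "0041")   ;; => \A
;; (resolve-unicode-escape "D800") throws, and so would "DC00" after it
```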
Clojure's array-map preserves insertion order for up to 8 entries. Beyond that, it promotes to PersistentHashMap which does not preserve order. Since the meme parser builds maps via (apply array-map forms), maps with 9+ keys may have their key order shuffled in output. Sets preserve order via :meme-lang/insertion-order metadata, but maps do not have an equivalent mechanism. This is a Clojure platform limitation, not a meme design choice.
Comments in meme source (;; comment) are attached as :meme-lang/leading-trivia metadata to the following form. However, primitive types (keywords, numbers, strings, booleans) do not implement IMeta in Clojure and cannot carry metadata. When a comment appears before a keyword map key (e.g., {; comment\n :a 1}), the comment is lost because :a cannot store metadata. This is a fundamental Clojure platform limitation. Comments before symbols, vectors, maps, and sets are preserved correctly.
The meme formatter uses a "comment attaches to next form" model inherited from the tokenizer. An end-of-line comment like foo(x ;; note\n y) is attached to y (the next form), not to x (the preceding form). When formatted, the comment appears before y rather than after x. This preserves comment content but changes its visual position. This is an inherent consequence of the "attach to next token" architecture.
sorted-map and sorted-set have no literal syntax in Clojure. When printed, they appear as regular maps/sets. Re-parsing produces PersistentArrayMap/PersistentHashSet, losing the sorted property. This matches Clojure's own limitation.
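
The same loss can be observed with Clojure's own printer and reader, which is what the following sketch uses in place of meme's pipeline.

```clojure
(require '[clojure.edn :as edn])

(def sorted (sorted-map :b 1 :a 2))
(def text   (pr-str sorted))          ; "{:a 2, :b 1}"
(def reread (edn/read-string text))

(sorted? sorted)    ;; => true
(sorted? reread)    ;; => false
(= sorted reread)   ;; => true, same entries but the sorted property is gone
```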
@f(x) is @(f x), not (@f x)
The @ deref prefix applies to the next complete form. @f(x) parses as (deref (f x)): the deref wraps the call expression f(x). This is correct and consistent with how all prefix operators work in meme: they bind to the next form, including any adjacent call arguments.
The printer accesses the regex pattern via .pattern (JVM) or .-source (CLJS), which returns only the pattern body without flags. If a regex was constructed programmatically with flags (e.g., re-pattern with inline flags), the flags appear in the pattern string itself (e.g., (?i)...) and are preserved. External flag objects are not representable in #"..." literal syntax.
Programmatically constructed symbols containing whitespace, parentheses, or other syntax-significant characters (e.g., (symbol "foo bar")) cannot be faithfully printed and re-parsed. Clojure has no escape syntax for symbols — pr-str returns the same as str for symbols. This means such symbols print as raw text which re-parses as different forms: (symbol "foo\nbar") → foo\nbar → two symbols foo and bar. This matches Clojure's own limitation and affects all Clojure-based syntax tools.
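
The limitation is easy to reproduce with plain Clojure, independent of meme:

```clojure
(require '[clojure.edn :as edn])

(def odd-symbol (symbol "foo bar"))

(pr-str odd-symbol)                     ;; => "foo bar"
(edn/read-string (pr-str odd-symbol))   ;; => foo, only the first symbol is read
(= odd-symbol (edn/read-string (pr-str odd-symbol)))   ;; => false
```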
Comments are preserved through the pipeline via :meme-lang/leading-trivia metadata attached to parsed forms. However, Clojure's metadata system only works on types that implement IMeta: symbols, collections, and records. Keywords and primitive values (numbers, strings, booleans, characters, nil, and regex) cannot carry metadata. When a comment appears before such a value inside a form (e.g., def(x ;; important\n 42)), the comment is attached to the 42 token during scanning, but is irretrievably lost when the parser resolves the token to a Long.
Comments that survive formatting: those before symbols, collections, and calls. Comments that are lost: those before keywords, numbers, strings, booleans, characters, nil, and regex literals.
This is a known limitation shared by all Clojure-based formatting tools (cljfmt, zprint face similar challenges). A complete fix would require a concrete syntax tree (CST) that preserves all tokens including whitespace, which is a fundamentally different architecture than the current form-based pipeline.
Leading and trailing comments (before/after top-level forms) are always preserved, as they are attached to the forms vector or to forms that support metadata.
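
Which values can carry trivia metadata can be checked directly on JVM Clojure, where with-meta requires clojure.lang.IObj. The predicate name can-carry-trivia? is hypothetical.

```clojure
;; JVM Clojure: a value can take metadata only if it implements IObj.
(defn can-carry-trivia? [x]
  (instance? clojure.lang.IObj x))

(mapv can-carry-trivia? ['sym [1 2] {:a 1} #{1} '(f x)])
;; => [true true true true true]

(mapv can-carry-trivia? [:kw 42 "s" true \c nil #"re"])
;; => [false false false false false false false]
```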
The non-breaking space character (U+00A0, NBSP) is rejected as an invalid character in symbols. The invisible-char? predicate catches it, producing an :invalid token. This prevents invisible-character attacks where NBSP (common in web copy-paste) would create symbols that look identical but are semantically different.
The scanner layer (lexical scanlets in meme-lang.lexlets) is a structural scanner — it partitions input into tokens without knowing Clojure's semantic rules. Semantic validation is split between the resolver (meme-lang.resolve) and the CST reader (meme-lang.cst-reader):
- Reserved dispatch characters (#<, #=, #%): The scanner classifies #=foo as :tagged-literal (structurally, it IS # + symbol). Whether = is a reserved dispatch character is a semantic rule enforced downstream.
- Unterminated strings and regexes (resolve-string, resolve-regex): the resolver detects the missing closing delimiter and throws with :incomplete true, enabling the REPL to prompt for continuation.
- Malformed keywords (:, trailing :foo:, triple :::foo): The scanner greedily consumes symbol-char? characters; the CST reader validates keyword structure and rejects malformed forms.
- :invalid token error messages: The pipeline carries :raw content from :invalid tokens into parser errors for actionable diagnostics.
.mcj: McCarthy's original M-expression syntax
McCarthy (1960) defined M-expressions as the surface syntax for Lisp. meme implements M-expressions for Clojure with one rule: f(x y) → (f x y). But meme's syntax is a Clojure-flavored dialect: it inherits Clojure's reader macros, dispatch forms, and data literal conventions.
McCarthy's original syntax was different: car[x] used square brackets for application, [p₁ → e₁; … ; pₙ → eₙ] for conditionals, λ[[x]; e] for lambda, label[f; λ[[x]; …]] for recursive binding. A faithful implementation of the original notation — as a guest language with file extension .mcj (McCarthy John, M-expression CloJure) — would honor the historical origin while demonstrating the guest language system.
The infrastructure exists: the unified Pratt parser takes a grammar spec, trivia classification is part of the grammar, and the lang registry supports guest languages. A .mcj lang would supply its own grammar (square-bracket application, semicolon separators, arrow conditionals) and its own scanlets/parselets, reusing the same parser engine (meme.tools.parser) and scanlet builders (meme.tools.lexer).
meme-lang.cst-reader lowers meme's CST to Clojure forms. The pattern is general — every language that uses the Pratt parser needs CST → host-language lowering. The generic parts (tree walking, trivia extraction, discard filtering) could be extracted with language-specific value resolution and node handlers plugged in. Currently there's only one consumer (meme), so the extraction is deferred until a second language (e.g., .mcj) needs it.
The scanner is now fully data-driven. The unified scanlet-parselet Pratt parser (meme.tools.parser) defines scanning as part of the grammar spec. Character dispatch, trivia classification, and structural parsing are all configured via the grammar map. Language-specific consume functions live in meme-lang.lexlets, wrapped into scanlets by the generic builders in meme.tools.lexer. This replaces the previous separate tokenizer that had Clojure-family knowledge baked in.
The grammar spec shape:
    {:nud         {char → scanlet-fn}         ;; character dispatch
     :nud-pred    [[pred scanlet-fn] ...]     ;; predicate-based dispatch
     :trivia      {char → trivia-fn}          ;; trivia classification
     :trivia-pred [[pred trivia-fn] ...]
     :led         [{:char c :bp n ...} ...]}  ;; postfix/infix rules
This allows the same parser engine to handle languages with different character vocabularies — each language supplies its own grammar spec.
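
A toy illustration of the two dispatch tables. Everything here is an assumption for illustration: scanlets are stubbed as one-argument functions returning a token type, find-scanlet is a hypothetical name, and trying the exact-character table before the predicates is an assumed lookup order rather than a documented one.

```clojure
(defn digit? [c]
  (<= (int \0) (int c) (int \9)))

;; Toy grammar with only the :nud and :nud-pred tables filled in.
(def toy-grammar
  {:nud      {\( (fn [_] :open-paren)
              \" (fn [_] :string)}
   :nud-pred [[digit? (fn [_] :number)]]})

;; Exact character match first, then the first matching predicate.
(defn find-scanlet [grammar c]
  (or (get-in grammar [:nud c])
      (some (fn [[pred scanlet]] (when (pred c) scanlet))
            (:nud-pred grammar))))

((find-scanlet toy-grammar \() nil)   ;; => :open-paren
((find-scanlet toy-grammar \7) nil)   ;; => :number
(find-scanlet toy-grammar \space)     ;; => nil
```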
Meme has one syntactic rule for non-empty lists: head(args...) → (head args...). This rule does not distinguish between function calls and data lists. The distinction is fundamental to understanding meme's design and its relationship to Lisp's homoiconicity principle.
In Clojure, (+ 1 2) and (quote (1 2 3)) produce the same data structure: a list. The difference is purely contextual — the evaluator treats the first element as something to call. At read time, there is no distinction between code and data. This is homoiconicity: code is data, data is code.
Meme inherits this property. +(1 2) and 1(2 3) both produce lists. The reader performs a purely syntactic transform — it does not know or care whether the head is callable. 1(2 3) reads as (1 2 3), which is a valid Clojure form that will fail at eval time ("1 is not IFn"), just as (1 2 3) does in Clojure.
Meme adds: visual distinction between the head and the arguments of a list. In +(1 2), the + is visually separated from the (1 2), making the call structure immediately apparent. This is McCarthy's original M-expression idea from 1960.
Meme does NOT add: a separate notation for "data lists" vs "call lists." In S-expressions, both use (...). In meme, both use head(...). The syntactic surface changed, but the underlying identity of code and data is preserved.
Lisp's answer to "how do I write a list that isn't a call" has always been quote. This carries over to meme:
| Intent | Clojure | Meme | Result |
|---|---|---|---|
| Call + to 1 and 2 | (+ 1 2) | +(1 2) | (+ 1 2) |
| Data list of 1, 2, 3 | '(1 2 3) | '1(2 3) | (quote (1 2 3)) |
| Construct list at runtime | (list 1 2 3) | list(1 2 3) | (list 1 2 3) |
| Empty list | () or '() | () | () |
Quote signals intent: '1(2 3) says "this is data." 1(2 3) without quote says "call 1 with args 2 and 3" — which will fail at eval time, but is structurally valid.
The meme printer converts Clojure forms back to meme syntax. When it encounters a non-empty list (f x y), it emits f(x y). This is correct for all lists:
- (+ 1 2) → +(1 2): looks like a call, is a call. Correct.
- (1 2 3) → 1(2 3): looks like a call to 1. Structurally correct; will error at eval.
- (nil 1) → nil(1): looks like a call to nil. Valid meme; nil is a legal head.
The printer does not inject quote because it cannot know whether the list was intended as a call or as data; that information is not present in the Clojure form. A (1 2 3) produced by (list 1 2 3) at runtime is indistinguishable from one produced by reading '(1 2 3). This is a consequence of homoiconicity, not a printer limitation.
When the list originates from the meme reader with '1(2 3), the reader attaches {:meme-lang/sugar true} metadata to the (quote ...) wrapper. The printer uses this to reproduce the quote sugar. Without that metadata (e.g., for programmatically constructed lists), the printer has no way to know whether quote was originally present.
meme.tools.*
The generic parser engine (meme.tools.parser) and render engine (meme.tools.render) are language-agnostic. A language built on these tools inherits meme's call/data conflation unless it adds its own notation. Options:
1. Keep the conflation and rely on quote to mark data, as meme does.
2. Add a dedicated data-list notation (e.g., #list(1 2 3) or <1 2 3>). Requires a new parselet in the grammar and corresponding printer logic. Breaks the "one rule" simplicity.
3. Use separate delimiters for calls and data (e.g., f[x y] for calls, (1 2 3) for data). Requires repurposing existing Clojure delimiters, creating incompatibility.
Meme chose option 1 because it preserves three properties simultaneously: syntactic simplicity (one rule), Clojure compatibility (all forms roundtrip), and homoiconicity (code and data have the same structure). The cost is that the printer cannot express intent, only structure.
McCarthy's 1960 M-expressions (f[x; y]) were designed exclusively for function application. There was no M-expression notation for data lists because M-expressions were the "readable" layer over S-expressions — you could always drop down to S-expressions for data. This asymmetry was one reason M-expressions were never adopted: programmers preferred the uniformity of S-expressions.
Meme takes a different approach: M-expression notation IS the S-expression, just rearranged. There is no "drop down" because there is nothing below — f(x y) and (f x y) are the same thing at different stages of the pipeline. This eliminates the M/S asymmetry, but it means the call/data tension is fundamental and cannot be resolved without adding syntax.