Date: 2026-04-01
Scope: 62 adversarial hypotheses + LSP structural analysis
Tools used: nREPL (clojure-mcp), clojure-lsp, Bash CLI testing
Methodology: Generate adversarial hypotheses across 8 categories, execute each via live REPL or CLI, verify findings with follow-up probes
| Category | Hypotheses | Refuted | Confirmed | Partial |
|---|---|---|---|---|
| Tokenizer (H1-H10) | 10 | 7 | 2 | 1 |
| Parser (H11-H20) | 10 | 9 | 1 | 0 |
| Roundtrip/Printer (H21-H30, H38-H44) | 17 | 14 | 1 | 2 |
| Error Handling (H31-H37) | 7 | 4 | 0 | 3 |
| Rewrite Engine (H45-H48) | 4 | 2 | 2 | 0 |
| Lang System (H49-H53) | 5 | 2 | 2 | 1 |
| Security/Robustness (H54-H62) | 9 | 7 | 0 | 2 |
| LSP Static Analysis | 5 tasks | -- | 6 findings | 5 info |
| Total | 62 + LSP | 45 | 8 | 9 |
Refutation rate: 73% -- the codebase is well-defended against most adversarial inputs.
#=, #<, #% accepted as tagged literal prefixes -- produces dangerous Clojure output
Hypotheses: H6
Location: src/meme/scan/tokenizer.cljc (tag scanning), src/meme/parse/reader.cljc (tagged literal handling)
The tokenizer classifies #=foo, #<foo, #%foo as :tagged-literal tokens with tags =foo, <foo, %foo. The parser accepts them and the printer emits them verbatim in Clojure output:
meme input: #=foo bar -> clj output: #=foo bar
meme input: #<foo bar -> clj output: #<foo bar
Clojure's reader interprets these differently:
- #= triggers the EvalReader -- potential code execution if *read-eval* is true
- #< produces "Unreadable form" -- the Clojure output is unreadable
- #% produces "No reader function for tag" -- the Clojure output fails

Impact: The meme->clj translation pipeline can produce Clojure text that either (a) cannot be read back, or (b) triggers eval-reader behavior when fed to Clojure's reader. This is a translation correctness bug that could become a security issue in pipelines that convert untrusted .meme input to .clj and then read/eval the output.
Fix: Reject #=, #<, #% at tokenizer or parser level with a clear error, matching Clojure's restrictions on dispatch characters.
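The rejection described in the fix could look roughly like the following. This is a minimal sketch, not the project's code: `forbidden-tag-starts` and `validate-tag` are hypothetical names, and it assumes the tokenizer hands over the tag text without the leading `#`.

```clojure
;; Hypothetical helper, not part of the meme codebase.
;; Assumes `tag` is the text after the leading `#`, e.g. "=foo" for #=foo.
(def forbidden-tag-starts #{\= \< \%})

(defn validate-tag
  "Returns tag unchanged, or throws ex-info when it starts with a
  character the report says must not begin a tagged literal."
  [tag]
  (if (contains? forbidden-tag-starts (first tag))
    (throw (ex-info (str "Invalid tagged literal prefix: #" tag)
                    {:tag tag}))
    tag))
```

Placing a check like this where the `:tagged-literal` token is produced would stop the bad prefix before the printer can emit it.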
::a::b, :::, ::a/ accepted as valid auto-resolve keywords -- produces invalid Clojure
Hypotheses: H10, H37
Location: src/meme/scan/tokenizer.cljc (keyword scanning)
meme: ::a::b -> clj: ::a::b -> Clojure reader: "Invalid token: ::a::b"
meme: ::: -> clj: ::: -> Clojure reader: "Invalid token: :::"
meme: ::a/b/c -> clj: ::a/b\n\n/c -> silently splits into two forms
The tokenizer does not validate keyword syntax beyond basic character scanning. Multiple consecutive colons, slashes, and other invalid patterns are accepted and propagated through to Clojure output that Clojure's own reader rejects.
Fix: Add keyword validation in the tokenizer: reject :::+, ::.*::, ::.*/.*/, etc.
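One way to express the suggested patterns is a small predicate over the raw keyword text. This is a sketch under assumptions: `invalid-auto-keyword?` is a hypothetical name, the regexes are one reading of the patterns listed in the fix, and the trailing-slash check is added here to cover the `::a/` case from the finding title.

```clojure
;; Hypothetical predicate, not part of the meme codebase.
;; `s` is the raw token text, including the leading colons.
(defn invalid-auto-keyword?
  [s]
  (boolean
   (or (re-find #"^:::" s)     ; three or more leading colons, e.g. :::
       (re-find #"^::.*::" s)  ; a second :: later in the token, e.g. ::a::b
       (re-find #"/.*/" s)     ; more than one slash, e.g. ::a/b/c
       (re-find #"/$" s))))    ; trailing slash, e.g. ::a/
```

Well-formed tokens such as `::foo` and `::a/b` pass all four checks, so the predicate returns false for them.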
Nested #() anonymous function literals accepted -- forbidden by Clojure
Hypothesis: H17
Location: src/meme/parse/reader.cljc (:open-anon-fn handler)
meme: #(+(% #(*(% 2)))) -> clj: #(+ %1 #(* %1 2)) -> Clojure: "Nested #()s are not allowed"
Meme silently accepts nested #() and produces Clojure output that Clojure's reader rejects. The inner % params are conflated with outer params.
Fix: Track #() nesting depth in parser state and reject nested occurrences.
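The nesting check can be illustrated over a simplified token stream. This is a sketch only: the token types `:open-paren` and `:close-paren` and the function `check-anon-fn-nesting` are assumptions made for the example (only `:open-anon-fn` appears in the report), and the real parser state is likely richer.

```clojure
;; Hypothetical check over a flat sequence of token-type keywords.
;; A stack records which kind of form each open delimiter started.
(defn check-anon-fn-nesting
  [token-types]
  (loop [ts token-types
         stack []]
    (if-let [t (first ts)]
      (case t
        :open-anon-fn (if (some #{:anon-fn} stack)
                        (throw (ex-info "Nested #()s are not allowed"
                                        {:type :nested-anon-fn}))
                        (recur (rest ts) (conj stack :anon-fn)))
        :open-paren   (recur (rest ts) (conj stack :paren))
        :close-paren  (recur (rest ts) (if (seq stack) (pop stack) stack))
        ;; any other token leaves the nesting state unchanged
        (recur (rest ts) stack))
      :ok)))
```

Two `#()` forms side by side are accepted because the first is popped before the second opens. Only an `:open-anon-fn` seen while another is still on the stack is rejected.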
resolve-lang eagerly dereferences @builtin before checking user-langs
Hypothesis: H49
Location: src/meme/lang.cljc:resolve-lang
(let [user #?(:clj @user-langs :cljs nil)
b #?(:clj @builtin :cljs builtin)] ;; always evaluated
(or (get user n) (get b n) ...))
@builtin is a delay that loads EDN resources from classpath. It is dereferenced unconditionally in the let binding, even when the user-lang map already contains the key. This forces unnecessary classpath I/O on every resolve-lang call and prevents user-langs from being consulted in degraded environments where @builtin would throw.
Fix: Inline @builtin into the or expression so it short-circuits.
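The difference between the eager and the short-circuiting form can be shown with stand-in data. This is a minimal sketch: `builtin-langs`, `user-langs-demo`, and both resolver functions are hypothetical, and the delay body is a placeholder for the classpath load described above.

```clojure
;; Hypothetical stand-ins for the real registries.
(def builtin-langs
  (delay {:meme :builtin-meme}))   ; placeholder for the expensive load

(def user-langs-demo
  (atom {:custom :user-lang}))

(defn resolve-lang-eager
  "Mirrors the reported shape: the delay is forced in the let."
  [n]
  (let [user @user-langs-demo
        b    @builtin-langs]       ; always forces the delay
    (or (get user n) (get b n))))

(defn resolve-lang-lazy
  "Dereferences the delay inside `or`, so a user-lang hit skips it."
  [n]
  (or (get @user-langs-demo n)
      (get @builtin-langs n)))     ; only reached on a user-lang miss
```

With the lazy version, `(resolve-lang-lazy :custom)` returns without realizing `builtin-langs`, which can be checked with `(realized? builtin-langs)`.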
parse-form-base is 237 lines -- largest function, maintenance hotspot
Hypothesis: LSP Task C
Location: src/meme/parse/reader.cljc:336-572
A single case dispatch handling every token type. Each branch is 5-15 lines, but the aggregate size makes review, modification, and isolated testing difficult. Most linters flag functions over 80 lines.
Recommendation: Extract thematic groups (dispatch forms, syntax-quote/unquote, metadata) into named private functions.
run-stages in core.cljc is dead production code
Hypothesis: LSP Task A
Location: src/meme/core.cljc:94
Public API function with zero references in any src/ file. Only used in core_test.cljc. It is a thin wrapper around stages/run which is the actual function used by meme->forms.
Recommendation: Remove or mark as ^:no-doc.
Note (2026-04-02): F6 and F14 below were fixed in v2.0.0.
% normalized to %1 in #() -- notation not preserved
Hypothesis: H27
#(+(% %2)) -> #(+(%1 %2)). Semantically identical, but violates the syntactic transparency principle for this one case. Bare % preference is lost.
Hypothesis: H42
Printer falls through to JVM's #object[...] representation. Cannot be re-parsed. Matches Clojure's own limitation.
\uD800 replaced with ?
Hypothesis: H9
Clojure preserves the surrogate char; meme replaces it. Behavioral divergence on invalid Unicode.
Hypothesis: H48
A guard function that throws produces a bare Exception, not a structured ExceptionInfo with rewrite context. Confusing for guest language authors.
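A wrapper along the following lines would give guest language authors the missing context. This is a sketch assuming JVM Clojure: `call-guard`, the `rule-name` argument, and the `:rule` / `:bindings` keys are hypothetical, not the rewrite engine's actual API.

```clojure
;; Hypothetical wrapper, not part of the meme codebase.
(defn call-guard
  "Calls (guard bindings). If the guard throws, rethrows as ExceptionInfo
  carrying the rule name and bindings, with the original as the cause."
  [rule-name guard bindings]
  (try
    (guard bindings)
    (catch Exception e
      (throw (ex-info (str "Guard for rule " rule-name " threw: " (ex-message e))
                      {:rule rule-name :bindings bindings}
                      e)))))
```

Keeping the original exception as the cause preserves the guard's own stack trace while the `ex-data` map identifies which rule was being evaluated.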
Hypothesis: H50
Two langs with {:extension "test"} silently coexist; first registered wins in resolve-by-extension. No warning.
Hypothesis: H47
(match-pattern '{?k ?v} {"hello" 42}) -> nil. Limits rewrite rule expressiveness with string/integer map keys.
#_ discards hit 512 depth limit
Hypothesis: H56
512 sequential #_ x forms exceed the parser depth limit. Clojure handles 10,000+ iteratively. Unlikely in practice.
version var in core.cljc is completely unused
Hypothesis: LSP Task A
Location: src/meme/core.cljc:105
Zero references anywhere. CLI reads version.txt directly.
Note (2026-04-02): Fixed in v2.0.0.
MemeRaw and MemeAutoKeyword records leak from meme->forms API
Hypothesis: H52
Inputs like 0xFF, 0377, \u0041, ::foo return internal record types, not plain Clojure values. By design (preserves notation for printer), but surprising for API consumers expecting standard Clojure data.
raw-value / raw-text accessors unused in production
Hypothesis: LSP Task A
All production code uses keyword access (:value, :raw) directly on MemeRaw records.
source-context in errors.cljc could be private
Hypothesis: LSP Task A
Only consumed within errors.cljc itself.
stages.cljc uses plain ex-info instead of meme-error
Hypothesis: LSP Task E
Pipeline config errors (nil source, missing tokens) bypass the standard error infrastructure.
requiring-resolve creates invisible dependency
Hypothesis: LSP Task B
lang.cljc -> runtime/run.cljc dependency is runtime-only, not reflected in :require. Would fail silently if target moves.
discard-sentinel definitions
Hypothesis: LSP Task E
parse/reader.cljc uses identity-based (Object.) sentinel; rewrite/tree.cljc uses value-based ::discarded keyword. Intentionally separate but naming overlap could confuse.
These results speak to the quality of the implementation:
| Defense | Hypotheses Defeated |
|---|---|
| Depth limit at 512 -- clean error, no StackOverflow | H1, H14, H32 |
| Thread-safe parsing -- per-invocation volatile! state | H62 |
| Linear-time parsing -- no quadratic blowup on malformed input | H56 |
| Syntactic transparency -- 'x/quote(x), @x/deref(x), set ordering, numeric notation ALL preserved | H22, H23, H24, H26, H29, H30 |
| Metadata roundtrip -- survives read->print->re-read | H21, H18 |
| Formatter idempotency -- format(format(x)) == format(x) | H39 |
| Flat/canon agreement at infinite width | H44 |
| Width=1 produces valid output | H43 |
| Error locations on ALL tested malformed inputs | H31 |
| :incomplete flag accuracy -- correct for all tested cases | H33 |
| No raw exception leaks -- all errors wrapped in ExceptionInfo | H37 (mostly) |
| Rewrite cycle detection at 100 iterations | H45 |
| No splice-variable exponential backtracking | H46 |
| Eval injection blocked by native parser design | H54 |
| :: keyword namespace not leaked at parse time | H55 |
| CLI handles: missing files, empty input, binary garbage, shebangs | H57-H61 |
| Double discard #_ #_ works correctly | H12 |
| nil/true/false as call heads | H13 |
| Spacing rule f(x) vs f (x) strictly enforced | H16 |
| No circular dependencies in source tree | LSP Task B |
The codebase has a 73% adversarial refutation rate across 62 deliberately hostile hypotheses. This is strong. Most of the confirmed findings cluster around one theme: the tokenizer is too permissive on what it accepts as valid tokens (F1, F2, F3). The parser and printer faithfully propagate whatever the tokenizer emits, so garbage-in -> garbage-out through the full pipeline.
The single most impactful improvement would be tightening tokenizer validation for:
- Tagged-literal prefixes (#=, #<, #%)
- Auto-resolve keywords (:::, ::a::b, ::a/b/c)
- Nested #() tracking

This would address F1, F2, and F3 -- all three MEDIUM+ findings -- at one layer.
Security posture is strong. The native parser architecture provides inherent eval-injection protection (H54). The deferred :: keyword encoding avoids namespace leakage (H55). Thread safety is correct (H62). Resource exhaustion is bounded by the 512 depth limit. The only security-adjacent concern (F1, #= in output) requires a specific attack pattern (untrusted meme -> clj -> Clojure read with *read-eval* true).
62 hypotheses tested. 8 confirmed. 9 partial. 45 refuted. 20 findings catalogued across 4 severity levels.