Recursive Language Model (RLM) for processing arbitrarily large contexts.
RLM enables an LLM to iteratively write and execute Clojure code to examine,
filter, and process large contexts that exceed token limits. The LLM writes
code that runs in a sandboxed SCI (Small Clojure Interpreter) environment,
inspects results, and decides whether to continue iterating or return a final
answer.
## API
```clojure
;; 1. Create environment (holds DB, config, SCI context)
(def env (rlm/create-env {:config llm-config :path "/tmp/my-rlm"}))
;; 2. Ingest documents (can call multiple times)
(rlm/ingest-to-env! env documents)
(rlm/ingest-to-env! env more-documents)
;; 3. Run queries (reuses same env)
(rlm/query-env! env "What is X?")
(rlm/query-env! env "Find Y" {:spec my-spec})
;; 4. Dispose when done
(rlm/dispose-env! env)
```
## Key Features
- Iterative code execution: LLM writes code, sees results, writes more code
- FINAL termination: LLM signals completion by returning {:FINAL result}
- Recursive llm-query: Code can call back to the LLM for sub-tasks
- Sandboxed evaluation: Uses SCI for safe, controlled code execution
- Documents: Complete structure stored exactly as-is:
  - Documents with metadata
  - Pages with page nodes (paragraphs, headings, images, tables)
  - TOC entries
- Learnings: DB-backed meta-insights that persist across sessions
- Spec support: Define output shape, validate FINAL answers
- Auto-refinement: Self-critique loop improves answer quality
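The FINAL convention and recursive llm-query above can be sketched as code the model might emit in one iteration (a hedged sketch of sandbox code; `some-node-text` is a hypothetical local, and the iteration loop itself is driven by the library, not by you):

```clojure
;; Iteration 1 - explore the corpus before committing to an answer
(count (list-documents))

;; A later iteration - delegate a sub-task to a fresh LLM call,
;; then signal completion by returning {:FINAL result}
(let [summary (llm-query "Summarize this clause" {:context some-node-text})]
  {:FINAL summary})
```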
## LLM Available Functions (in SCI sandbox)
Document search:
- (list-documents) - List all stored documents
- (get-document doc-id) - Get document metadata
- (search-page-nodes query) - List/filter actual content
- (get-page-node node-id) - Get full page node content
- (list-page-nodes opts) - List page nodes with filters
- (search-toc-entries query) - List/filter table of contents
- (get-toc-entry entry-id) - Get TOC entry
- (list-toc-entries) - List all TOC entries
Learnings:
- (store-learning insight) - Store meta-insight
- (search-learnings query) - Search learnings
- (vote-learning id :useful/:not-useful) - Vote on learning
History:
- (search-history n) - Get recent messages (default 5)
- (get-history n) - Get recent messages (default 10)
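Taken together, a typical exploration the model might run with these functions looks like the following (a sketch assuming the sandbox bindings above; the `:node-id` and `:text` result keys are assumptions about the returned map shapes):

```clojure
;; Find where "termination" is discussed, then pull the full node text
(let [docs (list-documents)
      hits (search-page-nodes "termination")
      node (get-page-node (:node-id (first hits)))]
  ;; Persist a reusable meta-insight for future sessions
  (store-learning "Termination clauses cluster in the final TOC sections")
  {:doc-count (count docs)
   :first-hit (:text node)})
```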
Dynamic var for max recursion depth. Bound per query-env! call.
Dynamic context for RLM debug logging. Bind with {:rlm-debug? true :rlm-phase :phase-name :rlm-env-id "..."}.
(bytes->base64 bs)
Converts raw bytes to a base64 string.
Params:
  bs - byte[]. Raw bytes.
Returns: String. Base64-encoded representation.
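For instance (a minimal sketch; the byte values 72 and 105 spell "Hi"):

```clojure
(bytes->base64 (byte-array [72 105]))
;; => "SGk=" (standard Base64 encoding of the bytes for "Hi")
```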
Schema for storing verified claims extracted from documents. Claims are assertions with citations, confidence scores, and verification verdicts.
(create-env {:keys [config path]})
Creates an RLM environment (component) for document ingestion and querying.
The environment holds:
- In-memory store for documents, learnings, and conversation history
- LLM configuration for queries
- SCI sandbox context with custom bindings
Usage:
```clojure
(def env (rlm/create-env {:config llm-config}))
(rlm/register-env-fn! env 'my-fn (fn [x] (* x 2)) "(my-fn x) - Doubles x")
(rlm/register-env-def! env 'MAX_RETRIES 3 "MAX_RETRIES - Max retry attempts")
(rlm/ingest-to-env! env documents)
(rlm/query-env! env "What is X?")
(rlm/dispose-env! env)
```
Params:
- :config - Required. LLM config with :api-key, :base-url, :default-model.
- :path - Optional. Path for persistent DB. If provided, data survives across sessions.
Returns:
RLM environment map (component). Pass to register-env-fn!, register-env-def!, ingest-to-env!, query-env!, dispose-env!.
Default maximum depth of nested rlm-query calls. Can be overridden via :max-recursion-depth.
(dispose-env! env)
Disposes an RLM environment and releases resources.
For persistent DBs (created with :path), data is preserved. For disposable DBs, all data is deleted.
Params:
  env - RLM environment from create-env.
Schema for storing PageIndex documents exactly as produced by PageIndex. Matches :document/* namespace from com.blockether.svar.internal.rlm.internal.pageindex.spec.
Spec for entity extraction output.
Schema for storing generic entities extracted from documents. Entities are the fundamental building blocks: parties, obligations, conditions, terms, clauses, cross-references. Each entity has a type and description.
Timeout in milliseconds for code evaluation in SCI sandbox.
(generate-qa-env! env)
(generate-qa-env!
 env
 {:keys [count difficulty categories model batch-size verify? debug? parallel
         selection-model k-candidates multi-hop? personas]
  :or {count 10
       difficulty #{:analyze :understand :evaluate :create :apply :remember}
       categories #{:inferential :definitional :procedural :comparative
                    :analytical :factual}
       batch-size 5
       verify? true
       debug? false
       parallel 3
       k-candidates 1}})
Generates question-answer pairs from ingested documents.
Uses a multi-stage pipeline leveraging the RLM's iterative code execution:
Phase 1 - Passage Selection: Explores the corpus structure via TOC and content
search, selects diverse passages covering different sections and topics.
Phase 2 - Q&A Generation: For each batch of selected passages, generates
grounded question-answer pairs with evidence spans extracted from source text.
Phase 3 - Verification: Each Q&A pair is verified against the source material
for groundedness, non-triviality, and self-containedness.
Phase 4 - Deduplication: Near-duplicate questions are removed and diversity
across difficulty levels and categories is verified.
Params:
`env` - RLM environment from create-env with ingested documents.
`opts` - Map, optional:
- :count - Integer. Target number of Q&A pairs (default: 10).
- :difficulty - Set of keywords. Bloom's taxonomy levels to include
(default: #{:remember :understand :apply :analyze :evaluate :create}).
- :categories - Set of keywords. Question types to include
(default: #{:factual :inferential :comparative :analytical :definitional :procedural}).
- :model - String. Override default model.
- :batch-size - Integer. Passages per generation batch (default: 5).
- :parallel - Integer. Number of parallel batch workers for Phase 2 (default: 3).
- :selection-model - String. Fast/cheap model for Phase 1 passage selection (default: :model).
- :k-candidates - Integer. Generate k candidates per passage, keep best (default: 1).
- :multi-hop? - Boolean. Generate cross-section questions from passage pairs (default: false).
- :personas - Set of keywords. Persona styles to rotate across batches for diversity.
Available: :student, :researcher, :practitioner, :examiner, :journalist (default: nil).
- :verify? - Boolean. Run verification phase (default: true).
- :debug? - Boolean. Verbose logging (default: false).
Returns:
Map with:
- :questions - Vector of verified Q&A maps, each with :question, :answer,
:evidence-span, :source-document, :source-page, :source-section,
:difficulty, :category.
- :dropped-questions - Vector of Q&A maps that failed verification.
- :verification-results - Vector of verification result maps.
- :phase-traces - Map of {:selection :generation :verification} traces.
- :stats - Map with :total-generated, :passed-verification, :duplicates-removed,
:final-count, :by-difficulty (counts), :by-category (counts).
- :iterations - Total iterations across all phases.
- :duration-ms - Total execution time.
(ingest-to-env! env documents)
(ingest-to-env! env documents opts)
Ingests PageIndex documents into an RLM environment.
Stores the complete document structure exactly as PageIndex produces it:
- Document metadata
- All pages
- All page nodes (paragraphs, headings, images, tables)
- All TOC entries
Can be called multiple times to add more documents.
Params:
`env` - RLM environment from create-env.
`documents` - Vector of PageIndex documents (spec-validated).
`opts` - Optional. Map with extraction options:
- :extract-entities? - Enable entity extraction (default false)
- :extraction-model - Model for extraction (default: env's default-model)
- :max-extraction-pages - Page limit per doc (default 50)
- :max-vision-rescan-nodes - Cap on vision re-scans per doc (default 10)
Returns:
Vector of ingestion results, one per document:
[{:document-id "..." :pages-stored N :nodes-stored N :toc-entries-stored N
:entities-extracted N :relationships-extracted N :pages-processed N
  :extraction-errors N :visual-nodes-scanned N}] (extraction fields only if enabled)
Schema for storing learnings (meta-insights). Learnings capture HOW to approach problems, not just query→answer pairs.
Voting system:
- Learnings are voted on after tasks complete (positive/negative)
- Learnings with >70% negative votes after 5+ total votes are 'decayed' (filtered from queries)
- :applied-count tracks how many times a learning was retrieved
Schema for legal-specific entity attributes. Extends ENTITY_SCHEMA with domain-specific fields for parties, obligations, conditions, etc.
Maximum number of code execution iterations before forcing termination.
Schema for storing conversation messages.
Schema for storing PageIndex page nodes exactly as produced by PageIndex. Matches :page.node/* namespace from com.blockether.svar.internal.rlm.internal.pageindex.spec. These are the actual content elements: paragraphs, headings, images, tables, etc.
Schema for storing PageIndex pages exactly as produced by PageIndex. Matches :page/* namespace from com.blockether.svar.internal.rlm.internal.pageindex.spec.
(pprint-trace trace)
(pprint-trace trace opts)
Pretty-prints an RLM execution trace to stdout for debugging.
Prints the formatted trace to *out* and returns the formatted string.
Params:
  trace - Vector of trace entries from query-env! result.
  opts - Map, optional:
  - :max-response-length - Truncate LLM response (default: 500).
  - :max-code-length - Truncate code blocks (default: 300).
  - :max-result-length - Truncate execution results (default: 200).
  - :show-stdout? - Show stdout output (default: true).
Returns: String with formatted trace output (also printed to stdout).
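A usage sketch, assuming an `env` with ingested documents and the `rlm` namespace alias used elsewhere in this doc:

```clojure
(let [result (rlm/query-env! env "List all parties to the agreement")]
  ;; Truncate long code blocks so the printed trace stays readable
  (rlm/pprint-trace (:trace result) {:max-code-length 120}))
```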
(print-trace trace)
(print-trace trace opts)
Prints an RLM execution trace to stdout. Alias for pprint-trace.
(query-env! env query-str)
(query-env!
 env
 query-str
 {:keys [context spec model max-iterations max-refinements threshold
         refine? learn? max-context-tokens max-recursion-depth
         verify? plan? debug?]
  :or {max-recursion-depth DEFAULT_RECURSION_DEPTH
       plan? false
       verify? false
       refine? true
       max-iterations MAX_ITERATIONS
       threshold 0.8
       max-refinements 1
       debug? false
       learn? true}})
Runs a query on an RLM environment using iterative LLM code execution.
The LLM can use these functions during execution:
Document search:
- (list-documents) - List all stored documents
- (get-document doc-id) - Get document metadata
- (search-page-nodes query) - List/filter actual content
- (get-page-node node-id) - Get full page node content
- (list-page-nodes opts) - List page nodes with filters
- (search-toc-entries query) - List/filter table of contents
- (get-toc-entry entry-id) - Get TOC entry
- (list-toc-entries) - List all TOC entries
History:
- (search-history n) - Get recent messages
- (get-history n) - Get recent messages
Learnings:
- (store-learning insight) - Store meta-insight
- (search-learnings query) - Search learnings
Params:
`env` - RLM environment from create-env.
`query-str` - String. The question to answer.
`opts` - Map, optional:
- :context - Data context to analyze.
- :spec - Output spec for structured answers.
- :model - Override config's default model.
- :max-iterations - Max code iterations (default: 50).
- :max-refinements - Max refine iterations (default: 1).
- :threshold - Min eval score 0.0-1.0 for refinement early stop (default: 0.8).
- :verify? - Enable claim verification with citations (default: false).
- :refine? - Enable refinement (default: true).
- :learn? - Store as example (default: true).
- :max-context-tokens - Token budget for context.
- :max-recursion-depth - Max depth of nested rlm-query calls (default: DEFAULT_RECURSION_DEPTH).
- :plan? - Enable planning before execution (default: false).
- :debug? - Enable verbose debug logging (default: false). Logs iteration details,
code execution, LLM responses at :info level with :rlm-phase context.
Returns:
Map with:
- :answer - Final (possibly refined) answer string, or parsed spec data.
- :raw-answer - Original answer before refinement.
- :trace - Vector of iteration trace entries, each containing:
{:iteration N
:response "LLM response text"
:executions [{:id 0 :code "(+ 1 2)" :result 3 :stdout "" :error nil :execution-time-ms 5}
{:id 1 :code "(FINAL answer)" :result {:rlm/final true ...} ...}]
:final? boolean}
- :iterations - Total number of iterations executed.
- :eval-scores - Evaluation scores from refinement (if enabled).
- :refinement-count - Number of refinement iterations.
- :duration-ms - Total execution time in milliseconds.
- :history-tokens - Approximate token count of conversation history.
- :status - Only present on failure, e.g. :max-iterations.
(register-env-def! env sym value doc-string)
Registers a constant/value in the RLM environment's SCI sandbox.
The value becomes available to the LLM during code execution. The doc-string is included in the system prompt so the LLM knows about it.
Params:
env - RLM environment from create-env.
sym - Symbol. The constant name (e.g., 'MAX_RETRIES).
value - Any value. The constant value.
doc-string - String. Documentation for the LLM (e.g., "MAX_RETRIES - Maximum retry attempts").
Returns: The environment (for chaining).
(register-env-fn! env sym f doc-string)
Registers a function in the RLM environment's SCI sandbox.
The function becomes available to the LLM during code execution. The doc-string is included in the system prompt so the LLM knows how to use it.
Params:
env - RLM environment from create-env.
sym - Symbol. The function name (e.g., 'fetch-weather).
f - Function. The implementation.
doc-string - String. Documentation for the LLM (e.g., "(fetch-weather city) - Returns weather data").
Returns: The environment (for chaining).
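Following the doc-string convention above, a custom tool might be wired in like this (a sketch; `fetch-weather` and its return shape are hypothetical, matching the example name in the parameter docs):

```clojure
(rlm/register-env-fn! env 'fetch-weather
  (fn [city] {:city city :temp-c 21})  ; stub implementation for illustration
  "(fetch-weather city) - Returns weather data for city")
;; The LLM can now call (fetch-weather "Oslo") inside the sandbox.
```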
Schema for storing relationships between entities. Relationships capture how entities interact: references, definitions, obligations, conditions, amendments.
Map of safe clojure.core functions exposed to SCI sandbox.
(save-qa! result path)
(save-qa!
 result
 path
 {:keys [formats include-dropped? include-stats?]
  :or {formats #{:markdown :edn} include-dropped? false include-stats? true}})
Saves generate-qa-env! results to EDN and/or Markdown files.
Params:
`result` - Map. Result from generate-qa-env!.
`path` - String. Base file path without extension.
`opts` - Map, optional:
- :formats - Set of keywords. Output formats (default: #{:edn :markdown}).
- :include-dropped? - Boolean. Include dropped questions (default: false).
- :include-stats? - Boolean. Include generation stats (default: true).
Returns:
Map with :files - vector of written file paths.
Schema for storing PageIndex TOC entries exactly as produced by PageIndex. Matches :document.toc/* namespace from com.blockether.svar.internal.rlm.internal.pageindex.spec.
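Putting generate-qa-env! and save-qa! together (a sketch; the option values and base path are illustrative):

```clojure
(let [qa (rlm/generate-qa-env! env {:count 20
                                    :multi-hop? true
                                    :personas #{:examiner :practitioner}})]
  ;; Expected to write one file per requested format next to the base path
  (rlm/save-qa! qa "/tmp/contract-qa" {:formats #{:edn :markdown}}))
```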