com.blockether.svar.internal.router

Liking cljdoc? Tell your friends :D

Clojure only.

Router: provider/model registry, circuit breakers, rate limiting, budget tracking, and routing resolution.

Extracted from defaults.clj (provider/model metadata) and llm.clj (routing logic) to provide a single cohesive namespace for all routing concerns.

Router: provider/model registry, circuit breakers, rate limiting, budget tracking,
and routing resolution.

Extracted from defaults.clj (provider/model metadata) and llm.clj (routing logic)
to provide a single cohesive namespace for all routing concerns.

raw docstring

check-context-limit^clj

(check-context-limit model messages)

(check-context-limit model
                     messages
                     {:keys [output-reserve throw? context-limits]
                      :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})

Checks if messages fit within model context limit.

Checks if messages fit within model context limit.

source raw docstring

context-limit^clj

(context-limit model)

(context-limit model context-limits)

Returns provider-agnostic conservative context window size for a model.

Params: model - String. Model name. context-limits - Map, optional. Override map (merged defaults from config).

Returns: Integer. Conservative context tokens.

Returns provider-agnostic conservative context window size for a model.

Params:
`model` - String. Model name.
`context-limits` - Map, optional. Override map (merged defaults from config).

Returns:
Integer. Conservative context tokens.

source raw docstring

count-and-estimate^clj

(count-and-estimate model messages output-text)

(count-and-estimate model
                    messages
                    output-text
                    {:keys [pricing input-tokens api-usage cache-creation-ttl]
                     :as opts})

Counts tokens and estimates cost in one call. Cost map separates uncached input, cached-read input, cache-write input, output, and total.

Counts tokens and estimates cost in one call. Cost map separates
uncached input, cached-read input, cache-write input, output, and total.

source raw docstring

count-messages^clj

(count-messages model messages)

Counts tokens for a chat completion message array.

Counts tokens for a chat completion message array.

source raw docstring

count-tokens^clj

(count-tokens model text)

Counts tokens for a given text string using the specified model's encoding.

Counts tokens for a given text string using the specified model's encoding.

source raw docstring

DEFAULT_IDLE_TIMEOUT_MS^clj

Default idle-stream timeout (ms) for streaming HTTP responses. If no SSE bytes arrive for this long the underlying InputStream is closed and the call surfaces :svar.core/stream-idle-timeout. Distinct from DEFAULT_TIMEOUT_MS (whole-request cap): the idle watchdog tolerates arbitrarily long total durations as long as the stream keeps emitting bytes (content deltas, SSE : ping keepalives, or even blank separators — the watchdog resets on every .readLine, so it's ping-aware for free).

2 minutes (120000 ms) is the considered sweet spot:

Matches Anthropic's own SDK proposal (anthropics/anthropic-sdk- typescript#867 suggests 120_000 per-request, #959 ships 90s default with ping-reset).
~4× Anthropic's published ping interval (15-30 s) for safety.
Catches real hangs (e.g. z.ai glm streams that simply stop sending body frames after headers) in 2 minutes instead of forever — the original 5-minute DEFAULT_TIMEOUT_MS doesn't reliably fire on JDK 25 + HTTP/2 streaming.
Anthropic's documented worst case for legitimate extended thinking on Opus 4.5 is ~185 s with zero events (see anthropics/claude-agent-sdk-typescript#44). Callers running extended-thinking workloads should bump this to 240-300 s, or pass :idle-timeout-ms nil to disable.

Disable per-call: (svar/ask-code! router {... :idle-timeout-ms nil}).

45 s default. Catches genuinely hung streams (no SSE bytes, no keepalive pings) in well under a minute while still allowing the provider to take up to ~40 s between tokens during extended reasoning. The original 120 s default let timeouts blow the whole per-task budget.

Default idle-stream timeout (ms) for streaming HTTP responses. If no
SSE bytes arrive for this long the underlying `InputStream` is closed
and the call surfaces `:svar.core/stream-idle-timeout`. Distinct from
`DEFAULT_TIMEOUT_MS` (whole-request cap): the idle watchdog tolerates
arbitrarily long total durations as long as the stream keeps emitting
bytes (content deltas, SSE `: ping` keepalives, or even blank
separators — the watchdog resets on every `.readLine`, so it's
ping-aware for free).

2 minutes (120000 ms) is the considered sweet spot:
  - Matches Anthropic's own SDK proposal (anthropics/anthropic-sdk-
    typescript#867 suggests `120_000` per-request, #959 ships 90s
    default with ping-reset).
  - ~4× Anthropic's published `ping` interval (15-30 s) for safety.
  - Catches real hangs (e.g. z.ai glm streams that simply stop
    sending body frames after headers) in 2 minutes instead of
    forever — the original 5-minute `DEFAULT_TIMEOUT_MS` doesn't
    reliably fire on JDK 25 + HTTP/2 streaming.
  - Anthropic's documented worst case for legitimate extended
    thinking on Opus 4.5 is ~185 s with zero events (see
    anthropics/claude-agent-sdk-typescript#44). Callers running
    extended-thinking workloads should bump this to 240-300 s, or
    pass `:idle-timeout-ms nil` to disable.

Disable per-call: `(svar/ask-code! router {... :idle-timeout-ms nil})`.

45 s default. Catches genuinely hung streams (no SSE bytes, no
keepalive pings) in well under a minute while still allowing the
provider to take up to ~40 s between tokens during extended
reasoning. The original 120 s default let timeouts blow the whole
per-task budget.

source raw docstring

DEFAULT_OUTPUT_RESERVE^clj

Default number of tokens to reserve for model output. 0 means no reservation — let the API handle overflow naturally.

Default number of tokens to reserve for model output.
0 means no reservation — let the API handle overflow naturally.

source raw docstring

DEFAULT_RATE_LIMIT_ROUTING^clj

Default router-owned 429 policy.

:same-provider-delays-ms — sleep schedule for same-provider retries. :fallback-after-ms — hard cap on wall time the 429 phase can consume (measured from the FIRST 429 caught). When the configured delay vector is exhausted OR elapsed ≥ budget, the router falls back to the next provider. Each delay clamps to remaining budget so the loop never overshoots. :respect-retry-after? — honor server Retry-After header value in place of the configured delay; the clamp to remaining budget still applies. :fallback-provider? — when budget is exhausted, fall back to the next provider/model. Set false to surface the rate-limit error to the caller instead.

The 60 000 ms (60 s) default tolerates Anthropic / OpenAI / z.ai Retry-After values that can land between 30-60 s under quota pressure on reasoning-heavy workloads, while still bounding the wait so a single user request cannot hang for minutes.

Default router-owned 429 policy.

`:same-provider-delays-ms` — sleep schedule for same-provider retries.
`:fallback-after-ms`       — hard cap on wall time the 429 phase can
                             consume (measured from the FIRST 429
                             caught). When the configured delay vector
                             is exhausted OR elapsed ≥ budget, the
                             router falls back to the next provider.
                             Each delay clamps to remaining budget so
                             the loop never overshoots.
`:respect-retry-after?`    — honor server `Retry-After` header value
                             in place of the configured delay; the
                             clamp to remaining budget still applies.
`:fallback-provider?`      — when budget is exhausted, fall back to
                             the next provider/model. Set false to
                             surface the rate-limit error to the
                             caller instead.

The 60 000 ms (60 s) default tolerates Anthropic / OpenAI / z.ai
Retry-After values that can land between 30-60 s under quota
pressure on reasoning-heavy workloads, while still bounding the
wait so a single user request cannot hang for minutes.

source raw docstring

DEFAULT_RETRY^clj

Default retry policy for transient HTTP errors.

Default retry policy for transient HTTP errors.

source raw docstring

DEFAULT_SEMANTIC_TIMEOUT_MS^clj

Default semantic-stream timeout (ms) for streaming HTTP responses. Nil by default: disabled unless caller opts in. If enabled and bytes keep arriving (SSE pings/comments) but no model/progress event arrives for this long, the stream is closed and surfaced as :svar.core/stream-semantic-timeout.

Distinct from DEFAULT_IDLE_TIMEOUT_MS: idle watches transport liveness; semantic watches model progress. Enable per-call with e.g. :semantic-timeout-ms 180000.

Default semantic-stream timeout (ms) for streaming HTTP responses.
Nil by default: disabled unless caller opts in. If enabled and bytes
keep arriving (SSE pings/comments) but no model/progress event arrives
for this long, the stream is closed and surfaced as
`:svar.core/stream-semantic-timeout`.

Distinct from `DEFAULT_IDLE_TIMEOUT_MS`: idle watches transport
liveness; semantic watches model progress. Enable per-call with e.g.
`:semantic-timeout-ms 180000`.

source raw docstring

DEFAULT_TIMEOUT_MS^clj

Default HTTP request timeout in milliseconds (5 minutes). Reasoning models (e.g. glm-5-turbo) may need extended time for chain-of-thought.

Default HTTP request timeout in milliseconds (5 minutes).
Reasoning models (e.g. glm-5-turbo) may need extended time for chain-of-thought.

source raw docstring

DEFAULT_TTFT_TIMEOUT_MS^clj

Default time-to-first-token timeout (ms) for streaming HTTP responses. Bounds the wait between sending the HTTP request and receiving response headers. On fire, raises :svar.core/stream-ttft-timeout and the caller thread's interrupt unparks the underlying CompletableFuture.get.

30 s default. Tight enough to surface stuck provider connections inside one iteration (the original 90 s default sometimes wasted a whole autoresearch iter waiting for headers), generous enough for real reasoning cold starts — z.ai glm-5.1 has been observed sending first headers between 8 and 22 s. Disable per-call with :ttft-timeout-ms nil; pass a larger value for slow reasoning models with long pre-stream queues.

Default time-to-first-token timeout (ms) for streaming HTTP responses.
Bounds the wait between sending the HTTP request and receiving response
headers. On fire, raises `:svar.core/stream-ttft-timeout` and the caller
thread's interrupt unparks the underlying `CompletableFuture.get`.

30 s default. Tight enough to surface stuck provider connections
inside one iteration (the original 90 s default sometimes wasted a
whole autoresearch iter waiting for headers), generous enough for
real reasoning cold starts — z.ai glm-5.1 has been observed sending
first headers between 8 and 22 s. Disable per-call with
`:ttft-timeout-ms nil`; pass a larger value for slow reasoning
models with long pre-stream queues.

source raw docstring

estimate-cost^clj

(estimate-cost model input-tokens output-tokens)

(estimate-cost model input-tokens output-tokens pricing-map)

(estimate-cost model input-tokens output-tokens pricing-map opts)

Estimates USD cost with separate uncached input, cached input, cache creation, and output components. Rates are USD per 1M tokens.

Since svar 0.6.0 the canonical usage shape is INCLUSIVE — :input-tokens is always the TOTAL prompt tokens regardless of provider (Anthropic-additive raw values are summed at the canonical normalizer boundary). Cached and cache-creation tokens are SUBSETS of :input-tokens, so they're subtracted to compute the uncached-regular portion before pricing. No more :cache-tokens-in-input? flag — the meaning is uniform.

Estimates USD cost with separate uncached input, cached input, cache
creation, and output components. Rates are USD per 1M tokens.

Since svar 0.6.0 the canonical usage shape is INCLUSIVE —
`:input-tokens` is always the TOTAL prompt tokens regardless of
provider (Anthropic-additive raw values are summed at the canonical
normalizer boundary). Cached and cache-creation tokens are
SUBSETS of `:input-tokens`, so they're subtracted to compute the
uncached-regular portion before pricing. No more
`:cache-tokens-in-input?` flag — the meaning is uniform.

source raw docstring

format-cost^clj

(format-cost cost)

Formats a cost value as a human-readable USD string.

Formats a cost value as a human-readable USD string.

source raw docstring

infer-model-metadata^clj

(infer-model-metadata {:keys [name] :as model-map})

Returns provider-independent model metadata. Looks up KNOWN_MODEL_METADATA first. Falls back to regex inference for unknown models. Explicit fields in model-map override inferred values.

Returns provider-independent model metadata.
Looks up KNOWN_MODEL_METADATA first. Falls back to regex inference for unknown models.
Explicit fields in model-map override inferred values.

source raw docstring

KNOWN_MODEL_METADATA^clj

Per-model static metadata. :reasoning? flags a model whose provider accepts a reasoning-depth parameter. :reasoning-style (optional) pins the wire shape to emit — see REASONING_LEVELS keys. When omitted, the style is inferred from the provider's :api-style (:anthropic → anthropic thinking, everything else → openai-effort).

Per-model static metadata. `:reasoning?` flags a model whose provider
accepts a reasoning-depth parameter. `:reasoning-style` (optional) pins the
wire shape to emit — see `REASONING_LEVELS` keys. When omitted, the style
is inferred from the provider's `:api-style` (`:anthropic` → anthropic
thinking, everything else → openai-effort).

source raw docstring

KNOWN_PROVIDER_MODELS^clj

source

KNOWN_PROVIDERS^clj

source

make-router^clj

(make-router providers)

(make-router providers opts)

Creates a router from a vector of provider maps.

Vector order = priority (first provider is highest priority). First model in provider vector = root model. Provider :base-url auto-resolved from KNOWN_PROVIDERS for known IDs. Model metadata auto-inferred from :name and merged with provider-scoped pricing/context. Duplicate provider :id values are a hard error.

opts - Optional map: :network - {:timeout-ms N :max-retries N ...} router-level network defaults :tokens - {:check-context? bool :pricing {} :context-limits {}} token defaults :budget - {:max-tokens N :max-cost N} spend limits (nil = no limit) :rate-limit - {:same-provider-delays-ms [...] :fallback-after-ms N ...} :failure-threshold - Int. Failures before circuit opens (default: 5) :recovery-ms - Int. Ms before open→half-open (default: 60000)

Example: (make-router [{:id :blockether :api-key <key> :models [{:name <model-a>} {:name <model-b>}]} {:id :openai :api-key <key> :models [{:name <model-a>} {:name <model-b>}]}] {:budget {:max-tokens 1000000 :max-cost 5.0}})

Creates a router from a vector of provider maps.

Vector order = priority (first provider is highest priority).
First model in provider vector = root model.
Provider :base-url auto-resolved from KNOWN_PROVIDERS for known IDs.
Model metadata auto-inferred from :name and merged with provider-scoped pricing/context.
Duplicate provider :id values are a hard error.

`opts` - Optional map:
  :network   - {:timeout-ms N :max-retries N ...} router-level network defaults
  :tokens    - {:check-context? bool :pricing {} :context-limits {}} token defaults
  :budget    - {:max-tokens N :max-cost N} spend limits (nil = no limit)
  :rate-limit - {:same-provider-delays-ms [...] :fallback-after-ms N ...}
  :failure-threshold - Int. Failures before circuit opens (default: 5)
  :recovery-ms       - Int. Ms before open→half-open (default: 60000)

Example:
  (make-router [{:id :blockether :api-key <key>
                 :models [{:name <model-a>} {:name <model-b>}]}
                {:id :openai :api-key <key>
                 :models [{:name <model-a>} {:name <model-b>}]}]
               {:budget {:max-tokens 1000000 :max-cost 5.0}})

source raw docstring

max-input-tokens^clj

(max-input-tokens model)

(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})

Calculates maximum input tokens for a model, reserving space for output.

Calculates maximum input tokens for a model, reserving space for output.

source raw docstring

MODEL_CONTEXT_LIMITS^clj

Best-effort flattened model context limits for legacy token utilities. When a model exists on multiple providers with different contexts, the most conservative context is used. Provider-aware code should use provider-model-context instead.

Best-effort flattened model context limits for legacy token utilities.
When a model exists on multiple providers with different contexts, the most
conservative context is used. Provider-aware code should use
provider-model-context instead.

source raw docstring

MODEL_PRICING^clj

Best-effort flattened model pricing for legacy token utilities. When a model exists on multiple providers, the lowest total pricing is chosen. Provider-aware code should NOT use this — use provider-model-pricing instead.

Best-effort flattened model pricing for legacy token utilities.
When a model exists on multiple providers, the lowest total pricing is chosen.
Provider-aware code should NOT use this — use provider-model-pricing instead.

source raw docstring

normalize-model^clj

(normalize-model model-map)

Normalizes a model entry: {:name "gpt-4o"} -> full provider-independent model metadata.

Normalizes a model entry: {:name "gpt-4o"} -> full provider-independent model metadata.

source raw docstring

normalize-provider^clj

(normalize-provider idx provider-map)

Normalizes a provider entry:

resolves :base-url from KNOWN_PROVIDERS if not provided
derives :priority from vector index
derives :root from first model
merges provider-independent model metadata with provider-scoped pricing/context

Uses known-provider for the policy lookup so plan-tier aliases inherit :exclude-models, :llm-headers, :rpm/:tpm, default models, and any other shared field from their base entry; only the tier-local overrides (:base-url, ...) win on conflict.

Normalizes a provider entry:
 - resolves :base-url from KNOWN_PROVIDERS if not provided
 - derives :priority from vector index
 - derives :root from first model
 - merges provider-independent model metadata with provider-scoped pricing/context

Uses `known-provider` for the policy lookup so plan-tier aliases
inherit `:exclude-models`, `:llm-headers`, `:rpm`/`:tpm`, default
models, and any other shared field from their base entry; only the
tier-local overrides (`:base-url`, ...) win on conflict.

source raw docstring

normalize-reasoning-level^clj

(normalize-reasoning-level v)

Coerce any accepted spelling to a canonical :quick|:balanced|:deep keyword. Accepts:

:quick / :balanced / :deep (keywords, case-insensitive)
"quick" / "balanced" / "deep" (strings, case-insensitive)
OpenAI-style aliases :low→:quick, :medium→:balanced, :high→:deep (so :reasoning_effort migrations don't break). Returns nil for unknown input.

Coerce any accepted spelling to a canonical :quick|:balanced|:deep keyword.
Accepts:
  - :quick / :balanced / :deep (keywords, case-insensitive)
  - "quick" / "balanced" / "deep" (strings, case-insensitive)
  - OpenAI-style aliases :low→:quick, :medium→:balanced, :high→:deep
    (so `:reasoning_effort` migrations don't break).
Returns nil for unknown input.

source raw docstring

provider-excluded-model?^clj

(provider-excluded-model? provider-id model-name)

True when a provider-scoped catalog marks a model unavailable. Provider config may add :exclude-models as exact model names and/or :min-gpt-version such as [5 3] to hide older GPT family models. Uses known-provider so plan-tier aliases inherit exclusion lists from their base entry (e.g. all three Copilot tiers honour the same :exclude-models #{gpt-4o ...} defined on :github-copilot).

True when a provider-scoped catalog marks a model unavailable.
Provider config may add `:exclude-models` as exact model names and/or
`:min-gpt-version` such as [5 3] to hide older GPT family models.
Uses `known-provider` so plan-tier aliases inherit exclusion lists
from their base entry (e.g. all three Copilot tiers honour the same
`:exclude-models #{gpt-4o ...}` defined on `:github-copilot`).

source raw docstring

provider-model-context^clj

(provider-model-context provider-id model-name)

Returns provider-scoped context window for provider/model, falling back to flattened MODEL_CONTEXT_LIMITS.

Returns provider-scoped context window for provider/model, falling back to flattened MODEL_CONTEXT_LIMITS.

source raw docstring

provider-model-entry^clj

(provider-model-entry provider-id model-name)

Returns provider-scoped entry for a provider/model, or nil if excluded.

Composition:

Catalog entry from models.dev (pricing, context, modalities, capability flags, family, knowledge cutoff, release dates) — looked up under :pricing-source if declared, else :id.
svar overlay from KNOWN_PROVIDER_MODELS (wire/policy keys: :api-style, :reasoning-style, :json-object-mode?, :extra-body, plus any pricing/context overrides).

Overlay wins on conflicts. Pricing maps deep-merge so an overlay can override a single rate without dropping :cache-read / :cache-write from the catalog.

Returns provider-scoped entry for a provider/model, or nil if excluded.

Composition:
  1. Catalog entry from models.dev (pricing, context, modalities,
     capability flags, family, knowledge cutoff, release dates) —
     looked up under `:pricing-source` if declared, else `:id`.
  2. svar overlay from KNOWN_PROVIDER_MODELS (wire/policy keys:
     `:api-style`, `:reasoning-style`, `:json-object-mode?`,
     `:extra-body`, plus any pricing/context overrides).

Overlay wins on conflicts. Pricing maps deep-merge so an overlay
can override a single rate without dropping `:cache-read` /
`:cache-write` from the catalog.

source raw docstring

provider-model-pricing^clj

(provider-model-pricing provider-id model-name)

Returns provider-scoped pricing for provider/model, falling back to flattened MODEL_PRICING.

Returns provider-scoped pricing for provider/model, falling back to flattened MODEL_PRICING.

source raw docstring

provider-model-visible?^clj

(provider-model-visible? provider-id model-name)

True when provider-scoped model filters allow model-name.

True when provider-scoped model filters allow `model-name`.

source raw docstring

reasoning-extra-body^clj

(reasoning-extra-body api-style model-map level)

(reasoning-extra-body api-style model-map level {:keys [preserved-thinking?]})

Translates an abstract reasoning level into provider-specific extra-body. Returns nil when:

level is nil / unknown
the selected model is not flagged :reasoning?
the reasoning-style has no mapping in REASONING_LEVELS.

Dispatches on the model's :reasoning-style first (explicit pin), falling back to inference from api-style when the model doesn't declare one.

Callers pass the returned map through merge into their extra-body; silent nil keeps non-reasoning models untouched.

Four-arity form takes an opts map: :preserved-thinking? — Z.ai-only. Emits clear_thinking: false inside the :thinking block, asking the server to retain reasoning_content from previous assistant turns (Preserved Thinking, GLM-5 / GLM-4.7). Callers using this MUST echo the complete, unmodified reasoning_content back to the API in subsequent assistant turns, otherwise cache hit rates and model quality degrade. No-op on non-z.ai reasoning styles and on the Coding Plan endpoint (which has preserved thinking on server-side by default, but setting the flag explicitly is harmless).

Translates an abstract reasoning level into provider-specific extra-body.
Returns nil when:
  - `level` is nil / unknown
  - the selected model is not flagged `:reasoning?`
  - the reasoning-style has no mapping in REASONING_LEVELS.

Dispatches on the model's `:reasoning-style` first (explicit pin), falling
back to inference from `api-style` when the model doesn't declare one.

Callers pass the returned map through merge into their extra-body; silent
nil keeps non-reasoning models untouched.

Four-arity form takes an opts map:
  `:preserved-thinking?` — Z.ai-only. Emits `clear_thinking: false` inside
     the `:thinking` block, asking the server to retain reasoning_content
     from previous assistant turns (Preserved Thinking, GLM-5 / GLM-4.7).
     Callers using this MUST echo the complete, unmodified reasoning_content
     back to the API in subsequent assistant turns, otherwise cache hit
     rates and model quality degrade. No-op on non-z.ai reasoning styles
     and on the Coding Plan endpoint (which has preserved thinking on
     server-side by default, but setting the flag explicitly is harmless).

source raw docstring

REASONING_LEVELS^clj

Abstract reasoning levels translated per reasoning-style. Vocabulary is intentionally provider-neutral — callers pass :quick|:balanced|:deep and svar picks the right on-the-wire shape for the selected model.

Sub-key semantics: :openai-effort → flat top-level :reasoning_effort string. Used by GPT-5.x, o-series, Gemini 2.5 via OpenAI gateway, DeepSeek Reasoner, Copilot GPT-5+, and most OpenAI-compatible reasoners. :anthropic-thinking → Claude thinking controls. Claude Opus 4.7 / Opus 4.6 / Sonnet 4.6 use adaptive thinking + output_config.effort. Older Claude 4 models use manual budget_tokens. :zai-thinking → binary :thinking {:type "enabled"|"disabled"} on Z.ai / GLM-4.6+. No budget_tokens — thinking is on/off. :quick disables, :balanced/:deep enable. See also :preserved-thinking? below for the clear_thinking: false flag that keeps reasoning across assistant turns. :server-managed → explicit no-op style for proxies that gate reasoning server-side and reject (or silently mis-route) client-supplied reasoning_effort / thinking fields. Modeled on pi-ai's compat.supportsReasoningEffort: false flag for Copilot Claude / Gemini / Grok: every entry in REASONING_LEVELS resolves to nil, so reasoning-extra-body returns nil and the wire body carries no reasoning field at all. Without this style, the May 2026 Copilot Claude switch from :api-style :anthropic (Anthropic /messages with thinking blocks) to :openai-compatible-chat (with reasoning_effort) made Copilot proxy bias Claude into excessive autonomous reasoning loops — observable on session 52983a42 / 831cedee as 5K-8K output tokens per iteration burned on hidden thinking.

Abstract reasoning levels translated per reasoning-style.
Vocabulary is intentionally provider-neutral — callers pass :quick|:balanced|:deep
and svar picks the right on-the-wire shape for the selected model.

Sub-key semantics:
  `:openai-effort`      → flat top-level `:reasoning_effort` string.
                          Used by GPT-5.x, o-series, Gemini 2.5 via OpenAI gateway,
                          DeepSeek Reasoner, Copilot GPT-5+, and most
                          OpenAI-compatible reasoners.
  `:anthropic-thinking` → Claude thinking controls.
                          Claude Opus 4.7 / Opus 4.6 / Sonnet 4.6 use
                          adaptive thinking + output_config.effort. Older
                          Claude 4 models use manual budget_tokens.
  `:zai-thinking`       → binary `:thinking {:type "enabled"|"disabled"}` on
                          Z.ai / GLM-4.6+. No budget_tokens — thinking is on/off.
                          `:quick` disables, `:balanced`/`:deep` enable.
                          See also `:preserved-thinking?` below for the
                          `clear_thinking: false` flag that keeps reasoning
                          across assistant turns.
  `:server-managed`     → explicit no-op style for proxies that gate
                          reasoning server-side and reject (or silently
                          mis-route) client-supplied `reasoning_effort`
                          / `thinking` fields. Modeled on pi-ai's
                          `compat.supportsReasoningEffort: false` flag
                          for Copilot Claude / Gemini / Grok: every
                          entry in `REASONING_LEVELS` resolves to nil,
                          so `reasoning-extra-body` returns nil and the
                          wire body carries no reasoning field at all.
                          Without this style, the May 2026 Copilot Claude
                          switch from `:api-style :anthropic` (Anthropic
                          /messages with thinking blocks) to
                          `:openai-compatible-chat` (with `reasoning_effort`)
                          made Copilot proxy bias Claude into excessive
                          autonomous reasoning loops — observable on
                          session 52983a42 / 831cedee as 5K-8K output
                          tokens per iteration burned on hidden thinking.

source raw docstring

reset-budget!^clj

(reset-budget! router)

Resets the router's token/cost budget counters to zero.

Resets the router's token/cost budget counters to zero.

source raw docstring

reset-provider!^clj

(reset-provider! router provider-id)

Manually resets a provider's circuit breaker to :closed.

Manually resets a provider's circuit breaker to :closed.

source raw docstring

resolve-effective-model^clj

(resolve-effective-model router)

(resolve-effective-model router overrides)

Resolves the effective routed model from the router, optionally applying routing overrides.

Returns a model descriptor map (or nil when no provider is available): {:name :reasoning? :provider :api-style :pricing :context :intelligence :speed ...}

overrides (optional map): :optimize — :cost | :speed | :intelligence (translated to :prefer) :provider — explicit provider id keyword :model — explicit model name string :reasoning — reasoning level keyword (implies reasoning-capable model)

(resolve-effective-model router) ;; root model (resolve-effective-model router {:optimize :cost}) ;; cheapest (resolve-effective-model router {:optimize :intelligence}) ;; frontier (resolve-effective-model router {:provider :openai :model "gpt-5-mini"}) ;; exact

Resolves the effective routed model from the router, optionally applying
routing overrides.

Returns a model descriptor map (or nil when no provider is available):
{:name :reasoning? :provider :api-style :pricing :context :intelligence :speed ...}

`overrides` (optional map):
  :optimize  — :cost | :speed | :intelligence (translated to :prefer)
  :provider  — explicit provider id keyword
  :model     — explicit model name string
  :reasoning — reasoning level keyword (implies reasoning-capable model)

(resolve-effective-model router)                                         ;; root model
(resolve-effective-model router {:optimize :cost})                       ;; cheapest
(resolve-effective-model router {:optimize :intelligence})               ;; frontier
(resolve-effective-model router {:provider :openai :model "gpt-5-mini"}) ;; exact

source raw docstring

resolve-routing^clj

(resolve-routing router routing-opts)

Resolves :routing opts to prefs for with-provider-fallback. Returns {:prefs prefs-map :error-strategy kw}. Throws on invalid provider/model combinations.

:reasoning in the routing opts (abstract level — :quick/:balanced/:deep or strings/aliases) implies :require-reasoning? true in prefs, which filters model selection to :reasoning? true models in resolve-model. This makes {:optimize :cost :reasoning :deep} pick the cheapest reasoning-capable model rather than silently dropping :deep when the cost-cheapest model happens to be non-reasoning.

Resolves :routing opts to prefs for with-provider-fallback.
Returns {:prefs prefs-map :error-strategy kw}.
Throws on invalid provider/model combinations.

`:reasoning` in the routing opts (abstract level — :quick/:balanced/:deep
or strings/aliases) implies `:require-reasoning? true` in prefs, which
filters model selection to `:reasoning? true` models in `resolve-model`.
This makes `{:optimize :cost :reasoning :deep}` pick the cheapest
*reasoning-capable* model rather than silently dropping `:deep` when the
cost-cheapest model happens to be non-reasoning.

source raw docstring

router-stats^clj

(router-stats router)

Returns cumulative + windowed stats for the router.

Returns cumulative + windowed stats for the router.

source raw docstring

select-provider^clj

(select-provider router prefs)

Returns [provider model-map] or nil. Read-only.

Cross-provider ranking: models are scored by :prefer first, provider priority second. So :optimize :intelligence picks the frontier model across the whole fleet; ties are broken by provider vector order.

Returns [provider model-map] or nil. Read-only.

Cross-provider ranking: models are scored by `:prefer` first, provider
priority second. So `:optimize :intelligence` picks the frontier model
across the whole fleet; ties are broken by provider vector order.

source raw docstring

truncate-text^clj

(truncate-text model text max-tokens)

(truncate-text model
               text
               max-tokens
               {:keys [truncation-marker from] :or {from :end}})

Truncates text to fit within a token limit. Uses proper tokenization to ensure accurate truncation.

Truncates text to fit within a token limit.
Uses proper tokenization to ensure accurate truncation.

source raw docstring

with-provider-fallback^clj

(with-provider-fallback router prefs f)

source

cljdoc builds & hosts documentation for Clojure/Script libraries

Keyboard shortcuts

`Ctrl`+`k`	Jump to recent docs
`←`	Move to previous article
`→`	Move to next article
`Ctrl`+`/`	Jump to the search field

Raise an issue Browse cljdoc source Chat on Slack

× close