Router: provider/model registry, circuit breakers, rate limiting, budget tracking, and routing resolution.
Extracted from defaults.clj (provider/model metadata) and llm.clj (routing logic) to provide a single cohesive namespace for all routing concerns.
(check-context-limit model messages)
(check-context-limit model messages {:keys [output-reserve throw? context-limits] :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})
Checks if messages fit within model context limit.
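
As a rough illustration of the check described above, the sketch below compares a prompt token count plus an output reserve against a context limit. The function name `check-fit`, its arguments (a precomputed token count and limit rather than `model` and `messages`), and the returned map shape are assumptions made to keep the example self-contained; this is not svar's implementation.

```clojure
(defn check-fit
  "Hypothetical stand-in for a context-limit check. Returns a result map,
  or throws when `throw?` is true and the prompt does not fit."
  [input-tokens context-limit {:keys [output-reserve throw?]
                               :or   {output-reserve 0 throw? false}}]
  (let [needed (+ input-tokens output-reserve)   ; prompt plus reserved output
        ok?    (<= needed context-limit)]
    (when (and throw? (not ok?))
      (throw (ex-info "Context limit exceeded"
                      {:needed needed :limit context-limit})))
    {:ok? ok? :needed needed :limit context-limit}))

;; Fits: 120000 + 4096 is below 128000.
(check-fit 120000 128000 {:output-reserve 4096})
;; Does not fit: 126000 + 4096 is above 128000.
(check-fit 126000 128000 {:output-reserve 4096})
```

With an output reserve of 0, as the default described in this namespace, the check reduces to comparing the prompt size against the limit.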
(context-limit model)
(context-limit model context-limits)
Returns provider-agnostic conservative context window size for a model.
Params:
`model` - String. Model name.
`context-limits` - Map, optional. Override map (merged defaults from config).
Returns: Integer. Conservative context tokens.
(count-and-estimate model messages output-text)
(count-and-estimate model messages output-text {:keys [pricing input-tokens api-usage cache-creation-ttl] :as opts})
Counts tokens and estimates cost in one call. The cost map separates uncached input, cached-read input, cache-write input, output, and total.
(count-messages model messages)
Counts tokens for a chat completion message array.
(count-tokens model text)
Counts tokens for a given text string using the specified model's encoding.
Default idle-stream timeout (ms) for streaming HTTP responses. If no SSE bytes arrive for this long, the underlying `InputStream` is closed and the call surfaces `:svar.core/stream-idle-timeout`. Distinct from `DEFAULT_TIMEOUT_MS` (whole-request cap): the idle watchdog tolerates arbitrarily long total durations as long as the stream keeps emitting bytes (content deltas, SSE `: ping` keepalives, or even blank separators). The watchdog resets on every `.readLine`, so it is ping-aware for free.
45 s default. Catches genuinely hung streams (no SSE bytes, no keepalive pings) in well under a minute while still allowing the provider to take up to ~40 s between tokens during extended reasoning. The original 120 s default let timeouts blow the whole per-task budget.
Rationale recorded for the original 2-minute (120000 ms) default:
- Matches Anthropic's own SDK proposal (anthropics/anthropic-sdk-typescript#867 suggests `120_000` per-request, #959 ships 90s default with ping-reset).
- ~4× Anthropic's published `ping` interval (15-30 s) for safety.
- Catches real hangs (e.g. z.ai glm streams that simply stop sending body frames after headers) in 2 minutes instead of forever; the original 5-minute `DEFAULT_TIMEOUT_MS` doesn't reliably fire on JDK 25 + HTTP/2 streaming.
- Anthropic's documented worst case for legitimate extended thinking on Opus 4.5 is ~185 s with zero events (see anthropics/claude-agent-sdk-typescript#44). Callers running extended-thinking workloads should bump this to 240-300 s, or pass `:idle-timeout-ms nil` to disable.
Disable per-call: `(svar/ask-code! router {... :idle-timeout-ms nil})`.
Default number of tokens to reserve for model output. 0 means no reservation: the API handles overflow naturally.
Default router-owned 429 policy.
`:same-provider-delays-ms` - sleep schedule for same-provider retries.
`:fallback-after-ms` - hard cap on wall time the 429 phase can consume (measured from the FIRST 429 caught). When the configured delay vector is exhausted OR elapsed ≥ budget, the router falls back to the next provider. Each delay clamps to remaining budget so the loop never overshoots.
`:respect-retry-after?` - honor server `Retry-After` header value in place of the configured delay; the clamp to remaining budget still applies.
`:fallback-provider?` - when budget is exhausted, fall back to the next provider/model. Set false to surface the rate-limit error to the caller instead.
The 60 000 ms (60 s) default tolerates Anthropic / OpenAI / z.ai Retry-After values that can land between 30-60 s under quota pressure on reasoning-heavy workloads, while still bounding the wait so a single user request cannot hang for minutes.
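
The clamping rule in the 429 policy above can be sketched as a pure function that plans the sleeps for one rate-limited request. This is a simplified model under stated assumptions: `plan-429-waits` is a hypothetical name, elapsed time counts only the sleeps (real request time is ignored), and the `Retry-After` values are passed in as a sequence of milliseconds (nil when the server sent none).

```clojure
(defn plan-429-waits
  "Hypothetical planner: returns the vector of sleeps (ms) a same-provider
  retry loop would take before falling back. Each sleep is clamped to the
  budget remaining under :fallback-after-ms."
  [{:keys [same-provider-delays-ms fallback-after-ms respect-retry-after?]}
   retry-after-ms-seq]
  (loop [delays  same-provider-delays-ms
         hints   retry-after-ms-seq
         elapsed 0
         waits   []]
    (let [remaining (- fallback-after-ms elapsed)]
      (if (or (empty? delays) (<= remaining 0))
        waits                                   ; schedule exhausted or budget spent
        (let [hint   (first hints)
              wanted (if (and respect-retry-after? hint) hint (first delays))
              wait   (min wanted remaining)]    ; never overshoot the budget
          (recur (rest delays) (rest hints) (+ elapsed wait) (conj waits wait)))))))

(plan-429-waits {:same-provider-delays-ms [1000 2000 4000]
                 :fallback-after-ms       5000
                 :respect-retry-after?    true}
                [nil 3500 nil])
```

In this run the second sleep follows the server hint (3500 ms) and the third is clamped to the 500 ms left, so the total never exceeds the 5000 ms budget.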
Default retry policy for transient HTTP errors.
Default semantic-stream timeout (ms) for streaming HTTP responses. Nil by default: disabled unless caller opts in. If enabled and bytes keep arriving (SSE pings/comments) but no model/progress event arrives for this long, the stream is closed and surfaced as `:svar.core/stream-semantic-timeout`.
Distinct from `DEFAULT_IDLE_TIMEOUT_MS`: idle watches transport liveness; semantic watches model progress. Enable per-call with e.g. `:semantic-timeout-ms 180000`.
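
To make the idle/semantic distinction concrete, the sketch below replays a recorded stream and reports which watchdog would have tripped. It is a retrospective model only (a real watchdog fires while waiting, not after the fact), and the event shape `{:at-ms ... :type ...}`, the `:ping`/`:delta` types, and both function names are assumptions for illustration.

```clojure
(defn max-gap
  "Largest gap (ms) between consecutive timestamps; 0 for fewer than two."
  [times]
  (reduce max 0 (map (fn [[a b]] (- b a)) (partition 2 1 times))))

(defn stream-timeouts
  "Hypothetical replay: idle looks at every event, semantic only at
  non-ping (progress) events. A nil timeout disables that check."
  [events {:keys [idle-timeout-ms semantic-timeout-ms]}]
  (let [all-times      (map :at-ms events)
        progress-times (map :at-ms (remove #(= :ping (:type %)) events))]
    {:idle?     (boolean (and idle-timeout-ms
                              (> (max-gap all-times) idle-timeout-ms)))
     :semantic? (boolean (and semantic-timeout-ms
                              (> (max-gap progress-times) semantic-timeout-ms)))}))

;; One delta, then pings every 30 s, then the next delta at 200 s.
(def events
  (concat [{:at-ms 0 :type :delta}]
          (for [t (range 30000 180001 30000)] {:at-ms t :type :ping})
          [{:at-ms 200000 :type :delta}]))

(stream-timeouts events {:idle-timeout-ms 45000 :semantic-timeout-ms 180000})
```

The pings keep the transport alive, so the idle check stays quiet, while the 200 s gap between progress events exceeds the 180 s semantic limit.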
Default HTTP request timeout in milliseconds (5 minutes). Reasoning models (e.g. glm-5-turbo) may need extended time for chain-of-thought.
Default time-to-first-token timeout (ms) for streaming HTTP responses. Bounds the wait between sending the HTTP request and receiving response headers. On fire, raises `:svar.core/stream-ttft-timeout` and the caller thread's interrupt unparks the underlying `CompletableFuture.get`.
30 s default. Tight enough to surface stuck provider connections inside one iteration (the original 90 s default sometimes wasted a whole autoresearch iter waiting for headers), generous enough for real reasoning cold starts: z.ai glm-5.1 has been observed sending first headers between 8 and 22 s. Disable per-call with `:ttft-timeout-ms nil`; pass a larger value for slow reasoning models with long pre-stream queues.
(estimate-cost model input-tokens output-tokens)
(estimate-cost model input-tokens output-tokens pricing-map)
(estimate-cost model input-tokens output-tokens pricing-map opts)
Estimates USD cost with separate uncached input, cached input, cache creation, and output components. Rates are USD per 1M tokens.
Since svar 0.6.0 the canonical usage shape is INCLUSIVE: `:input-tokens` is always the TOTAL prompt tokens regardless of provider (Anthropic-additive raw values are summed at the canonical normalizer boundary). Cached and cache-creation tokens are SUBSETS of `:input-tokens`, so they're subtracted to compute the uncached-regular portion before pricing. The `:cache-tokens-in-input?` flag is gone; the meaning is uniform.
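
The inclusive accounting described above can be worked through with a small sketch. Everything here is illustrative: `cost-breakdown` is a hypothetical name, the usage keys `:cached-tokens` and `:cache-creation-tokens` and the pricing keys `:input` / `:output` are assumptions (only `:cache-read` / `:cache-write` appear elsewhere in this namespace's docs), and the rates are made-up numbers.

```clojure
(defn cost-breakdown
  "Hypothetical sketch of inclusive-usage pricing. Rates are USD per 1M
  tokens. Cached and cache-creation tokens are subsets of :input-tokens,
  so they are subtracted to get the uncached portion."
  [{:keys [input output cache-read cache-write]}
   {:keys [input-tokens output-tokens cached-tokens cache-creation-tokens]
    :or   {cached-tokens 0 cache-creation-tokens 0}}]
  (let [uncached (max 0 (- input-tokens cached-tokens cache-creation-tokens))
        usd      (fn [tokens rate] (/ (* tokens (or rate 0)) 1e6))
        parts    {:uncached-input (usd uncached input)
                  :cached-input   (usd cached-tokens cache-read)
                  :cache-creation (usd cache-creation-tokens cache-write)
                  :output         (usd output-tokens output)}]
    (assoc parts :total (reduce + (vals parts)))))

(cost-breakdown {:input 2.0 :output 10.0 :cache-read 0.5 :cache-write 2.5}
                {:input-tokens 1000000 :output-tokens 100000
                 :cached-tokens 400000 :cache-creation-tokens 100000})
```

Of the 1,000,000 prompt tokens in this call, 400,000 are cache reads and 100,000 are cache writes, so only 500,000 are billed at the regular input rate.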
(format-cost cost)
Formats a cost value as a human-readable USD string.
(infer-model-metadata {:keys [name] :as model-map})
Returns provider-independent model metadata. Looks up KNOWN_MODEL_METADATA first. Falls back to regex inference for unknown models. Explicit fields in model-map override inferred values.
Per-model static metadata. `:reasoning?` flags a model whose provider accepts a reasoning-depth parameter. `:reasoning-style` (optional) pins the wire shape to emit (see `REASONING_LEVELS` keys). When omitted, the style is inferred from the provider's `:api-style` (`:anthropic` → anthropic thinking, everything else → openai-effort).
(make-router providers)
(make-router providers opts)
Creates a router from a vector of provider maps.
Vector order = priority (first provider is highest priority).
First model in provider vector = root model.
Provider `:base-url` auto-resolved from KNOWN_PROVIDERS for known IDs.
Model metadata auto-inferred from `:name` and merged with provider-scoped pricing/context.
Duplicate provider `:id` values are a hard error.
`opts` - Optional map:
:network - {:timeout-ms N :max-retries N ...} router-level network defaults
:tokens - {:check-context? bool :pricing {} :context-limits {}} token defaults
:budget - {:max-tokens N :max-cost N} spend limits (nil = no limit)
:rate-limit - {:same-provider-delays-ms [...] :fallback-after-ms N ...}
:failure-threshold - Int. Failures before circuit opens (default: 5)
:recovery-ms - Int. Ms before open→half-open (default: 60000)
Example:
(make-router [{:id :blockether :api-key <key>
               :models [{:name <model-a>} {:name <model-b>}]}
              {:id :openai :api-key <key>
               :models [{:name <model-a>} {:name <model-b>}]}]
             {:budget {:max-tokens 1000000 :max-cost 5.0}})
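
The `:failure-threshold` and `:recovery-ms` options above describe a circuit breaker. The sketch below shows one plausible reading of those two knobs as pure functions over a breaker map. The map shape, both function names, and the choice to count failures cumulatively (rather than consecutively or in a window) are assumptions; the docs above only state the defaults and the open→half-open transition.

```clojure
(defn record-failure
  "Hypothetical: bump the failure count and open the circuit once the
  count reaches `failure-threshold`."
  [{:keys [failures] :as breaker} now-ms failure-threshold]
  (let [failures (inc (or failures 0))]
    (if (>= failures failure-threshold)
      (assoc breaker :state :open :failures failures :opened-at now-ms)
      (assoc breaker :failures failures))))

(defn current-state
  "Hypothetical: an open circuit reads as :half-open once `recovery-ms`
  has passed since it opened."
  [{:keys [state opened-at]} now-ms recovery-ms]
  (if (and (= state :open) (>= (- now-ms opened-at) recovery-ms))
    :half-open
    (or state :closed)))

;; Five failures at t=1000 ms trip the breaker with the default threshold of 5.
(def tripped
  (reduce (fn [b _] (record-failure b 1000 5)) {:state :closed} (range 5)))

(current-state tripped 2000 60000)   ; still inside the recovery window
(current-state tripped 61000 60000)  ; recovery window has elapsed
```

A manual reset such as `reset-provider!` would correspond to putting the breaker map back to `{:state :closed}` in this model.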
(max-input-tokens model)
(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})
Calculates maximum input tokens for a model, reserving space for output.
Best-effort flattened model context limits for legacy token utilities. When a model exists on multiple providers with different contexts, the most conservative context is used. Provider-aware code should use `provider-model-context` instead.
Best-effort flattened model pricing for legacy token utilities. When a model exists on multiple providers, the lowest total pricing is chosen. Provider-aware code should NOT use this; use `provider-model-pricing` instead.
(normalize-model model-map)
Normalizes a model entry: {:name "gpt-4o"} -> full provider-independent model metadata.
(normalize-provider idx provider-map)
Normalizes a provider entry:
- resolves `:base-url` from KNOWN_PROVIDERS if not provided
- derives `:priority` from vector index
- derives `:root` from first model
- merges provider-independent model metadata with provider-scoped pricing/context
Uses `known-provider` for the policy lookup so plan-tier aliases inherit `:exclude-models`, `:llm-headers`, `:rpm`/`:tpm`, default models, and any other shared field from their base entry; only the tier-local overrides (`:base-url`, ...) win on conflict.
(normalize-reasoning-level v)
Coerce any accepted spelling to a canonical `:quick|:balanced|:deep` keyword.
Accepts:
- `:quick` / `:balanced` / `:deep` (keywords, case-insensitive)
- "quick" / "balanced" / "deep" (strings, case-insensitive)
- OpenAI-style aliases `:low`→`:quick`, `:medium`→`:balanced`, `:high`→`:deep` (so `:reasoning_effort` migrations don't break).
Returns nil for unknown input.
(provider-excluded-model? provider-id model-name)
True when a provider-scoped catalog marks a model unavailable.
Provider config may add `:exclude-models` as exact model names and/or `:min-gpt-version` such as [5 3] to hide older GPT family models.
Uses `known-provider` so plan-tier aliases inherit exclusion lists from their base entry (e.g. all three Copilot tiers honour the same `:exclude-models #{gpt-4o ...}` defined on `:github-copilot`).
(provider-model-context provider-id model-name)
Returns provider-scoped context window for provider/model, falling back to flattened MODEL_CONTEXT_LIMITS.
(provider-model-entry provider-id model-name)
Returns provider-scoped entry for a provider/model, or nil if excluded.
Composition:
1. Catalog entry from models.dev (pricing, context, modalities, capability flags, family, knowledge cutoff, release dates), looked up under `:pricing-source` if declared, else `:id`.
2. svar overlay from KNOWN_PROVIDER_MODELS (wire/policy keys: `:api-style`, `:reasoning-style`, `:json-object-mode?`, `:extra-body`, plus any pricing/context overrides).
Overlay wins on conflicts. Pricing maps deep-merge so an overlay can override a single rate without dropping `:cache-read` / `:cache-write` from the catalog.
(provider-model-pricing provider-id model-name)
Returns provider-scoped pricing for provider/model, falling back to flattened MODEL_PRICING.
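
The composition rule described for `provider-model-entry` (overlay wins, but pricing maps merge key by key) can be sketched as follows. `compose-entry` is a hypothetical name, and the catalog values, the `:context` key, and the `:input` / `:output` pricing keys are made up for the example.

```clojure
(defn compose-entry
  "Hypothetical sketch: shallow-merge the overlay over the catalog entry,
  but merge :pricing one level deeper so untouched rates survive."
  [catalog overlay]
  (let [pricing (merge (:pricing catalog) (:pricing overlay))]
    (cond-> (merge catalog overlay)
      pricing (assoc :pricing pricing))))

(def catalog-entry
  {:context 200000
   :pricing {:input 3.0 :output 15.0 :cache-read 0.3 :cache-write 3.75}})

(def overlay
  {:api-style :anthropic
   :pricing   {:input 2.5}})          ; overrides a single rate

(compose-entry catalog-entry overlay)
```

A plain `merge` would replace the whole `:pricing` map with `{:input 2.5}`; merging it separately keeps `:output`, `:cache-read`, and `:cache-write` from the catalog.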
(provider-model-visible? provider-id model-name)
True when provider-scoped model filters allow `model-name`.
(reasoning-extra-body api-style model-map level)
(reasoning-extra-body api-style model-map level {:keys [preserved-thinking?]})
Translates an abstract reasoning level into provider-specific extra-body.
Returns nil when:
- `level` is nil / unknown
- the selected model is not flagged `:reasoning?`
- the reasoning-style has no mapping in REASONING_LEVELS.
Dispatches on the model's `:reasoning-style` first (explicit pin), falling back to inference from `api-style` when the model doesn't declare one.
Callers pass the returned map through merge into their extra-body; silent nil keeps non-reasoning models untouched.
Four-arity form takes an opts map:
`:preserved-thinking?` - Z.ai-only. Emits `clear_thinking: false` inside the `:thinking` block, asking the server to retain reasoning_content from previous assistant turns (Preserved Thinking, GLM-5 / GLM-4.7). Callers using this MUST echo the complete, unmodified reasoning_content back to the API in subsequent assistant turns, otherwise cache hit rates and model quality degrade. No-op on non-z.ai reasoning styles and on the Coding Plan endpoint (which has preserved thinking on server-side by default, but setting the flag explicitly is harmless).
Abstract reasoning levels translated per reasoning-style. Vocabulary is intentionally provider-neutral: callers pass `:quick|:balanced|:deep` and svar picks the right on-the-wire shape for the selected model.
Sub-key semantics:
`:openai-effort` - flat top-level `:reasoning_effort` string. Used by GPT-5.x, o-series, Gemini 2.5 via OpenAI gateway, DeepSeek Reasoner, Copilot GPT-5+, and most OpenAI-compatible reasoners.
`:anthropic-thinking` - Claude thinking controls. Claude Opus 4.7 / Opus 4.6 / Sonnet 4.6 use adaptive thinking + `output_config.effort`. Older Claude 4 models use manual `budget_tokens`.
`:zai-thinking` - binary `:thinking {:type "enabled"|"disabled"}` on Z.ai / GLM-4.6+. No `budget_tokens`: thinking is on/off. `:quick` disables, `:balanced`/`:deep` enable. See also `:preserved-thinking?` (on `reasoning-extra-body`) for the `clear_thinking: false` flag that keeps reasoning across assistant turns.
`:server-managed` - explicit no-op style for proxies that gate reasoning server-side and reject (or silently mis-route) client-supplied `reasoning_effort` / `thinking` fields. Modeled on pi-ai's `compat.supportsReasoningEffort: false` flag for Copilot Claude / Gemini / Grok: every entry in `REASONING_LEVELS` resolves to nil, so `reasoning-extra-body` returns nil and the wire body carries no reasoning field at all. Without this style, the May 2026 Copilot Claude switch from `:api-style :anthropic` (Anthropic /messages with thinking blocks) to `:openai-compatible-chat` (with `reasoning_effort`) made Copilot proxy bias Claude into excessive autonomous reasoning loops, observable on session 52983a42 / 831cedee as 5K-8K output tokens per iteration burned on hidden thinking.
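
The level normalization and per-style translation described above can be sketched end to end. This is a reduced model, not svar's table: all names are hypothetical, the `"low"`/`"medium"`/`"high"` effort strings are an assumed mapping, accepting alias strings as well as alias keywords is an assumption, and `:anthropic-thinking` is left out because its wire shape is not spelled out here (so it falls into the "no mapping" case and yields nil).

```clojure
(require '[clojure.string :as str])

(def level-aliases {:low :quick :medium :balanced :high :deep})

(defn normalize-level
  "Hypothetical: coerce keywords/strings (any case) and OpenAI-style
  aliases to :quick | :balanced | :deep; nil for anything else."
  [v]
  (let [k (cond (keyword? v) (keyword (str/lower-case (name v)))
                (string? v)  (keyword (str/lower-case v))
                :else        nil)
        k (get level-aliases k k)]
    (#{:quick :balanced :deep} k)))

(def reasoning-levels
  {:openai-effort  {:quick    {:reasoning_effort "low"}
                    :balanced {:reasoning_effort "medium"}
                    :deep     {:reasoning_effort "high"}}
   :zai-thinking   {:quick    {:thinking {:type "disabled"}}
                    :balanced {:thinking {:type "enabled"}}
                    :deep     {:thinking {:type "enabled"}}}
   :server-managed {:quick nil :balanced nil :deep nil}})

(defn extra-body-sketch
  "Hypothetical: an explicit :reasoning-style wins; otherwise the style is
  inferred from api-style. Returns nil for non-reasoning models, unknown
  levels, and styles with no mapping."
  [api-style model-map level]
  (when (:reasoning? model-map)
    (let [style (or (:reasoning-style model-map)
                    (if (= api-style :anthropic) :anthropic-thinking :openai-effort))]
      (get-in reasoning-levels [style (normalize-level level)]))))

(extra-body-sketch :openai-compatible-chat {:reasoning? true} "HIGH")
(extra-body-sketch :openai-compatible-chat
                   {:reasoning? true :reasoning-style :zai-thinking} :quick)
```

Because every failure path returns nil, a caller can `merge` the result into its extra-body unconditionally.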
(reset-budget! router)
Resets the router's token/cost budget counters to zero.
(reset-provider! router provider-id)
Manually resets a provider's circuit breaker to `:closed`.
(resolve-effective-model router)
(resolve-effective-model router overrides)
Resolves the effective routed model from the router, optionally applying routing overrides.
Returns a model descriptor map (or nil when no provider is available):
{:name :reasoning? :provider :api-style :pricing :context :intelligence :speed ...}
`overrides` (optional map):
`:optimize` - `:cost` | `:speed` | `:intelligence` (translated to `:prefer`)
`:provider` - explicit provider id keyword
`:model` - explicit model name string
`:reasoning` - reasoning level keyword (implies reasoning-capable model)
(resolve-effective-model router) ;; root model
(resolve-effective-model router {:optimize :cost}) ;; cheapest
(resolve-effective-model router {:optimize :intelligence}) ;; frontier
(resolve-effective-model router {:provider :openai :model "gpt-5-mini"}) ;; exact
(resolve-routing router routing-opts)
Resolves `:routing` opts to prefs for `with-provider-fallback`. Returns {:prefs prefs-map :error-strategy kw}. Throws on invalid provider/model combinations.
`:reasoning` in the routing opts (abstract level: `:quick`/`:balanced`/`:deep` or strings/aliases) implies `:require-reasoning? true` in prefs, which filters model selection to `:reasoning? true` models in `resolve-model`. This makes `{:optimize :cost :reasoning :deep}` pick the cheapest reasoning-capable model rather than silently dropping `:deep` when the cost-cheapest model happens to be non-reasoning.
(router-stats router)
Returns cumulative + windowed stats for the router.
(select-provider router prefs)
Returns [provider model-map] or nil. Read-only.
Cross-provider ranking: models are scored by `:prefer` first, provider priority second. So `:optimize :intelligence` picks the frontier model across the whole fleet; ties are broken by provider vector order.
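
The ranking rule above, combined with the `:require-reasoning?` filter that `resolve-routing` derives from `:reasoning`, can be sketched over a flat list of candidates. This is an illustration only: `pick-model` is a hypothetical name, it returns a single model map rather than a `[provider model-map]` pair, and the numeric `:cost`, `:speed`, and `:intelligence` fields and their scales are invented.

```clojure
(defn pick-model
  "Hypothetical ranking: score by :prefer first, provider priority second
  (lower priority number = earlier in the provider vector)."
  [candidates {:keys [prefer require-reasoning?]}]
  (let [score (case prefer
                :cost         (fn [m] (:cost m))              ; cheaper is better
                :speed        (fn [m] (- (:speed m)))         ; faster is better
                :intelligence (fn [m] (- (:intelligence m)))  ; smarter is better
                (constantly 0))]
    (->> candidates
         (filter #(or (not require-reasoning?) (:reasoning? %)))
         (sort-by (juxt score :provider-priority))
         first)))

(def candidates
  [{:provider :a :provider-priority 0 :name "a-small"
    :cost 1.0 :speed 9 :intelligence 3 :reasoning? false}
   {:provider :a :provider-priority 0 :name "a-large"
    :cost 10.0 :speed 3 :intelligence 9 :reasoning? true}
   {:provider :b :provider-priority 1 :name "b-reasoner"
    :cost 4.0 :speed 5 :intelligence 9 :reasoning? true}])

(:name (pick-model candidates {:prefer :cost}))
(:name (pick-model candidates {:prefer :cost :require-reasoning? true}))
(:name (pick-model candidates {:prefer :intelligence}))
```

With `:require-reasoning?` set, the cheapest model overall is filtered out and the cheapest reasoning-capable one is chosen; the `:intelligence` tie between the two top models goes to the provider listed first.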
(truncate-text model text max-tokens)
(truncate-text model text max-tokens {:keys [truncation-marker from] :or {from :end}})
Truncates text to fit within a token limit. Uses proper tokenization to ensure accurate truncation.