Router: provider/model registry, circuit breakers, rate limiting, budget tracking, and routing resolution.
Extracted from defaults.clj (provider/model metadata) and llm.clj (routing logic) to provide a single cohesive namespace for all routing concerns.
(check-context-limit model messages)
(check-context-limit model messages {:keys [output-reserve throw? context-limits] :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})
Checks if messages fit within model context limit.
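A usage sketch. The `svar.router` namespace name, the `router` alias, and the return shape are assumptions; only the signatures above come from the documentation.

```clojure
(require '[svar.router :as router]) ; namespace name assumed

;; Two-arity form: default output reserve, returns instead of throwing.
(router/check-context-limit "gpt-4o"
                            [{:role "user" :content "Summarize this document ..."}])

;; Map-arity form: reserve 4096 tokens for output and throw on overflow.
(router/check-context-limit "gpt-4o" messages
                            {:output-reserve 4096 :throw? true})
```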
(context-limit model)
(context-limit model context-limits)
Returns the maximum context window size for a model.
Params:
model - String. Model name.
context-limits - Map, optional. Override map (merged defaults from config).
Returns: Integer. Maximum context tokens.
(count-and-estimate model messages output-text)
(count-and-estimate model messages output-text {:keys [pricing input-tokens api-usage]})
Counts tokens and estimates cost in one call.
(count-messages model messages)
Counts tokens for a chat completion message array.
(count-tokens model text)
Counts tokens for a given text string using the specified model's encoding.
Default number of tokens to reserve for model output. 0 means no reservation — let the API handle overflow naturally.
Default retry policy for transient HTTP errors.
Default HTTP request timeout in milliseconds (5 minutes). Reasoning models (e.g. glm-5-turbo) may need extended time for chain-of-thought.
(estimate-cost model input-tokens output-tokens)
(estimate-cost model input-tokens output-tokens pricing-map)
Estimates the cost in USD for a given token count.
(format-cost cost)
Formats a cost value as a human-readable USD string.
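A sketch composing the two cost helpers, assuming a `router` alias for this namespace. Token counts are illustrative, and the dollar figure depends on the configured pricing map, so no exact output is shown.

```clojure
;; Estimate the cost of 1200 input + 300 output tokens, then
;; format it as a USD string.
(router/format-cost
 (router/estimate-cost "gpt-4o" 1200 300))
```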
(infer-model-metadata {:keys [name] :as model-map})
Returns provider-independent model metadata. Looks up KNOWN_MODEL_METADATA first. Falls back to regex inference for unknown models. Explicit fields in model-map override inferred values.
Per-model static metadata. `:reasoning?` flags a model whose provider accepts a reasoning-depth parameter. `:reasoning-style` (optional) pins the wire shape to emit — see `REASONING_LEVELS` keys. When omitted, the style is inferred from the provider's `:api-style` (`:anthropic` → anthropic thinking, everything else → openai-effort).
(make-router providers)
(make-router providers opts)
Creates a router from a vector of provider maps.
Vector order = priority (first provider is highest priority).
First model in provider vector = root model.
Provider :base-url auto-resolved from KNOWN_PROVIDERS for known IDs.
Model metadata auto-inferred from :name and merged with provider-scoped pricing/context.
Duplicate provider :id values are a hard error.
`opts` - Optional map:
:network - {:timeout-ms N :max-retries N ...} router-level network defaults
:tokens - {:check-context? bool :pricing {} :context-limits {}} token defaults
:budget - {:max-tokens N :max-cost N} spend limits (nil = no limit)
:failure-threshold - Int. Failures before circuit opens (default: 5)
:recovery-ms - Int. Ms before open→half-open (default: 60000)
Example:
(make-router [{:id :blockether :api-key <key>
:models [{:name <model-a>} {:name <model-b>}]}
{:id :openai :api-key <key>
:models [{:name <model-a>} {:name <model-b>}]}]
{:budget {:max-tokens 1000000 :max-cost 5.0}})

(max-input-tokens model)
(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})
Calculates maximum input tokens for a model, reserving space for output.
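A sketch pairing `max-input-tokens` with `context-limit`, assuming a `router` alias for this namespace; the concrete numbers returned depend on the configured context limits.

```clojure
;; Room left for input after reserving 4096 tokens for output.
(router/max-input-tokens "gpt-4o" {:output-reserve 4096})

;; Compare with the model's raw context window.
(router/context-limit "gpt-4o")
```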
Best-effort flattened model context limits for legacy token utilities. When a model exists on multiple providers with different contexts, the maximum is used.
Best-effort flattened model pricing for legacy token utilities. When a model exists on multiple providers, the lowest total pricing is chosen. Provider-aware code should NOT use this — use provider-model-pricing instead.
(normalize-model model-map)
Normalizes a model entry: {:name "gpt-4o"} -> full provider-independent model metadata.
(normalize-provider idx provider-map)
Normalizes a provider entry:
- resolves :base-url from KNOWN_PROVIDERS if not provided
- derives :priority from vector index
- derives :root from first model
- merges provider-independent model metadata with provider-scoped pricing/context
(normalize-reasoning-level v)
Coerce any accepted spelling to a canonical :quick|:balanced|:deep keyword.
Accepts:
- :quick / :balanced / :deep (keywords, case-insensitive)
- "quick" / "balanced" / "deep" (strings, case-insensitive)
- OpenAI-style aliases :low→:quick, :medium→:balanced, :high→:deep
(so `:reasoning_effort` migrations don't break).
Returns nil for unknown input.

(provider-model-context provider-id model-name)
Returns provider-scoped context window for provider/model, falling back to flattened MODEL_CONTEXT_LIMITS.
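The level normalization described under `normalize-reasoning-level` can be sketched as follows (a `router` alias is assumed; the results in the comments restate what the docstring specifies):

```clojure
(router/normalize-reasoning-level :deep)      ; => :deep
(router/normalize-reasoning-level "Balanced") ; => :balanced (case-insensitive)
(router/normalize-reasoning-level :high)      ; => :deep (OpenAI-style alias)
(router/normalize-reasoning-level :unknown)   ; => nil
```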
(provider-model-entry provider-id model-name)
Returns provider-scoped entry {:pricing ... :context ...} for a provider/model, or nil.
(provider-model-pricing provider-id model-name)
Returns provider-scoped pricing for provider/model, falling back to flattened MODEL_PRICING.
(reasoning-extra-body api-style model-map level)
(reasoning-extra-body api-style model-map level {:keys [preserved-thinking?]})
Translates an abstract reasoning level into provider-specific extra-body.
Returns nil when:
- `level` is nil / unknown
- the selected model is not flagged `:reasoning?`
- the reasoning-style has no mapping in REASONING_LEVELS.
Dispatches on the model's `:reasoning-style` first (explicit pin), falling
back to inference from `api-style` when the model doesn't declare one.
Callers pass the returned map through merge into their extra-body; silent
nil keeps non-reasoning models untouched.
Four-arity form takes an opts map:
`:preserved-thinking?` — Z.ai-only. Emits `clear_thinking: false` inside
the `:thinking` block, asking the server to retain reasoning_content
from previous assistant turns (Preserved Thinking, GLM-5 / GLM-4.7).
Callers using this MUST echo the complete, unmodified reasoning_content
back to the API in subsequent assistant turns, otherwise cache hit
rates and model quality degrade. No-op on non-z.ai reasoning styles
and on the Coding Plan endpoint (which has preserved thinking on
server-side by default, but setting the flag explicitly is harmless).
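The merge pattern described for `reasoning-extra-body` can be sketched like this. A `router` alias is assumed, and the model map is a hypothetical example whose shape is inferred from the docstrings:

```clojure
;; A reasoning-capable model map (shape assumed from the docstrings).
(def model-map {:name "claude-sonnet-4" :reasoning? true})

;; Merge the translated shape into an outgoing request body; a nil
;; result merges as a no-op, so non-reasoning models stay untouched.
(merge {:model "claude-sonnet-4" :messages messages}
       (router/reasoning-extra-body :anthropic model-map :deep))
;; Per the docs, the :anthropic style nests a :thinking block with a
;; :budget_tokens value taken from REASONING_LEVELS.
```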
Abstract reasoning levels translated per reasoning-style.
Vocabulary is intentionally provider-neutral — callers pass :quick|:balanced|:deep
and svar picks the right on-the-wire shape for the selected model.
Sub-key semantics:
`:openai-effort` → flat top-level `:reasoning_effort` string.
Used by GPT-5.x, o-series, Gemini 2.5 via OpenAI gateway,
DeepSeek Reasoner, and most OpenAI-compatible reasoners.
`:anthropic-thinking` → nested `:thinking {:type "enabled" :budget_tokens N}`.
Budget magnitudes fit within 200k-context Claude 4.x
max_tokens windows; tune if you hit ceilings.
`:zai-thinking` → binary `:thinking {:type "enabled"|"disabled"}` on
Z.ai / GLM-4.6+. No budget_tokens — thinking is on/off.
`:quick` disables, `:balanced`/`:deep` enable.
See also `:preserved-thinking?` below for the
`clear_thinking: false` flag that keeps reasoning
across assistant turns.

(reset-budget! router)
Resets the router's token/cost budget counters to zero.
(reset-provider! router provider-id)
Manually resets a provider's circuit breaker to :closed.
(resolve-routing router routing-opts)
Resolves :routing opts to prefs for with-provider-fallback.
Returns {:prefs prefs-map :error-strategy kw}.
Throws on invalid provider/model combinations.
`:reasoning` in the routing opts (abstract level — :quick/:balanced/:deep
or strings/aliases) implies `:require-reasoning? true` in prefs, which
filters model selection to `:reasoning? true` models in `resolve-model`.
This makes `{:optimize :cost :reasoning :deep}` pick the cheapest
*reasoning-capable* model rather than silently dropping `:deep` when the
cost-cheapest model happens to be non-reasoning.

(router-stats router)
Returns cumulative + windowed stats for the router.
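A sketch tying `resolve-routing`, `router-stats`, and `reset-budget!` together. `r` is a placeholder for a router built with `make-router`, and the exact keys inside `:prefs` are not specified beyond what the docstrings state:

```clojure
;; :reasoning :deep implies :require-reasoning? true in prefs, so only
;; :reasoning? true models qualify for selection.
(router/resolve-routing r {:optimize :cost :reasoning :deep})
;; => {:prefs {...} :error-strategy ...}

(router/router-stats r)   ; cumulative + windowed stats
(router/reset-budget! r)  ; zero the token/cost budget counters
```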
(select-provider router prefs)Returns [provider model-map] or nil. Read-only.
Cross-provider ranking: models are scored by :prefer first, provider
priority second. So :optimize :intelligence picks the frontier model
across the whole fleet; ties are broken by provider vector order.
(truncate-text model text max-tokens)
(truncate-text model text max-tokens {:keys [truncation-marker from] :or {from :end}})
Truncates text to fit within a token limit. Uses proper tokenization to ensure accurate truncation.
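A usage sketch, assuming a `router` alias; `long-text` is a placeholder, and the `:from :start` value is an assumption mirroring the documented default of `:from :end`:

```clojure
;; Keep the tail of the text within 8000 tokens and mark the cut point.
(router/truncate-text "gpt-4o" long-text 8000
                      {:from :start :truncation-marker "[...]"})
```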