Model Registry And Pricing

The model registry powers sdk/list-models, sdk/model-info, sdk/model-capabilities, sdk/model-context-length, and cost estimation. It is designed to answer useful questions offline while still allowing live provider catalogs and caller overrides.

Lookup Precedence

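Tiers are listed from highest to lowest precedence. When more than one tier has an entry for a model, the higher tier wins: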

Tier	Source	Purpose
Override	`sdk/register-model-info`	Caller data for private deployments or newly released models.
Live	Provider `/models` endpoint	What the current account can see right now.
LiteLLM snapshot	`resources/litellm-snapshot.json`	Broad pricing/context coverage for providers the SDK supports.
models.dev	`resources/models-dev-snapshot.json` plus optional cache	Large public model catalog fallback.

The returned model entry carries :model/source so callers can see which tier answered.
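The precedence rule can be pictured as a first-match scan over the tiers. The following is a minimal sketch, not the SDK's implementation: `tier-order`, `lookup`, the keyword values used for :model/source, and the example data are all assumptions made for illustration.

```clojure
;; Hypothetical tier names, ordered from highest to lowest precedence.
(def tier-order [:override :live :litellm-snapshot :models-dev])

(defn lookup
  "Returns the entry from the highest-precedence tier that knows
  [provider model-id], tagged with :model/source, or nil if no tier does."
  [tiers provider model-id]
  (some (fn [tier]
          (when-let [entry (get-in tiers [tier [provider model-id]])]
            (assoc entry :model/source tier)))
        tier-order))

;; Made-up data: the override and the snapshot both describe :acme "magic-7".
(def example-tiers
  {:override         {[:acme "magic-7"] {:model/context-length 32000}}
   :litellm-snapshot {[:acme "magic-7"]  {:model/context-length 8000}
                      [:openai "gpt-4o"] {:model/context-length 128000}}})

(lookup example-tiers :acme "magic-7")
(lookup example-tiers :openai "gpt-4o")
```

In this sketch the first call is answered by the override tier and the second by the snapshot tier, which is the kind of distinction :model/source is meant to expose.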

Common Queries

(sdk/list-models)
(sdk/list-models :openai)
(sdk/model-info :openai "gpt-4o")
(sdk/model-capabilities :openai "gpt-4o")
(sdk/model-context-length :openai "gpt-4o")

Single-argument lookup scans providers in a stable preference order. Prefer two-argument lookup when the model id is ambiguous across providers.
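To see why ambiguity matters, here is a small sketch of a preference-order scan. The order, the catalog, and the `resolve-provider` helper are hypothetical; the SDK's real preference order is not documented in this section.

```clojure
;; Hypothetical preference order and per-provider catalogs.
(def provider-preference [:openai :acme])

(def catalog
  {:openai #{"gpt-4o"}
   :acme   #{"gpt-4o" "magic-7"}})

(defn resolve-provider
  "Returns the first provider, in preference order, whose catalog
  contains model-id, or nil when no provider lists it."
  [model-id]
  (some (fn [provider]
          (when (contains? (get catalog provider #{}) model-id)
            provider))
        provider-preference))

(resolve-provider "gpt-4o")   ; listed by both providers; the earlier one wins
(resolve-provider "magic-7")  ; listed only by :acme
```

Because "gpt-4o" appears under both providers here, a scan always picks the earlier one, so a caller who wants the :acme deployment has to name the provider explicitly.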

Live Refresh

(sdk/refresh-models! :provider :openai)
(sdk/refresh-models!)

Live refresh requires provider credentials and only runs for providers with supported model-list endpoints. Failures are returned per provider instead of aborting the whole refresh.
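Per-provider failure reporting can be sketched as below. This is an illustration of the pattern only: `refresh-all`, `stub-fetch`, and the `{:status ...}` result shape are assumptions, and the actual return value of sdk/refresh-models! may look different.

```clojure
(defn refresh-all
  "Calls (fetch-fn provider) for each provider and records each outcome
  separately, so one failing provider does not abort the others."
  [fetch-fn providers]
  (into {}
        (map (fn [provider]
               [provider
                (try
                  {:status :ok :models (fetch-fn provider)}
                  (catch Exception e
                    {:status :error :message (ex-message e)}))]))
        providers))

;; Stub standing in for a real model-list request.
(defn stub-fetch [provider]
  (if (= provider :openai)
    ["gpt-4o"]
    (throw (ex-info "missing credentials" {:provider provider}))))

(def result (refresh-all stub-fetch [:openai :acme]))
```

Here `result` holds a successful entry for :openai and an error entry for :acme, so the caller can decide per provider whether to retry, warn, or fall back to snapshot data.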

Overrides

Use overrides for private models, self-hosted endpoints, or pricing data that has not reached public catalogs yet:

(sdk/register-model-info
  :acme "magic-7"
  {:model/context-length 32000
   :model/max-output-tokens 4096
   :model/capabilities #{:chat :tools}
   :model/cost {:input-per-million 0.5
                :output-per-million 2.0
                :cache-read-per-million 0.1
                :cache-write-per-million 0.6
                :request-cost 0.005}})

Overrides are in-memory. Applications that need durable custom catalogs should register them during process startup.
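One way to arrange startup registration is to keep the custom catalog as data and replay it through the registration function. The sketch below assumes a hypothetical `register-all!` helper and takes the registration function as an argument so it stays self-contained; in an application that argument would presumably be sdk/register-model-info.

```clojure
;; Custom catalog kept as plain data: [provider model-id info] triples.
(def custom-models
  [[:acme "magic-7" {:model/context-length 32000
                     :model/max-output-tokens 4096}]])

(defn register-all!
  "Calls (register-fn provider model-id info) once per catalog entry."
  [register-fn entries]
  (doseq [[provider model-id info] entries]
    (register-fn provider model-id info)))

;; Recording stand-in for the real registration function.
(def registered (atom []))
(register-all! (fn [provider model-id info]
                 (swap! registered conj [provider model-id info]))
               custom-models)
```

Keeping the catalog as data means it can be loaded from a file or configuration service before the replay, which is what makes the in-memory overrides effectively durable across restarts.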

Cost Estimation

(sdk/estimate-cost
  :openai "gpt-4o"
  {:usage/input-tokens 1000
   :usage/output-tokens 500
   :usage/cached-input-tokens 200})

Cost results are explicit about uncertainty:

{:cost/usd 0.00625M
 :cost/estimated? true
 :cost/pricing-source "litellm-snapshot"
 :cost/breakdown {...}}

If pricing is unavailable or incomplete for a reported billable dimension, :cost/usd is :unknown; the SDK never substitutes zero for an unknown cost. It also never double-counts cached tokens: normalized input tokens cover uncached input only, while cache reads and cache writes are priced from their own fields when the registry has those rates.
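The two rules (no zero for unknown, no double counting) can be worked through with plain arithmetic. This is a sketch under stated assumptions, not the SDK's code: `estimate-usd` and the `usage->rate` pairing are hypothetical, and the rates are illustrative values, so the result is not meant to reproduce any figure sdk/estimate-cost returns.

```clojure
;; Illustrative per-million-token rates in USD.
(def rates
  {:input-per-million      2.5M
   :output-per-million     10.0M
   :cache-read-per-million 1.25M})

;; Assumed pairing of usage counters with the rate that prices each one.
(def usage->rate
  {:usage/input-tokens        :input-per-million
   :usage/output-tokens       :output-per-million
   :usage/cached-input-tokens :cache-read-per-million})

(defn estimate-usd
  "Sums tokens * rate / 1,000,000 over every reported counter.
  Returns :unknown if any reported counter has no matching rate."
  [rates usage]
  (let [lines (for [[counter tokens] usage]
                (when-let [rate (get rates (usage->rate counter))]
                  (/ (* tokens rate) 1000000M)))]
    (if (every? some? lines)
      (reduce + 0M lines)
      :unknown)))

(def usage
  {:usage/input-tokens 1000          ; uncached input only
   :usage/output-tokens 500
   :usage/cached-input-tokens 200})  ; priced separately at the cache-read rate

(estimate-usd rates usage)
(estimate-usd (dissoc rates :cache-read-per-million) usage)
```

With these rates the three terms are 0.0025 + 0.005 + 0.00025 = 0.00775 USD. Removing the cache-read rate makes the second call return :unknown rather than silently pricing the cached tokens at zero.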

The registry can carry token, cache, request, image, transcription, text-to-speech, and search-query price fields:

{:input-per-million 2.5
 :output-per-million 10.0
 :cache-read-per-million 1.25
 :cache-write-per-million 3.75
 :request-cost 0.005
 :image-per-image 0.04
 :image-per-megapixel 0.02
 :transcription-per-minute 0.006
 :tts-per-million-chars 15.0
 :search-per-call 0.005}

sdk/estimate-cost covers token, cache, and per-request chat costs. Use the modality helpers in llm.sdk.pricing for image, transcription, and text-to-speech attribution when the response path does not include chat-style usage.

Cache Attribution

Provider usage normalizers only emit cache counters when providers actually report them. sdk/complete uses those counters to stamp:

{:cache/status :hit
 :cache/cached-tokens 200
 :cache/cache-write-tokens :unknown}

If a provider is silent about cache statistics, :cache/status is :unknown. This lets applications distinguish "the provider said zero cached tokens" from "the provider did not report cache information."
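On the consuming side, the distinction might be handled as in the sketch below. The `cache-report` helper and its three return keywords are hypothetical; only :cache/status, :cache/cached-tokens, and the :unknown value come from the stamp shown above.

```clojure
(defn cache-report
  "Classifies a :cache/* stamp as :not-reported, :reported-zero,
  or :reported-hit, based on the status and the cached-token count."
  [{:cache/keys [status cached-tokens]}]
  (cond
    (= status :unknown)            :not-reported
    (and (number? cached-tokens)
         (zero? cached-tokens))    :reported-zero
    (and (number? cached-tokens)
         (pos? cached-tokens))     :reported-hit
    :else                          :not-reported))

(cache-report {:cache/status :hit
               :cache/cached-tokens 200
               :cache/cache-write-tokens :unknown})
(cache-report {:cache/status :unknown})
```

A dashboard built on this can show "no data" for silent providers instead of a misleading 0% cache hit rate.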

Snapshot Refresh

The LiteLLM snapshot is rebuilt by fetching a single upstream file (model_prices_and_context_window.json) directly over HTTPS, so no local LiteLLM checkout is required:

python3 scripts/build_litellm_snapshot.py

To pin a version or work offline, pass an override (a URL, a path to that JSON file, or a directory containing it) as the first argument or via the LITELLM_SOURCE env var:

python3 scripts/build_litellm_snapshot.py /path/to/litellm

The script filters LiteLLM data down to providers this SDK can actually address. It preserves token pricing plus available request, image, transcription, and text-to-speech pricing fields for cost-aware applications.
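The filtering step can be pictured in Clojure as follows. The actual script is Python and may work differently; `supported-providers`, `kept-fields`, and `filter-snapshot` are hypothetical, and the string keys are assumed to match the field names used in LiteLLM's upstream file.

```clojure
;; Hypothetical set of providers the SDK can address.
(def supported-providers #{"openai" "anthropic"})

;; Fields to keep; names assumed from LiteLLM's upstream JSON.
(def kept-fields
  ["litellm_provider" "max_input_tokens" "max_output_tokens"
   "input_cost_per_token" "output_cost_per_token"])

(defn filter-snapshot
  "Keeps only entries whose provider is supported, trimmed to kept-fields."
  [raw]
  (into {}
        (keep (fn [[model-id entry]]
                (when (contains? supported-providers
                                 (get entry "litellm_provider"))
                  [model-id (select-keys entry kept-fields)])))
        raw))

;; Made-up input in the same general shape as the upstream file.
(def raw
  {"gpt-4o"     {"litellm_provider" "openai"
                 "input_cost_per_token" 2.5e-6
                 "mode" "chat"}
   "some-model" {"litellm_provider" "unsupported-host"
                 "input_cost_per_token" 1.0e-6}})

(filter-snapshot raw)
```

In this sketch only the "gpt-4o" entry survives, and its "mode" field is dropped because it is not in `kept-fields`.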
