Public API for clojure-llm-sdk. Complete, embed, stream, list-models, capabilities, normalize-usage, estimate-cost, provider registration.
Decoder for the AWS vnd.amazon.eventstream binary frame format used by Bedrock /converse-stream and Kinesis. Spec: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTSelectObjectAppendix.html
Each frame: [prelude (12 bytes)] [headers] [payload] [message-crc (4 bytes)]
Prelude: [total-length (4 BE)] [headers-length (4 BE)] [prelude-crc (4 BE)]
Each header: [name-len (1)] [name] [type (1)] [value-len (2 BE if variable)] [value]
We skip CRC validation (caller already trusts the AWS connection).
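The frame layout above can be sketched as a small decoder. This is a minimal Python illustration, not the SDK's Clojure code: `decode_frame` is a hypothetical name, only string-typed headers (type code 7) are handled, and both CRC fields are read past without being checked, matching the note above.

```python
import struct

STRING_HEADER = 7  # event-stream header type code for UTF-8 string values

def decode_frame(buf: bytes):
    """Decode the frame at the start of buf; CRC fields are skipped, not verified."""
    total_len, headers_len, _prelude_crc = struct.unpack_from(">III", buf, 0)
    pos = 12                      # the prelude is always 12 bytes
    headers_end = pos + headers_len
    headers = {}
    while pos < headers_end:
        name_len = buf[pos]
        name = buf[pos + 1:pos + 1 + name_len].decode("utf-8")
        pos += 1 + name_len
        value_type = buf[pos]
        pos += 1
        if value_type != STRING_HEADER:
            raise ValueError(f"unhandled header type {value_type}")
        (value_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        headers[name] = buf[pos:pos + value_len].decode("utf-8")
        pos += value_len
    payload = buf[headers_end:total_len - 4]  # the last 4 bytes are the message CRC
    return headers, payload, total_len

# Build one frame by hand (CRC fields zeroed, since the decoder ignores them).
name, value = b":event-type", b"contentBlockDelta"
header = bytes([len(name)]) + name + bytes([STRING_HEADER]) + struct.pack(">H", len(value)) + value
payload = b'{"delta":{"text":"hi"}}'
total = 12 + len(header) + len(payload) + 4
frame = struct.pack(">III", total, len(header), 0) + header + payload + b"\x00" * 4

headers, body, consumed = decode_frame(frame)
```

Returning the consumed length lets a caller advance through a buffer that holds several frames back to back.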
AWS Signature V4 signing for Bedrock + any other AWS service. Implements the canonical-request → string-to-sign → derived-key → signature flow defined in: https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
Public entry point is sign-request — give it an unsigned request map {:method :url :headers :body} plus credentials + region + service and it returns the same map with Authorization + x-amz-date + x-amz-content-sha256 + x-amz-security-token (when present) injected.
No external AWS SDK dep — uses JDK crypto only.
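The derived-key and signature steps of that flow can be sketched with standard-library crypto only. This is a Python illustration under assumptions, not the SDK's sign-request: the function names are hypothetical, the canonical request is assumed to be built already, and the credentials are placeholders.

```python
import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

def string_to_sign(amz_date: str, date_stamp: str, region: str, service: str,
                   canonical_request: str) -> str:
    """Assemble the string to sign from the hashed canonical request."""
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    digest = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, digest])

# Placeholder inputs; a real canonical request has a stricter layout.
canonical_request = "POST\n/model/example/converse\n\nhost:example\n\nhost\nexample-payload-hash"
key = signing_key("EXAMPLE-SECRET", "20240101", "us-east-1", "bedrock")
sts = string_to_sign("20240101T000000Z", "20240101", "us-east-1", "bedrock", canonical_request)
signature = hmac.new(key, sts.encode("utf-8"), hashlib.sha256).hexdigest()
```

The hex signature is what ends up in the Authorization header alongside the credential scope and the signed header list.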
Compatibility aggregate for context caching helpers. Implementation ownership lives in llm.sdk.cache.markers, llm.sdk.cache.policy, and llm.sdk.cache.request.
Provider-native context cache marker transforms.
Provider/model cache strategy policy.
Request cache option readers.
Model catalog — registry-backed lookups for model metadata.
Every fn here delegates to llm.sdk.registry; the hardcoded catalog atom that previously lived here is gone (all former entries are present in the bundled models.dev snapshot at resources/models-dev-snapshot.json).
Single-arg lookups (get-model, context-length, model-capable?) scan across providers and return the first match — stable for globally unique model ids (gpt-4o, claude-opus-4-7), non-deterministic for ambiguous ids that exist under multiple providers (e.g. a model served by both :openrouter and :openai). Prefer the provider-aware overloads when the id is ambiguous.
Driver for embedding requests — the embed counterpart to llm.sdk/complete. Resolves the provider profile, picks up its :profile/embed-transport-constructor, builds the request, sends it, and returns a canonical EmbedResponse.
Providers without an embed transport throw ex-info on call rather than returning nil; surfacing the missing capability at the call site is friendlier than letting a downstream NullPointerException explode.
Structured error classification. Ported from Hermes error_classifier.py with simplified pipeline.
Sequential fallback across (provider, model) pairs.
Try each provider in order with the canonical request. Returns the
first successful Response. If every attempt fails, throws ex-info
carrying an :attempts vector of {:provider :model :error/...} maps
in attempt order — callers can inspect to decide whether to
re-raise, surface to UI, etc.
What this is NOT:
- Credential pools / multi-key load balancing
- Cooldown caches / weighted shuffle
- TPM/RPM enforcement / budget routing
- Latency-aware routing / complexity routing
All of those are explicitly out of scope for the SDK. They are
credential-pool plumbing, the exact kind of work LiteLLM's router.py
exists for and that we delegate back to the calling application.
Reference: litellm-ref/router_utils/fallback_event_handlers.py:85
run_async_fallback (shape only; do NOT port the pool plumbing).
GCP Application Default Credentials (ADC) resolution for Vertex AI.
Mirrors the order documented at
https://cloud.google.com/docs/authentication/application-default-credentials
and implemented by the official google-auth client libraries:
1. GOOGLE_APPLICATION_CREDENTIALS env var → credentials file
2. Well-known file at
~/.config/gcloud/application_default_credentials.json
(set by `gcloud auth application-default login`)
3. GCE / Cloud Run / GKE metadata server (when running on GCP)
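The three-step order above can be sketched as a pure function. This is a Python illustration with hypothetical names (`pick_adc_source`, the `file_exists` and `on_gcp` inputs); it only decides which source to use and records what was tried, and does not fetch any token.

```python
from pathlib import Path

WELL_KNOWN = str(Path.home() / ".config" / "gcloud" / "application_default_credentials.json")

def pick_adc_source(env, file_exists, on_gcp):
    """Return (chosen source or None, list of sources tried in order)."""
    attempted = ["GOOGLE_APPLICATION_CREDENTIALS"]
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path and file_exists(path):
        return {"source": "env-file", "path": path}, attempted
    attempted.append("well-known-file")
    if file_exists(WELL_KNOWN):
        return {"source": "well-known-file", "path": WELL_KNOWN}, attempted
    attempted.append("metadata-server")
    if on_gcp:
        return {"source": "metadata-server", "path": None}, attempted
    return None, attempted

# No env var set, but gcloud has written the well-known file.
chosen, tried = pick_adc_source({}, lambda p: p == WELL_KNOWN, on_gcp=False)
```

Keeping the attempted list alongside the result makes it easy to build an error that names every source tried.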
Credentials files come in two flavours we support:
  :service_account  has :private_key + :client_email; we RS256-sign a
                    JWT and exchange it at oauth2.googleapis.com/token
                    for an access token (jwt-bearer grant).
  :authorized_user  has :client_id, :client_secret, :refresh_token; we
                    POST a refresh_token grant to the same endpoint.
                    This is the format
                    `gcloud auth application-default login` writes.
External account (workload identity federation) is not yet supported:
those credentials require an STS exchange that varies by source
(AWS, Azure, OIDC). A clear error is thrown if one is encountered.
Two convenience layers sit *above* the proper ADC chain:
- request opts :vertex :access-token (caller override)
- GOOGLE_OAUTH_ACCESS_TOKEN env (pre-resolved bearer)
These are documented escape hatches; they do not replace ADC.
When none of the layers yield a token, raises ex-info
{:error/type :auth/missing-credentials :attempted [...]}
naming every source the SDK tried, in order.
Thin, mockable HTTP layer built on hato.
Driver for image generation requests — the image counterpart to sdk/complete, sdk/embed, sdk/moderate, and sdk/rerank.
Resolves the profile, picks up its :profile/image-transport-constructor, builds and sends the request, returns a canonical ImageGenResponse. Providers without image support throw a clear ex-info.
Read-only loader for the LiteLLM-derived pricing/capability snapshot bundled at resources/litellm-snapshot.json.
LiteLLM maintains an actively-curated catalog of ~2.7k model entries keyed by their provider's model id. The bundled snapshot is a filtered subset — only providers we have SDK adapters for, with each entry stripped to the fields llm.sdk.registry uses (context-length, max-output-tokens, capability flags, per-million pricing). To refresh, re-run scripts/build_litellm_snapshot.py.
This tier is a sibling to llm.sdk.models-dev; both contribute to llm.sdk.registry's field-merge. Where they overlap, the merge layer (registry/merge-pair) takes a key-level union in which the rightmost tier wins per field; the registry orders the tiers via its lookup fn.
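A key-level union with rightmost-wins semantics can be sketched in a few lines. This Python version is an illustration only: the field names, the values, and the tier order are made up, and the real registry/merge-pair may treat missing or nil fields differently.

```python
from functools import reduce

def merge_pair(left, right):
    """Key-level union; where both sides define a key, the right side wins."""
    return {**left, **right}

def merge_tiers(tiers):
    """Fold an ordered list of tiers, lowest priority first."""
    return reduce(merge_pair, tiers, {})

# Hypothetical entries for the same (provider, model) from two tiers.
litellm_entry = {"context-length": 128000, "input-per-million": 2.5}
models_dev_entry = {"context-length": 128000, "max-output-tokens": 16384, "input-per-million": 2.0}

merged = merge_tiers([litellm_entry, models_dev_entry])
```

Fields that only one tier knows about survive the merge, which is the point of taking a union rather than picking a single winning entry.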
Per-provider /models endpoint fetchers.
Each supported provider has a fetcher that hits its public /models endpoint and returns a vector of normalized ModelEntry maps with :model/source :live-models-api.
The registry layer (llm.sdk.registry) merges these entries with the models.dev breadth registry and a bundled offline snapshot to produce one unified view per (provider, model).
Providers without a public /models endpoint (Codex, Codex-backend, Bedrock, Fake) throw :error :unsupported on fetch; callers should route those through the models.dev / snapshot layers only.
models.dev breadth registry loader.
Three-tier lookup (memory, disk, network) with two fallbacks, mirroring hermes-agent/agent/models_dev.py:
1. In-memory atom (1h TTL by default)
2. Disk cache at ~/.clojure-llm-sdk/models-dev-cache.json (1h by mtime)
3. Network fetch from https://models.dev/api.json
4. Stale disk cache fallback (network failed, disk exists but old)
5. Bundled snapshot at resources/models-dev-snapshot.json (last resort,
ships with the SDK so offline use works)
Normalizes models.dev's per-provider tree into ModelEntry maps with
:model/source :models-dev so the registry merge layer can compare
against live /models fetches and the bundled snapshot uniformly.
Driver for moderation requests: the moderate counterpart to llm.sdk/complete and llm.sdk/embed. Resolves the provider profile, picks up its :profile/moderation-transport-constructor, builds the request, and returns a canonical ModerationResponse. Providers without a moderation transport throw ex-info rather than failing with a NullPointerException.
Pricing lookup + cost estimation, layered on llm.sdk.registry.
Data flow:
  (sdk/estimate-cost ...) ─► estimate-cost-for-model
                                      │
                                      ▼
                               registry/lookup ─► merged ModelEntry
                                 - override tier (pricing from cost map)
                                 - live /models tier
                                 - models.dev tier (including bundled snapshot)
The hardcoded pricing snapshot that previously lived here was folded into
the bundled models.dev snapshot at resources/models-dev-snapshot.json
(every former entry verified present with current pricing). Callers
who need to inject custom pricing should use
llm.sdk.registry/register-entry!, which bypasses the public registries.
The PricingEntry record shape is preserved for callers who already
consume it — internally we convert ModelEntry's :model/cost map back
to this shape at lookup time.
Compatibility aggregate for provider registry, auth, and built-ins. Implementation ownership lives under llm.sdk.provider.registry, llm.sdk.provider.auth, and llm.sdk.provider.builtins.
Declared per-provider SDK coverage.
This is not marketing metadata. It is an internal contract matrix used by tests to keep provider support honest across request/response shape, SDK surface area, caching, usage metrics, pricing, model listing, auth, errors, and live-smoke coverage.
Provider auth and runtime profile configuration.
Built-in provider profile definitions.
Provider profile registry ownership.
Compatibility shim. Implementation lives in llm.sdk.providers.anthropic.chat.
Anthropic Messages API transport adapter. Supports thinking blocks, cache_control, tool use, streaming deltas. Preserves provider-specific replay state (reasoning_details, signatures).
Compatibility shim. Implementation lives in llm.sdk.providers.bedrock.converse.
Compatibility shim. Implementation lives in llm.sdk.providers.bedrock.image.
AWS Bedrock Converse API transport adapter.
Auth: AWS Signature V4 — sdk/complete dispatches on :profile/auth-strategy :aws-sigv4 and signs the request via llm.sdk.aws-sigv4 just before the HTTP send.
Streaming: Bedrock's /converse-stream emits binary event-stream frames (vnd.amazon.eventstream). sdk/complete reads the raw InputStream via llm.sdk.aws-eventstream/frame-seq and hands each parsed frame to parse-stream-event-bedrock as a map.
Model-id mapping: canonical short ids (e.g. claude-sonnet-4-5, nova-pro) are mapped to Bedrock's region-versioned id format (e.g. anthropic.claude-sonnet-4-5-20250101-v1:0); unknown ids pass through verbatim so callers can provide explicit ARNs.
Bedrock image-generation adapter (Titan Image Generator + Stability
SD3 / SDXL). All use bedrock-runtime /model/{id}/invoke with SigV4.
Each model has a different body shape:
  amazon.titan-image-generator-v1 / -v2:0
    {:taskType "TEXT_IMAGE"
     :textToImageParams {:text "..."}
     :imageGenerationConfig {:numberOfImages N :width W :height H :cfgScale 8 :seed 0}}
    response {:images ["b64", ...]}
  stability.stable-diffusion-xl-v1
    {:text_prompts [{:text "..." :weight 1.0}]
     :cfg_scale N :seed N :steps 30}
    response {:artifacts [{:base64 "..."}]}
We dispatch on a substring match against the model id and route to
the matching builder/parser pair.
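The substring dispatch described above can be sketched as follows. This is a Python illustration, not the adapter's code: the substrings "titan-image" and "stable-diffusion", the function names, and the default sizes are assumptions; the body and response shapes follow the two layouts listed above.

```python
def build_image_body(model_id, prompt, n=1, width=1024, height=1024, cfg_scale=8, seed=0):
    """Pick the request body shape from a substring of the model id."""
    if "titan-image" in model_id:
        return {
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {"numberOfImages": n, "width": width, "height": height,
                                      "cfgScale": cfg_scale, "seed": seed},
        }
    if "stable-diffusion" in model_id:
        return {"text_prompts": [{"text": prompt, "weight": 1.0}],
                "cfg_scale": cfg_scale, "seed": seed, "steps": 30}
    raise ValueError(f"no image body builder for model id {model_id!r}")

def parse_image_response(model_id, body):
    """Return the list of base64 image strings from either response shape."""
    if "titan-image" in model_id:
        return list(body["images"])
    if "stable-diffusion" in model_id:
        return [artifact["base64"] for artifact in body["artifacts"]]
    raise ValueError(f"no image response parser for model id {model_id!r}")

titan_body = build_image_body("amazon.titan-image-generator-v2:0", "a lighthouse at dusk")
sdxl_images = parse_image_response("stability.stable-diffusion-xl-v1", {"artifacts": [{"base64": "QUJD"}]})
```

Failing loudly on an unknown model id keeps a typo from silently sending the wrong body shape.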
Bedrock Agent Runtime /rerank adapter.
Canonical SDK rerank requests follow Cohere-style inputs:
{model, query, documents, top-n}. Bedrock expects an Agent Runtime
request containing queries, sources, and a Bedrock reranking
configuration. The request is signed by llm.sdk.rerank via SigV4.
Compatibility shim. Implementation lives in llm.sdk.providers.codex.responses.
OpenAI Responses API (Codex) transport adapter. Covers both the standard OpenAI Responses API (api.openai.com) and the Codex backend (chatgpt.com/backend-api/codex).
For the Codex backend, auth is read from ~/.codex/auth.json (shared with the official OpenAI Codex CLI).
Compatibility shim. Implementation lives in llm.sdk.providers.cohere.chat.
Compatibility shim. Implementation lives in llm.sdk.providers.cohere.embeddings.
Compatibility shim. Implementation lives in llm.sdk.providers.cohere.rerank.
Cohere /v2/chat native transport adapter.
Cohere is OpenAI-compat-ish but differs enough to need its own adapter: it has a typed message-content array, a documents field, a citation_options control, citations on the response, and a streaming event taxonomy with separate content-start / content-delta / content-end plus tool-plan-delta and citation-* events.
Reference: litellm-ref/llms/cohere/chat/v2_transformation.py.
Cohere embed adapter — POST {base}/embed.
Cohere's wire shape diverges from OpenAI's in three places:
- Request uses :texts (vector) instead of :input.
- Request carries a required :input_type
  (search_document / search_query / classification / clustering),
  which lives in the canonical request as
  :embed/provider-options :input-type. Defaults to
  "search_document" when omitted; that's the safest fallback
  for general-purpose retrieval.
- Response embeddings live under :embeddings.float (newer API
  with multi-format support) or :embeddings (legacy single
  format). Usage is in :meta.billed_units.input_tokens.
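The three differences above can be sketched as a request builder and a response parser. This is a Python illustration rather than the adapter's Clojure: `build_embed_body` and `parse_embed_response` are hypothetical names, and provider options are modelled as a plain dict keyed by "input-type".

```python
def build_embed_body(model, texts, provider_options=None):
    """Build the Cohere-style body: :texts instead of :input, plus a required input_type."""
    options = provider_options or {}
    return {
        "model": model,
        "texts": list(texts),
        "input_type": options.get("input-type", "search_document"),
    }

def parse_embed_response(body):
    """Accept both the multi-format shape (embeddings.float) and the legacy list shape."""
    embeddings = body["embeddings"]
    vectors = embeddings["float"] if isinstance(embeddings, dict) else embeddings
    tokens = body.get("meta", {}).get("billed_units", {}).get("input_tokens")
    return {"vectors": vectors, "input_tokens": tokens}

request_body = build_embed_body("embed-english-v3.0", ["first doc", "second doc"])
newer = parse_embed_response({"embeddings": {"float": [[0.1, 0.2]]},
                              "meta": {"billed_units": {"input_tokens": 3}}})
legacy = parse_embed_response({"embeddings": [[0.3]]})
```

Checking whether :embeddings is a map or a list is one simple way to tell the two response formats apart.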
Live smoke is env-gated under COHERE_API_KEY.
Cohere /rerank transport. The wire shape is also used by Jina —
both accept {model, query, documents, top_n, return_documents}
and return {results [{index, relevance_score, document {text}}]}.
Cohere additionally returns :meta.billed_units.search_units for
usage; Jina returns :usage {total_tokens}. Both are surfaced
through the canonical :response/usage where possible.
Compatibility shim. Implementation lives in llm.sdk.providers.elevenlabs.tts.
ElevenLabs TTS adapter — POST /v1/text-to-speech/:voice_id with xi-api-key header. Voice id is part of the URL; model id and text live in the JSON body. Returns audio bytes (mp3 by default).
Reference: litellm-ref/llms/elevenlabs/ + ElevenLabs API docs.
Compatibility shim. Implementation lives in llm.sdk.providers.fake.chat.
Fake/test provider that returns deterministic responses. Conforms to the Transport protocol.
Compatibility shim. Implementation lives in llm.sdk.providers.gemini.native.
Vertex AI Imagen 3 / 4 image-generation adapter.
Endpoint:
  POST {host}/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:predict
Body:
  {:instances [{:prompt "..."}]
   :parameters {:sampleCount N :aspectRatio "1:1" :seed ...}}
Response:
  {:predictions [{:bytesBase64Encoded "..." :mimeType "image/png"}]}
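The endpoint, body, and response shapes above can be sketched as two small helpers. This is a Python illustration with hypothetical function names; the host, project, and location values are placeholders, and no HTTP call or auth is performed.

```python
import base64

def imagen_request(host, project, location, model, prompt, sample_count=1, aspect_ratio="1:1"):
    """Return the :predict URL and JSON body for one prompt."""
    url = (f"{host}/v1/projects/{project}/locations/{location}"
           f"/publishers/google/models/{model}:predict")
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count, "aspectRatio": aspect_ratio},
    }
    return url, body

def decode_predictions(response):
    """Turn each prediction into a (mime type, raw image bytes) pair."""
    return [(p["mimeType"], base64.b64decode(p["bytesBase64Encoded"]))
            for p in response["predictions"]]

url, body = imagen_request("https://example-host", "my-project", "us-central1",
                           "imagen-3.0-generate-002", "a red kite")
fake_response = {"predictions": [{"bytesBase64Encoded": base64.b64encode(b"png-bytes").decode("ascii"),
                                  "mimeType": "image/png"}]}
images = decode_predictions(fake_response)
```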
Auth: same GCP OAuth as vertex-gemini — token from
:request provider-options.vertex.access-token or
GOOGLE_OAUTH_ACCESS_TOKEN.
Models surfaced under :vertex-imagen include imagen-3.0-generate-002,
imagen-3.0-fast-generate-001, imagen-4.0-generate-001.
Gemini Native API transport adapter. Handles thought signatures, streaming deltas, safety metadata. Preserves provider-specific replay state.
Vertex AI Gemini transport adapter.
Builds on Gemini native with different auth (GCP OAuth) and endpoint
structure. Auth resolution follows the standard GCP ADC chain via
llm.sdk.gcp-auth: request opts → GOOGLE_OAUTH_ACCESS_TOKEN env →
`gcloud auth print-access-token` → GOOGLE_APPLICATION_CREDENTIALS
service-account JSON (RS256-signed JWT exchanged at
oauth2.googleapis.com/token).
Project resolution: request opts → profile quirks → GOOGLE_CLOUD_PROJECT env → SA JSON project_id.
Compatibility shim. Implementation lives in llm.sdk.providers.ollama.native.
Native Ollama adapter — /api/chat (chat) and /api/embed (embeddings).
Ollama also exposes an OpenAI-compat /v1/chat/completions endpoint that the existing :ollama profile (registered) targets. This namespace registers a sibling :ollama-native profile for callers who want the native shape — older Ollama versions, vision input via the native :images field, or workflows that need the native :options keys (e.g. :num_ctx, :num_predict, :mirostat).
Streaming: Ollama uses NDJSON (one JSON object per line), NOT SSE. We re-use the http/sse-request line reader and parse each line as a raw JSON object instead of stripping a 'data: ' prefix.
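The difference from SSE handling can be sketched briefly: each non-empty line is parsed as JSON directly, with no 'data: ' prefix to strip. This is a Python illustration with a hypothetical `iter_ndjson` helper; the sample chunks are made up and only loosely follow the native chat stream shape.

```python
import json

def iter_ndjson(lines):
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative stream: two content chunks followed by a final 'done' chunk.
stream = [
    '{"message": {"content": "Hel"}, "done": false}',
    '{"message": {"content": "lo"}, "done": false}',
    '',
    '{"message": {"content": ""}, "done": true}',
]

chunks = list(iter_ndjson(stream))
text = "".join(chunk["message"]["content"] for chunk in chunks)
```

Skipping blank lines keeps the parser tolerant of keep-alive newlines or a trailing newline at the end of the stream.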
Compatibility shim. Implementation lives in llm.sdk.providers.openai.chat.
Data-only OpenAI-compatible provider alias specs.
These providers share the OpenAI chat-completions wire shape. Adapter code may still apply provider quirks from the profile, but the registry should not need one hand-written register-provider call per alias.
Compatibility shim. Implementation lives in llm.sdk.providers.openai.embeddings.
Compatibility shim. Implementation lives in llm.sdk.providers.openai.image.
Compatibility shim. Implementation lives in llm.sdk.providers.openai.moderation.
Compatibility shim. Implementation lives in llm.sdk.providers.openai.speak.
Compatibility shim. Implementation lives in llm.sdk.providers.openai.transcribe.
OpenAI audio provider family namespace.
OpenAI Chat Completions transport adapter. Covers OpenAI, OpenRouter, DeepSeek, and other OpenAI-compatible providers.
OpenAI embeddings adapter — POST {base}/embeddings.
Same auth and base-url plumbing as the chat adapter; we share the profile and just register an additional :profile/embed-transport-constructor on it. Other OpenAI-compat hosts that offer embeddings (Mistral, Together, Voyage, Jina, etc.) can reuse this transport by attaching the same constructor.
OpenAI Moderations adapter.
POST {base}/moderations. omni-moderation-latest (the default since Nov 2024) accepts multi-modal input — a vector of {:type :text|:image_url} maps as well as plain strings. text-moderation-* models are text-only.
Response shape per the OpenAI Moderations API: {:id :model :results [{:flagged bool :categories {category-name bool} :category_scores {category-name float} :category_applied_input_types {category-name ["text"|"image"]}}]}
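As a rough illustration of working with the response shape above, the following sketch lists the categories flagged in the first result. The helper name `flagged-categories` and the sample values are hypothetical, and it assumes the JSON body has already been parsed with keyword keys; it is not part of the SDK.

```clojure
;; Hypothetical helper, not part of the SDK: collect the category names
;; whose boolean flag is true in the first result of a moderation response.
(defn flagged-categories
  [response]
  (let [result (first (:results response))]
    (->> (:categories result)
         (filter (fn [[_ flagged?]] flagged?))
         (map key)
         set)))

;; Made-up sample values, shaped like the response described above.
(def sample-response
  {:id "modr-example"
   :model "omni-moderation-latest"
   :results [{:flagged true
              :categories {:violence true :harassment false}
              :category_scores {:violence 0.91 :harassment 0.02}
              :category_applied_input_types {:violence ["text"]
                                             :harassment ["text"]}}]})

(flagged-categories sample-response)
;; => #{:violence}
```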
OpenAI /audio/speech adapter: POST {model, voice, input, response_format} returns raw audio bytes.
OpenAI /audio/transcriptions adapter. Wire shape is shared by Groq's /openai/v1/audio/transcriptions endpoint (same field names, same verbose_json output), so the same transport class powers both profiles.
Compatibility shim. Implementation lives in llm.sdk.providers.openrouter.chat.
OpenRouter transport adapter. Builds on OpenAI Chat Completions with OpenRouter-specific quirks:
- provider preferences routing in extra_body
- Pareto Code router plugin
- reasoning config in extra_body (not top-level)
- special model naming and error handling.
OpenRouter image generation transport.
OpenRouter image models generate images through chat completions, not OpenAI's /images/generations endpoint. This adapter mirrors that wire shape and extracts images from choices[].message.images[].image_url.url.
Compatibility shim. Implementation lives in llm.sdk.providers.perplexity.chat.
Perplexity transport: OpenAI-shape body + citation/search-results surfacing.
Request building is identical to openai-chat. Response parsing extends the OpenAI parser with two extractions:
- :search_results [{:url :title :snippet}, ...] → richer CitationPart per result
- :citations ["url", ...] → URL-only CitationPart when search_results isn't present
Usage normalization delegates to normalize-openai-usage, which already picks up Perplexity's :citation_tokens and :num_search_queries when present.
Streaming: the final SSE chunk on /chat/completions carries :citations alongside :usage and :finish_reason. parse-stream-event returns a vector of events in that case; sdk/complete flattens multi-event return values.
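The two Perplexity citation extractions (richer parts from :search_results, URL-only parts from :citations as a fallback) could be sketched roughly as follows. The function name `citation-parts` and the shape of the returned maps are assumptions for illustration; the SDK's actual CitationPart shape is not shown in this listing.

```clojure
;; Hypothetical sketch: prefer :search_results, fall back to :citations.
;; The {:type :citation ...} map shape is an assumption, not the SDK's
;; real CitationPart.
(defn citation-parts
  [body]
  (if-let [results (seq (:search_results body))]
    ;; Richer part per search result.
    (mapv (fn [{:keys [url title snippet]}]
            {:type :citation :url url :title title :snippet snippet})
          results)
    ;; URL-only part when search_results isn't present.
    (mapv (fn [url] {:type :citation :url url})
          (:citations body))))

(citation-parts {:citations ["https://b.example"]})
;; => [{:type :citation, :url "https://b.example"}]
```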
Compatibility shim. Implementation lives in llm.sdk.providers.gemini.vertex.
Compatibility shim. Implementation lives in llm.sdk.providers.gemini.imagen.
Compatibility shim. Implementation lives in llm.sdk.providers.voyage.rerank.
Voyage /rerank transport. Differs from Cohere/Jina on field names only:
  request : top_k (not top_n)
  response: data (not results)
Document representation is also slightly different: Voyage returns :document as a plain string when :return_documents=true.
Voyage usage shape: {:usage {:total_tokens N}}.
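A minimal sketch of the Voyage field-name differences, assuming a hypothetical canonical request with :top-n and :return-documents? keys. The function names, the canonical keys, and the :index / :relevance_score response fields are assumptions made for illustration, not the adapter's actual code.

```clojure
;; Hypothetical request builder: the canonical :top-n becomes Voyage's top_k.
(defn voyage-rerank-body
  [{:keys [model query documents top-n return-documents?]}]
  (cond-> {:model model :query query :documents documents}
    top-n (assoc :top_k top-n)
    (some? return-documents?) (assoc :return_documents return-documents?)))

;; Hypothetical response parser: Voyage puts rows under :data (not :results),
;; and :document, when returned, is a plain string.
(defn voyage-rerank-results
  [body]
  (mapv (fn [{:keys [index relevance_score document]}]
          (cond-> {:index index :score relevance_score}
            (string? document) (assoc :document document)))
        (:data body)))

(voyage-rerank-body {:model "example-model" :query "q"
                     :documents ["a" "b"] :top-n 1})
;; => {:model "example-model", :query "q", :documents ["a" "b"], :top_k 1}
```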
Rate-limit header parsing and tracking.
Unified merged model + pricing registry.
Layered precedence (highest first):
1. Caller overrides: register-entry! lets the SDK consumer inject
   custom data for endpoints the public registries don't know.
2. Live per-provider /models fetch, populated lazily by refresh!.
   Authoritative for what the provider currently advertises.
3. LiteLLM snapshot, bundled at resources/litellm-snapshot.json
   from llm.sdk.litellm-snapshot. Refreshable via
   scripts/build_litellm_snapshot.py. Wide coverage of pricing +
   capabilities, especially strong on Bedrock variants and
   less-mainstream providers.
4. models.dev: breadth source via llm.sdk.models-dev. Includes
   the bundled offline snapshot as its own innermost fallback.
Lookups field-merge across all tiers: fields missing from a higher tier (such as context-length and pricing) are filled in from lower tiers. The :model/source of the returned entry is the highest-precedence tier that contributed.
All operations are by [provider-keyword, model-id].
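The registry's field-merge across tiers can be pictured with a small sketch. The tier labels, the `merged-entry` name, and every entry key other than :model/source are hypothetical; the sketch also assumes that entries omit unknown fields rather than carrying nil values.

```clojure
;; Hypothetical tier labels, highest precedence first.
(def tier-precedence
  [:override :live :litellm :models-dev])

;; tiers: map of tier label -> entry map (or nil) for one
;; [provider-keyword, model-id] pair.
(defn merged-entry
  [tiers]
  (let [present (filter #(seq (get tiers %)) tier-precedence)]
    (when (seq present)
      ;; Merge lowest tier first so higher tiers win on conflicts,
      ;; while fields they lack are filled from lower tiers.
      (-> (apply merge (map tiers (reverse present)))
          (assoc :model/source (first present))))))

(merged-entry
 {:live    {:model/id "m" :model/context-length 128000}
  :litellm {:model/id "m" :model/context-length 100000
            :model/pricing {:input 1.0}}})
;; => {:model/id "m", :model/context-length 128000,
;;     :model/pricing {:input 1.0}, :model/source :live}
```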
Request preprocessing applied by llm.sdk/complete before the provider transport sees the request. Currently: drop+warn for canonical request fields the provider doesn't support. The supported-params set lives on the profile as :profile/supported-params. When set, any canonical droppable field present in the request but absent from the set is removed before the transport builds the body, and one warning is emitted per call. When :profile/supported-params is NOT set on a profile, requests pass through unchanged — providers opt in to the drop+warn behaviour by populating the set.
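The drop+warn behaviour described for request preprocessing might look roughly like this. The `droppable-fields` set and the `drop-unsupported` name are hypothetical, the warning goes to stderr purely for illustration, and :profile/supported-params is assumed to be a Clojure set.

```clojure
(require '[clojure.set :as set])

;; Hypothetical set of canonical droppable fields, for illustration only.
(def droppable-fields
  #{:temperature :top-p :seed})

(defn drop-unsupported
  [profile request]
  (if-let [supported (:profile/supported-params profile)]
    (let [present (set/intersection droppable-fields (set (keys request)))
          dropped (set/difference present supported)]
      ;; One warning per call, listing everything that was dropped.
      (when (seq dropped)
        (binding [*out* *err*]
          (println "WARN dropping unsupported params:" (sort dropped))))
      (apply dissoc request dropped))
    ;; No :profile/supported-params on the profile: pass through unchanged.
    request))

(drop-unsupported {:profile/supported-params #{:temperature}}
                  {:model "m" :temperature 0.2 :seed 7})
;; => {:model "m", :temperature 0.2}
```

Fields outside the droppable set (here :model) are never touched, so the set on the profile only needs to name the optional parameters a provider accepts.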
Driver for rerank requests — the rerank counterpart to sdk/complete, sdk/embed, and sdk/moderate.
Resolves the profile, picks up its :profile/rerank-transport-constructor, builds and sends the request, returns a canonical RerankResponse. Providers without rerank support throw a clear ex-info.
Data-driven retry policy with jittered backoff.
Canonical request/response schemas for the LLM SDK. All provider adapters translate to/from these shapes.
Driver for text-to-speech (TTS). The TTS counterpart to sdk/complete and sdk/transcribe.
Returns a SpeakResponse: {:audio/bytes byte-array :audio/content-type str :audio/model str? :response/usage Usage? :response/raw raw}.
Providers without a speak transport throw ex-info on call.
Small SSE helpers shared by provider adapters. This namespace intentionally handles only the common line envelope: `data: ...`, `[DONE]`, and JSON parsing. Provider-specific event semantics stay in each adapter.
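A minimal sketch of the common SSE line envelope, assuming a hypothetical `classify-sse-line` helper. JSON parsing is left to the caller here because it needs a JSON library; the real namespace's function names and return shapes are not shown in this listing.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: returns nil for non-data lines, {:done? true}
;; for the [DONE] sentinel, and {:data payload} otherwise. The payload
;; is the raw string; the caller would parse it as JSON.
(defn classify-sse-line
  [line]
  (when (str/starts-with? line "data:")
    (let [payload (str/trim (subs line 5))]
      (if (= payload "[DONE]")
        {:done? true}
        {:data payload}))))

(classify-sse-line "data: [DONE]")
;; => {:done? true}
```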
Streaming event taxonomy and reducer. Stream events → final canonical response. Preserves event order in output parts.
Driver for audio transcription (speech-to-text). The STT counterpart to sdk/complete and sdk/embed.
Providers without a transcribe transport throw ex-info on call so the missing capability surfaces at the call site.
Transport protocol definition. A transport owns the translation between canonical SDK shapes and provider-native wire formats.
Sibling protocol to llm.sdk.transport/Transport, scoped to embedding endpoints. The first non-chat modality.
We keep this protocol narrow on purpose. Embeddings don't stream, don't take tool calls, and don't carry reasoning — bolting them onto the chat Transport protocol would dilute both. New modalities (image, audio) get their own narrow protocols too.
Sibling protocol to llm.sdk.transport/Transport, scoped to image generation endpoints.
Image generation is per-request: no streaming, no tools, no reasoning. The protocol is narrow on purpose, matching the embed/moderate/rerank pattern.
Sibling protocol to llm.sdk.transport/Transport, scoped to moderation endpoints.
Moderation doesn't stream, doesn't take tools, and returns boolean flags + per-category scores. We keep the protocol narrow on purpose, matching the embed-transport pattern.
Sibling protocol to llm.sdk.transport/Transport, scoped to rerank endpoints. Rerank is a natural pair-step to embeddings — search apps need both, and the three providers we ship adapters for (Cohere, Voyage, Jina) all share a similar wire shape with minor field-naming differences.
Sibling protocol to llm.sdk.transport/Transport, scoped to text-to-speech. Seventh modality alongside chat / embed / moderate / rerank / image / transcribe.
TTS responses are raw audio bytes (mp3/wav/opus/aac/flac/pcm) rather than JSON, so the driver reads :body as a byte array, not parsed JSON. The transport provides the content-type → :audio/content-type mapping.
Sibling protocol to llm.sdk.transport/Transport, scoped to audio transcription (speech-to-text). Sixth modality sibling alongside chat / embed / moderate / rerank / image.
Transcription has a different wire shape from the rest: requests are multipart/form-data (binary audio + form fields), responses carry text + optional segments / words / language detection.
Usage normalization across providers.
Honesty rule: cache / reasoning / citation / search counters are present in the normalized map ONLY when the provider reported them. Absent != 0. Callers (and the response-stamping layer) use absence to distinguish 'provider was silent' from 'provider explicitly said 0', which matters for :cache/status surfaced on the canonical response.
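The honesty rule can be illustrated with a sketch that copies optional counters only when the provider reported them. The wire keys :citation_tokens and :num_search_queries come from the Perplexity notes in this listing; the canonical key names and the `optional-usage` function are hypothetical.

```clojure
;; Hypothetical mapping from provider wire keys to canonical keys.
(def optional-counters
  {:citation_tokens    :usage/citation-tokens
   :num_search_queries :usage/search-queries})

;; Copy a counter only when the provider actually sent the key, so an
;; explicit 0 survives and a silent provider leaves the key absent.
(defn optional-usage
  [raw]
  (reduce-kv (fn [acc wire-key canonical-key]
               (if (contains? raw wire-key)
                 (assoc acc canonical-key (get raw wire-key))
                 acc))
             {}
             optional-counters))

(optional-usage {:prompt_tokens 5 :citation_tokens 0})
;; => {:usage/citation-tokens 0}

(optional-usage {:prompt_tokens 5})
;; => {}
```

A caller can then use `contains?` on the normalized map to tell "provider was silent" apart from "provider explicitly said 0".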