llm — net.clojars.deadmeme5441/clojure-llm-sdk 0.1.0

llm.sdk.aws-eventstream

Decoder for the AWS vnd.amazon.eventstream binary frame format used by
Bedrock /converse-stream and Kinesis. Spec:
  https://docs.aws.amazon.com/AmazonS3/latest/API/RESTSelectObjectAppendix.html

Each frame:
  [prelude (12 bytes)] [headers] [payload] [message-crc (4 bytes)]
Prelude:
  [total-length (4 BE)] [headers-length (4 BE)] [prelude-crc (4 BE)]
Each header:
  [name-len (1)] [name] [type (1)] [value-len (2 BE if variable)] [value]

We skip CRC validation (caller already trusts the AWS connection).
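
The layout above can be sketched with java.nio alone. This is a minimal illustration, not the namespace's actual code: split-frame and sample-frame are hypothetical names, headers are left undecoded, and CRCs are read but never checked, as in the docstring.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.charset StandardCharsets))

(defn split-frame
  "Splits one complete frame into its prelude fields, raw header bytes,
   and payload bytes. ByteBuffer reads big-endian by default."
  [^bytes frame]
  (let [buf            (ByteBuffer/wrap frame)
        total-length   (.getInt buf)
        headers-length (.getInt buf)
        _prelude-crc   (.getInt buf)            ; read, not validated
        headers        (byte-array headers-length)
        ;; payload = total - prelude (12) - headers - message CRC (4)
        payload        (byte-array (- total-length 12 headers-length 4))]
    (.get buf headers)
    (.get buf payload)
    {:total-length   total-length
     :headers-length headers-length
     :headers        headers
     :payload        payload}))

(defn sample-frame
  "Builds a header-less frame with zeroed CRCs, for demonstration only."
  [^String payload]
  (let [body  (.getBytes payload StandardCharsets/UTF_8)
        total (+ 12 (alength body) 4)
        buf   (ByteBuffer/allocate total)]
    (.putInt buf total)
    (.putInt buf 0)                             ; headers-length
    (.putInt buf 0)                             ; prelude CRC placeholder
    (.put buf body)
    (.putInt buf 0)                             ; message CRC placeholder
    (.array buf)))
```

Calling (split-frame (sample-frame "hi")) yields a map whose :payload holds the two UTF-8 bytes of "hi".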

llm.sdk.aws-sigv4

AWS Signature V4 signing for Bedrock + any other AWS service. Implements the canonical-request → string-to-sign → derived-key → signature flow defined in: https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html

Public entry point is sign-request — give it an unsigned request map {:method :url :headers :body} plus credentials + region + service and it returns the same map with Authorization + x-amz-date + x-amz-content-sha256 + x-amz-security-token (when present) injected.

No external AWS SDK dep — uses JDK crypto only.
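
As an illustration of the derived-key step using JDK crypto only, here is a minimal sketch. The function names hmac-sha256, signing-key, and hex are hypothetical and are not claimed to match the namespace's internals; the HMAC chain itself follows the public SigV4 description.

```clojure
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.nio.charset StandardCharsets))

(defn hmac-sha256
  "HMAC-SHA256 of the UTF-8 bytes of data, keyed with k."
  ^bytes [^bytes k ^String data]
  (let [mac (Mac/getInstance "HmacSHA256")]
    (.init mac (SecretKeySpec. k "HmacSHA256"))
    (.doFinal mac (.getBytes data StandardCharsets/UTF_8))))

(defn signing-key
  "Derives the SigV4 signing key:
   HMAC(HMAC(HMAC(HMAC(\"AWS4\" + secret, date), region), service), \"aws4_request\")."
  [^String secret-key date-stamp region service]
  (-> (.getBytes (str "AWS4" secret-key) StandardCharsets/UTF_8)
      (hmac-sha256 date-stamp)      ; date-stamp is YYYYMMDD
      (hmac-sha256 region)
      (hmac-sha256 service)
      (hmac-sha256 "aws4_request")))

(defn hex
  "Lower-case hex encoding of a byte array."
  [^bytes bs]
  (apply str (map #(format "%02x" %) bs)))
```

The final signature is then the hex of (hmac-sha256 (signing-key ...) string-to-sign), where the string-to-sign is built from the canonical request.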

llm.sdk.cache

Compatibility aggregate for context caching helpers.

Implementation ownership lives in llm.sdk.cache.markers, llm.sdk.cache.policy, and llm.sdk.cache.request.

llm.sdk.cache.markers

Provider-native context cache marker transforms.

llm.sdk.cache.policy

Provider/model cache strategy policy.

decide-strategy

llm.sdk.cache.request

Request cache option readers.

llm.sdk.catalog

Model catalog — registry-backed lookups for model metadata.

Every fn here delegates to llm.sdk.registry; the hardcoded catalog atom that previously lived here is gone (all former entries are present in the bundled models.dev snapshot at resources/models-dev-snapshot.json).

Single-arg lookups (get-model, context-length, model-capable?) scan across providers and return the first match — stable for globally unique model ids (gpt-4o, claude-opus-4-7), non-deterministic for ambiguous ids that exist under multiple providers (e.g. a model served by both :openrouter and :openai). Prefer the provider-aware overloads when the id is ambiguous.

llm.sdk.embed

Driver for embedding requests — the embed counterpart to llm.sdk/complete. Resolves the provider profile, picks up its :profile/embed-transport-constructor, builds the request, sends it, and returns a canonical EmbedResponse.

Providers without an embed transport throw ex-info on call rather than returning nil: surfacing the missing capability at the call site is friendlier than letting a NullPointerException surface somewhere downstream.

embed

llm.sdk.errors

Structured error classification. Ported from Hermes error_classifier.py with a simplified pipeline.

llm.sdk.fallbacks

Sequential fallback across (provider, model) pairs.

Try each provider in order with the canonical request. Returns the first successful Response. If every attempt fails, throws ex-info carrying an :attempts vector of {:provider :model :error/...} maps in attempt order — callers can inspect to decide whether to re-raise, surface to UI, etc.

What this is NOT:

- Credential pools / multi-key load balancing
- Cooldown caches / weighted shuffle
- TPM/RPM enforcement / budget routing
- Latency-aware routing / complexity routing

All of those are explicitly out of scope for the SDK. They are credential-pool plumbing, the exact kind of work LiteLLM's router.py exists for and that we delegate back to the calling application.

Reference: litellm-ref/router_utils/fallback_event_handlers.py:85 run_async_fallback (shape only — do NOT port the pool plumbing).
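
The sequential behaviour described above can be sketched as follows. This is an illustration under assumptions, not the body of with-fallbacks: try-in-order and the call! argument are hypothetical, and :error/message is an assumed key standing in for the :error/... keys the docstring elides.

```clojure
(defn try-in-order
  "Calls (call! target request) for each {:provider :model} target in order.
   Returns the first successful result; if every target fails, throws
   ex-info whose data carries :attempts in attempt order."
  [call! targets request]
  (loop [remaining targets
         attempts  []]
    (if-let [{:keys [provider model] :as target} (first remaining)]
      (let [result (try
                     {:ok (call! target request)}
                     (catch Exception e
                       {:failed {:provider      provider
                                 :model         model
                                 :error/message (ex-message e)}}))]
        (if (contains? result :ok)
          (:ok result)
          ;; recur sits outside the try form, so it is in tail position
          (recur (rest remaining) (conj attempts (:failed result)))))
      (throw (ex-info "All fallback attempts failed" {:attempts attempts})))))
```

A caller can catch the ExceptionInfo and read (:attempts (ex-data e)) to decide whether to re-raise or surface the failure.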

with-fallbacks

llm.sdk.gcp-auth

GCP Application Default Credentials (ADC) resolution for Vertex AI.

Mirrors the order documented at
https://cloud.google.com/docs/authentication/application-default-credentials
and implemented by the official google-auth client libraries:

  1. GOOGLE_APPLICATION_CREDENTIALS env var → credentials file
  2. Well-known file at
     ~/.config/gcloud/application_default_credentials.json
     (set by `gcloud auth application-default login`)
  3. GCE / Cloud Run / GKE metadata server (when running on GCP)

Credentials files come in two flavours we support:
  :service_account  — has :private_key + :client_email; we RS256-
                      sign a JWT and exchange it at
                      oauth2.googleapis.com/token for an access
                      token (jwt-bearer grant).
  :authorized_user  — has :client_id, :client_secret, :refresh_token;
                      we POST a refresh_token grant to the same
                      endpoint. This is the format
                      `gcloud auth application-default login` writes.

External account (workload identity federation) credentials are not yet
supported: they require an STS exchange that varies by source
(AWS, Azure, OIDC). The SDK throws a clear error when it encounters one.

Two convenience layers sit *above* the proper ADC chain:
  - request opts :vertex :access-token (caller override)
  - GOOGLE_OAUTH_ACCESS_TOKEN env (pre-resolved bearer)

These are documented escape hatches; they do not replace ADC.

When none of the layers yield a token, raises ex-info
{:error/type :auth/missing-credentials :attempted [...]}
naming every source the SDK tried, in order.
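
The "first source that yields a token wins" behaviour, including the error shape above, can be sketched like this. resolve-token and the [name resolver] pair format are hypothetical; the real resolvers (file parsing, JWT signing, metadata server calls) are out of scope here.

```clojure
(defn resolve-token
  "sources is an ordered vector of [source-name resolver-fn] pairs.
   Each resolver returns a token string or nil. The first non-nil token
   wins; if none is found, throws ex-info naming every source tried."
  [sources]
  (or (some (fn [[_ resolver]] (resolver)) sources)
      (throw (ex-info "No GCP credentials found"
                      {:error/type :auth/missing-credentials
                       :attempted  (mapv first sources)}))))

;; Illustrative ordering only; the source names are assumptions.
(def example-sources
  [[:request-opts     (constantly nil)]
   [:env-access-token #(System/getenv "GOOGLE_OAUTH_ACCESS_TOKEN")]])
```

Keeping the source name next to each resolver is what lets the error list the attempts in order.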

llm.sdk.http

Thin, mockable HTTP layer built on hato.

llm.sdk.image

Driver for image generation requests — the image counterpart to sdk/complete, sdk/embed, sdk/moderate, and sdk/rerank.

Resolves the profile, picks up its :profile/image-transport-constructor, builds and sends the request, returns a canonical ImageGenResponse. Providers without image support throw a clear ex-info.

generate-image

llm.sdk.litellm-snapshot

Read-only loader for the LiteLLM-derived pricing/capability snapshot bundled at resources/litellm-snapshot.json.

LiteLLM maintains an actively-curated catalog of ~2.7k model entries keyed by their provider's model id. The bundled snapshot is a filtered subset — only providers we have SDK adapters for, with each entry stripped to the fields llm.sdk.registry uses (context-length, max-output-tokens, capability flags, per-million pricing). To refresh, re-run scripts/build_litellm_snapshot.py.

This tier is a sibling to llm.sdk.models-dev; both contribute to llm.sdk.registry's field-merge. Where they overlap, the merge layer (registry/merge-pair) takes a key-level union in which the rightmost tier wins per field; the registry orders the tiers via its lookup fn.
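
A key-level union where the rightmost value wins per field is what clojure.core/merge does, so the idea can be sketched as below. The real registry/merge-pair may differ (for example in how it treats nil or nested maps); merge-entries and the entry keys other than :model/cost are assumptions.

```clojure
(def litellm-entry
  {:model/context-length 128000
   :model/cost           {:input 2.5}})

(def models-dev-entry
  {:model/context-length    200000
   :model/max-output-tokens 8192})

(defn merge-entries
  "Union of all keys; for a key present in several tiers, the
   rightmost tier's value is kept."
  [& tiers]
  (apply merge tiers))

;; models-dev-entry is rightmost here, so its context length wins,
;; while :model/cost survives from litellm-entry.
(def merged (merge-entries litellm-entry models-dev-entry))
```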

llm.sdk.models

Per-provider /models endpoint fetchers.

Each supported provider has a fetcher that hits its public /models endpoint and returns a vector of normalized ModelEntry maps with :model/source :live-models-api.

The registry layer (llm.sdk.registry) merges these entries with the models.dev breadth registry and a bundled offline snapshot to produce one unified view per (provider, model).

Providers without a public /models endpoint (Codex, Codex-backend, Bedrock, Fake) throw :error :unsupported on fetch; callers should route those through the models.dev / snapshot layers only.

llm.sdk.models-dev

models.dev breadth registry loader.

Three-tier cache hierarchy (memory, disk, network) plus two fallbacks, mirroring hermes-agent/agent/models_dev.py:
  1. In-memory atom (1h TTL by default)
  2. Disk cache at ~/.clojure-llm-sdk/models-dev-cache.json (1h by mtime)
  3. Network fetch from https://models.dev/api.json
  4. Stale disk cache fallback (network failed, disk exists but old)
  5. Bundled snapshot at resources/models-dev-snapshot.json (last resort,
     ships with the SDK so offline use works)

Normalizes models.dev's per-provider tree into ModelEntry maps with
:model/source :models-dev so the registry merge layer can compare
against live /models fetches and the bundled snapshot uniformly.
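
The lookup order above can be sketched as a TTL-guarded atom in front of an ordered list of loaders. This is a simplification under assumptions: load-registry, the loader functions, and the explicit now argument are hypothetical, and whatever loader succeeds is cached as fresh, which the real loader may not do for stale or snapshot data.

```clojure
(def ttl-ms (* 60 60 1000))          ; 1h, matching the default TTL above

(defonce cache (atom nil))           ; {:data ... :fetched-at epoch-ms}

(defn fresh?
  "True when entry exists and is younger than ttl-ms at time now."
  [entry now]
  (boolean (and entry (< (- now (:fetched-at entry)) ttl-ms))))

(defn load-registry
  "Returns the in-memory data while it is fresh. Otherwise tries each
   zero-arg loader in order (e.g. disk, network, stale disk, bundled
   snapshot), treating an exception or nil as a miss."
  [loaders now]
  (let [entry @cache]
    (if (fresh? entry now)
      (:data entry)
      (when-let [data (some (fn [load!]
                              (try (load!) (catch Exception _ nil)))
                            loaders)]
        (reset! cache {:data data :fetched-at now})
        data))))
```

Passing now explicitly (for example (System/currentTimeMillis)) keeps the sketch easy to test.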

llm.sdk.moderate

Driver for moderation requests — the moderate counterpart to llm.sdk/complete and llm.sdk/embed.

Resolves the provider profile, picks up its :profile/moderation-transport-constructor, builds the request, and returns a canonical ModerationResponse.

Providers without a moderation transport throw ex-info rather than failing with a NullPointerException.

moderate

llm.sdk.pricing

Pricing lookup + cost estimation, layered on llm.sdk.registry.

Data flow:
  (sdk/estimate-cost ...) ─► estimate-cost-for-model
                                    │
                                    ▼
                              registry/lookup        ─► merged ModelEntry
                               - override tier       (pricing from cost map)
                               - live /models tier
                               - models.dev tier (including bundled snapshot)

The hardcoded pricing snapshot that previously lived here has been folded
into the bundled models.dev snapshot at resources/models-dev-snapshot.json
(every former entry verified present with current pricing). Callers
who need to inject custom pricing should use
llm.sdk.registry/register-entry!, which bypasses the public registries.

The PricingEntry record shape is preserved for callers who already
consume it — internally we convert ModelEntry's :model/cost map back
to this shape at lookup time.
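
For orientation, the arithmetic behind a cost estimate from per-million prices can be sketched as below. sketch-cost and the key names :input, :output, :input-tokens, and :output-tokens are assumptions for illustration; the actual :model/cost and PricingEntry shapes are not shown in this document.

```clojure
(defn sketch-cost
  "cost = (input-tokens / 1e6) * input-price
        + (output-tokens / 1e6) * output-price
   where prices are expressed per million tokens."
  [{:keys [input output]} {:keys [input-tokens output-tokens]}]
  (+ (* (/ input-tokens 1e6) input)
     (* (/ output-tokens 1e6) output)))

;; 1,000,000 input tokens at 2.5 per million plus
;; 500,000 output tokens at 10.0 per million.
(def example-cost
  (sketch-cost {:input 2.5 :output 10.0}
               {:input-tokens 1000000 :output-tokens 500000}))
```

Here example-cost evaluates to 7.5 (2.5 for input plus 5.0 for output).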

llm.sdk.provider

Compatibility aggregate for provider registry, auth, and built-ins.

Implementation ownership lives under llm.sdk.provider.registry, llm.sdk.provider.auth, and llm.sdk.provider.builtins.

llm.sdk.provider-coverage

Declared per-provider SDK coverage.

This is not marketing metadata. It is an internal contract matrix used by tests to keep provider support honest across request/response shape, SDK surface area, caching, usage metrics, pricing, model listing, auth, errors, and live-smoke coverage.

llm.sdk.provider.auth

Provider auth and runtime profile configuration.

llm.sdk.provider.builtins

Built-in provider profile definitions.

llm.sdk.provider.registry

Provider profile registry ownership.

llm.sdk.providers.anthropic

Compatibility shim. Implementation lives in llm.sdk.providers.anthropic.chat.

llm.sdk.providers.anthropic.chat

Anthropic Messages API transport adapter. Supports thinking blocks, cache_control, tool use, streaming deltas. Preserves provider-specific replay state (reasoning_details, signatures).

llm.sdk.providers.bedrock

Compatibility shim. Implementation lives in llm.sdk.providers.bedrock.converse.

llm.sdk.providers.bedrock-image

Compatibility shim. Implementation lives in llm.sdk.providers.bedrock.image.

llm.sdk.providers.bedrock.converse

AWS Bedrock Converse API transport adapter.

Auth: AWS Signature V4 — sdk/complete dispatches on :profile/auth-strategy :aws-sigv4 and signs the request via llm.sdk.aws-sigv4 just before the HTTP send.

Streaming: Bedrock's /converse-stream emits binary event-stream frames (vnd.amazon.eventstream). sdk/complete reads the raw InputStream via llm.sdk.aws-eventstream/frame-seq and hands each parsed frame to parse-stream-event-bedrock as a map.

Model-id mapping: canonical short ids (e.g. claude-sonnet-4-5, nova-pro) are mapped to Bedrock's region-versioned id format (e.g. anthropic.claude-sonnet-4-5-20250101-v1:0); unknown ids pass through verbatim so callers can provide explicit ARNs.

llm.sdk.providers.bedrock.image

Bedrock image-generation adapter (Titan Image Generator + Stability
SD3 / SDXL). All use bedrock-runtime /model/{id}/invoke with SigV4.
Each model has a different body shape:

  amazon.titan-image-generator-v1 / -v2:0
    {:taskType "TEXT_IMAGE"
     :textToImageParams {:text "..."}
     :imageGenerationConfig {:numberOfImages N :width W :height H :cfgScale 8 :seed 0}}
    response {:images ["b64", ...]}

  stability.stable-diffusion-xl-v1
    {:text_prompts [{:text "..." :weight 1.0}]
     :cfg_scale N :seed N :steps 30}
    response {:artifacts [{:base64 "..."}]}

We dispatch on a substring match against the model id and route to
the matching builder/parser pair.
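
A substring dispatch of this kind can be sketched as follows. The function names and the matched substrings are hypothetical, and the numeric defaults (image size, cfg scale) are placeholders rather than the adapter's actual values; the body shapes follow the two layouts listed above.

```clojure
(require '[clojure.string :as str])

(defn titan-body
  "Request body in the Titan Image Generator shape."
  [prompt]
  {:taskType              "TEXT_IMAGE"
   :textToImageParams     {:text prompt}
   :imageGenerationConfig {:numberOfImages 1 :width 1024 :height 1024
                           :cfgScale 8 :seed 0}})

(defn sdxl-body
  "Request body in the Stability SDXL shape."
  [prompt]
  {:text_prompts [{:text prompt :weight 1.0}]
   :cfg_scale    7
   :seed         0
   :steps        30})

(defn build-body
  "Picks a body builder by substring match on the model id."
  [model-id prompt]
  (cond
    (str/includes? model-id "titan-image")         (titan-body prompt)
    (str/includes? model-id "stable-diffusion-xl") (sdxl-body prompt)
    :else (throw (ex-info "Unsupported Bedrock image model"
                          {:model model-id}))))
```

A matching parser pair would read :images for Titan and :artifacts for SDXL in the same way.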

llm.sdk.providers.bedrock.rerank

Bedrock Agent Runtime /rerank adapter.

Canonical SDK rerank requests follow Cohere-style inputs: {model, query, documents, top-n}. Bedrock expects an Agent Runtime request containing queries, sources, and a Bedrock reranking configuration. The request is signed by llm.sdk.rerank via SigV4.

llm.sdk.providers.codex

Compatibility shim. Implementation lives in llm.sdk.providers.codex.responses.

llm.sdk.providers.codex.responses

OpenAI Responses API (Codex) transport adapter. Covers both the standard OpenAI Responses API (api.openai.com) and the Codex backend (chatgpt.com/backend-api/codex).

For the Codex backend, auth is read from ~/.codex/auth.json (shared with the official OpenAI Codex CLI).

llm.sdk.providers.cohere-chat

Compatibility shim. Implementation lives in llm.sdk.providers.cohere.chat.

llm.sdk.providers.cohere-embed

Compatibility shim. Implementation lives in llm.sdk.providers.cohere.embeddings.

llm.sdk.providers.cohere-rerank

Compatibility shim. Implementation lives in llm.sdk.providers.cohere.rerank.

llm.sdk.providers.cohere.chat

Cohere /v2/chat native transport adapter.

Cohere is OpenAI-compat-ish but differs enough to need its own adapter: it has a typed message-content array, a documents field, a citation_options control, citations on the response, and a streaming event taxonomy with separate content-start / content-delta / content-end plus tool-plan-delta and citation-* events.

Reference: litellm-ref/llms/cohere/chat/v2_transformation.py.

llm.sdk.providers.cohere.embeddings

Cohere embed adapter — POST {base}/embed.

Cohere's wire shape diverges from OpenAI's in three places:

- Request uses :texts (vector) instead of :input.
- Request carries a required :input_type (search_document / search_query / classification / clustering), which lives in the canonical request as :embed/provider-options :input-type. It defaults to "search_document" when omitted, the safest fallback for general-purpose retrieval.
- Response embeddings live under :embeddings.float (newer API with multi-format support) or :embeddings (legacy single format). Usage is in :meta.billed_units.input_tokens.

Live smoke is env-gated under COHERE_API_KEY.
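
The three divergences can be illustrated with plain map transforms. The function names and the flat argument map are hypothetical simplifications of the canonical request; only the wire keys (:texts, :input_type, :embeddings, :meta) come from the description above.

```clojure
(defn cohere-embed-body
  "Builds the request body: :texts instead of :input, and :input_type
   defaulting to \"search_document\" when the caller omits it."
  [{:keys [model texts input-type]}]
  {:model      model
   :texts      (vec texts)
   :input_type (or input-type "search_document")})

(defn extract-embeddings
  "Handles both response shapes: {:embeddings {:float [...]}} (newer)
   and {:embeddings [...]} (legacy)."
  [response]
  (let [e (:embeddings response)]
    (if (map? e) (:float e) e)))

(defn billed-input-tokens
  "Reads usage from :meta.billed_units.input_tokens, or nil if absent."
  [response]
  (get-in response [:meta :billed_units :input_tokens]))
```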

llm.sdk.providers.cohere.rerank

Cohere /rerank transport. The wire shape is also used by Jina — both accept {model, query, documents, top_n, return_documents} and return {results [{index, relevance_score, document {text}}]}.

Cohere additionally returns :meta.billed_units.search_units for usage; Jina returns :usage {total_tokens}. Both are surfaced through the canonical :response/usage where possible.

llm.sdk.providers.elevenlabs

Compatibility shim. Implementation lives in llm.sdk.providers.elevenlabs.tts.

llm.sdk.providers.elevenlabs.tts

ElevenLabs TTS adapter — POST /v1/text-to-speech/:voice_id with xi-api-key header. Voice id is part of the URL; model id and text live in the JSON body. Returns audio bytes (mp3 by default).

Reference: litellm-ref/llms/elevenlabs/ + ElevenLabs API docs.

llm.sdk.providers.fake

Compatibility shim. Implementation lives in llm.sdk.providers.fake.chat.

make-fake-transport

llm.sdk.providers.fake.chat

Fake/test provider that returns deterministic responses. Conforms to the Transport protocol.

make-fake-transport

llm.sdk.providers.gemini-native

Compatibility shim. Implementation lives in llm.sdk.providers.gemini.native.

llm.sdk.providers.gemini.imagen

Vertex AI Imagen 3 / 4 image-generation adapter.

Endpoint:
  POST {host}/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:predict
Body:
  {:instances [{:prompt "..."}]
   :parameters {:sampleCount N :aspectRatio "1:1" :seed ...}}
Response:
  {:predictions [{:bytesBase64Encoded "..." :mimeType "image/png"}]}

Auth: same GCP OAuth as vertex-gemini — token from
:request provider-options.vertex.access-token or
GOOGLE_OAUTH_ACCESS_TOKEN.

Models surfaced under :vertex-imagen include imagen-3.0-generate-002,
imagen-3.0-fast-generate-001, imagen-4.0-generate-001.
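
Assembling the endpoint and body shown above is plain string and map construction, sketched here. predict-url and predict-body are hypothetical names, and the host and project values in the example are assumptions for illustration.

```clojure
(defn predict-url
  "Fills the {host}/{project}/{location}/{model} slots of the :predict endpoint."
  [{:keys [host project location model]}]
  (str host "/v1/projects/" project
       "/locations/" location
       "/publishers/google/models/" model ":predict"))

(defn predict-body
  "Request body with a single prompt instance and n requested samples."
  [prompt n]
  {:instances  [{:prompt prompt}]
   :parameters {:sampleCount n :aspectRatio "1:1"}})

;; Example values are placeholders, not defaults used by the SDK.
(def example-url
  (predict-url {:host     "https://us-central1-aiplatform.googleapis.com"
                :project  "my-project"
                :location "us-central1"
                :model    "imagen-3.0-generate-002"}))
```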

llm.sdk.providers.gemini.native

Gemini Native API transport adapter. Handles thought signatures, streaming deltas, safety metadata. Preserves provider-specific replay state.

llm.sdk.providers.gemini.vertex

Vertex AI Gemini transport adapter.

Builds on Gemini native with different auth (GCP OAuth) and endpoint structure. Auth resolution follows the standard GCP ADC chain via llm.sdk.gcp-auth: request opts → GOOGLE_OAUTH_ACCESS_TOKEN env → gcloud auth print-access-token → GOOGLE_APPLICATION_CREDENTIALS service-account JSON (RS256-signed JWT exchanged at oauth2.googleapis.com/token).

Project resolution: request opts → profile quirks → GOOGLE_CLOUD_PROJECT env → SA JSON project_id.

llm.sdk.providers.ollama-native

Compatibility shim. Implementation lives in llm.sdk.providers.ollama.native.

Compatibility shim. Implementation lives in llm.sdk.providers.ollama.native.

raw docstring

llm.sdk.providers.ollama.native

Native Ollama adapter — /api/chat (chat) and /api/embed (embeddings).

Ollama also exposes an OpenAI-compat /v1/chat/completions endpoint, which the already-registered :ollama profile targets. This namespace registers a sibling :ollama-native profile for callers who want the native shape: older Ollama versions, vision input via the native :images field, or workflows that need the native :options keys (e.g. :num_ctx, :num_predict, :mirostat).

Streaming: Ollama uses NDJSON (one JSON object per line), NOT SSE. We re-use the http/sse-request line reader and parse each line as a raw JSON object instead of stripping a 'data: ' prefix.

Native Ollama adapter — /api/chat (chat) and /api/embed (embeddings).

Ollama also exposes an OpenAI-compat /v1/chat/completions endpoint,
which the already-registered :ollama profile targets.
This namespace registers a sibling :ollama-native profile for callers
who want the native shape: older Ollama versions, vision input via
the native :images field, or workflows that need the native
:options keys (e.g. :num_ctx, :num_predict, :mirostat).

Streaming: Ollama uses NDJSON (one JSON object per line), NOT
SSE. We re-use the http/sse-request line reader and parse each line
as a raw JSON object instead of stripping a 'data: ' prefix.
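
To illustrate the NDJSON framing described above: every non-blank line is a complete JSON document, with no 'data: ' prefix to strip. The sketch below is not the adapter's code; `ndjson-payloads` is a hypothetical name, and `parse-json` stands in for whichever JSON parser the caller uses (the example passes `identity` so it has no dependencies).

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader StringReader))

(defn ndjson-payloads
  "Reads the body line by line and returns one parsed value per
  non-blank line. `parse-json` is supplied by the caller."
  [^BufferedReader rdr parse-json]
  (->> (line-seq rdr)
       (remove str/blank?)
       (mapv parse-json)))

;; Two chunks separated by a blank line, as a streamed body might look.
(def sample-body "{\"done\":false}\n\n{\"done\":true}\n")

(def payloads
  (ndjson-payloads (BufferedReader. (StringReader. sample-body)) identity))
```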

raw docstring

llm.sdk.providers.openai-chat

Compatibility shim. Implementation lives in llm.sdk.providers.openai.chat.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.chat.

raw docstring

llm.sdk.providers.openai-compat.aliases

Data-only OpenAI-compatible provider alias specs.

These providers share the OpenAI chat-completions wire shape. Adapter code may still apply provider quirks from the profile, but the registry should not need one hand-written register-provider call per alias.

Data-only OpenAI-compatible provider alias specs.

These providers share the OpenAI chat-completions wire shape. Adapter code
may still apply provider quirks from the profile, but the registry should
not need one hand-written register-provider call per alias.

raw docstring

llm.sdk.providers.openai-embed

Compatibility shim. Implementation lives in llm.sdk.providers.openai.embeddings.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.embeddings.

raw docstring

llm.sdk.providers.openai-image

Compatibility shim. Implementation lives in llm.sdk.providers.openai.image.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.image.

raw docstring

llm.sdk.providers.openai-moderation

Compatibility shim. Implementation lives in llm.sdk.providers.openai.moderation.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.moderation.

raw docstring

llm.sdk.providers.openai-speak

Compatibility shim. Implementation lives in llm.sdk.providers.openai.speak.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.speak.

raw docstring

llm.sdk.providers.openai-transcribe

Compatibility shim. Implementation lives in llm.sdk.providers.openai.transcribe.

Compatibility shim. Implementation lives in llm.sdk.providers.openai.transcribe.

raw docstring

llm.sdk.providers.openai.audio

OpenAI audio provider family namespace.

OpenAI audio provider family namespace.

raw docstring

llm.sdk.providers.openai.chat

OpenAI Chat Completions transport adapter. Covers OpenAI, OpenRouter, DeepSeek, and other OpenAI-compatible providers.

OpenAI Chat Completions transport adapter.
Covers OpenAI, OpenRouter, DeepSeek, and other OpenAI-compatible providers.

raw docstring

llm.sdk.providers.openai.embeddings

OpenAI embeddings adapter — POST {base}/embeddings.

Same auth and base-url plumbing as the chat adapter; we share the profile, just register an additional :profile/embed-transport-constructor on it. Other OpenAI-compat hosts that offer embeddings (Mistral, Together, Voyage, Jina, etc.) can reuse this transport by attaching the same constructor.

OpenAI embeddings adapter — POST {base}/embeddings.

Same auth and base-url plumbing as the chat adapter; we share the
profile, just register an additional
:profile/embed-transport-constructor on it. Other OpenAI-compat hosts
that offer embeddings (Mistral, Together, Voyage, Jina, etc.) can
reuse this transport by attaching the same constructor.

raw docstring

llm.sdk.providers.openai.image

OpenAI image generation adapter.

POST {base}/images/generations. Covers DALL-E 3, DALL-E 2, and the gpt-image-1 family. The wire body differs subtly across them (gpt-image-1 takes :quality :low|:medium|:high|:auto and returns b64_json only; DALL-E 3 takes :quality :standard|:hd and :style :vivid|:natural). The adapter passes canonical fields straight through — provider-specific values are the caller's responsibility, and the same provider-options :extra_body hatch as elsewhere covers anything we haven't surfaced.

OpenAI image generation adapter.

POST {base}/images/generations. Covers DALL-E 3, DALL-E 2, and
the gpt-image-1 family. The wire body differs subtly across them
(gpt-image-1 takes :quality :low|:medium|:high|:auto and returns
b64_json only; DALL-E 3 takes :quality :standard|:hd and :style
:vivid|:natural). The adapter passes canonical fields straight
through — provider-specific values are the caller's responsibility,
and the same provider-options :extra_body hatch as elsewhere
covers anything we haven't surfaced.

raw docstring

llm.sdk.providers.openai.moderation

OpenAI Moderations adapter.

POST {base}/moderations. omni-moderation-latest (the default since Nov 2024) accepts multi-modal input — a vector of {:type :text|:image_url} maps as well as plain strings. text-moderation-* models are text-only.

Response shape per the OpenAI Moderations API:
  {:id :model
   :results [{:flagged bool
              :categories {category-name bool}
              :category_scores {category-name float}
              :category_applied_input_types {category-name ["text"|"image"]}}]}

OpenAI Moderations adapter.

POST {base}/moderations. omni-moderation-latest (the default since
Nov 2024) accepts multi-modal input — a vector of {:type :text|:image_url}
maps as well as plain strings. text-moderation-* models are
text-only.

Response shape per the OpenAI Moderations API:
  {:id :model
   :results [{:flagged bool
              :categories {category-name bool}
              :category_scores {category-name float}
              :category_applied_input_types {category-name ["text"|"image"]}}]}
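
As an illustration of consuming that response shape, the sketch below pulls the flagged categories and their scores out of one result. It assumes the top-level keys are keywordized as shown above and that category names are left as strings; `flagged-categories` is a hypothetical helper, not part of the adapter.

```clojure
(defn flagged-categories
  "Returns {category-name score} for the categories a result flagged."
  [result]
  (into {}
        (for [[category flagged?] (:categories result)
              :when flagged?]
          [category (get-in result [:category_scores category])])))

(def sample-result
  {:flagged true
   :categories {"violence" true "harassment" false}
   :category_scores {"violence" 0.91 "harassment" 0.02}})
```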

raw docstring

llm.sdk.providers.openai.speak

OpenAI /audio/speech adapter — POST {model, voice, input, response_format} returns raw audio bytes.

OpenAI /audio/speech adapter — POST {model, voice, input, response_format}
returns raw audio bytes.

raw docstring

llm.sdk.providers.openai.transcribe

OpenAI /audio/transcriptions adapter. Wire shape is shared by Groq's /openai/v1/audio/transcriptions endpoint (same field names, same verbose_json output), so the same transport class powers both profiles.

OpenAI /audio/transcriptions adapter. Wire shape is shared by Groq's
/openai/v1/audio/transcriptions endpoint (same field names, same
verbose_json output), so the same transport class powers both
profiles.

raw docstring

llm.sdk.providers.openrouter

Compatibility shim. Implementation lives in llm.sdk.providers.openrouter.chat.

Compatibility shim. Implementation lives in llm.sdk.providers.openrouter.chat.

raw docstring

llm.sdk.providers.openrouter.chat

OpenRouter transport adapter. Builds on OpenAI Chat Completions with OpenRouter-specific quirks:

- provider preferences routing in extra_body
- Pareto Code router plugin
- reasoning config in extra_body (not top-level)
- special model naming and error handling.

OpenRouter transport adapter.
Builds on OpenAI Chat Completions with OpenRouter-specific quirks:
- provider preferences routing in extra_body
- Pareto Code router plugin
- reasoning config in extra_body (not top-level)
- special model naming and error handling.

raw docstring

llm.sdk.providers.openrouter.image

OpenRouter image generation transport.

OpenRouter image models generate images through chat completions, not OpenAI's /images/generations endpoint. This adapter mirrors that wire shape and extracts images from choices[].message.images[].image_url.url.

OpenRouter image generation transport.

OpenRouter image models generate images through chat completions, not
OpenAI's /images/generations endpoint. This adapter mirrors that wire
shape and extracts images from choices[].message.images[].image_url.url.
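
The extraction path above can be sketched as a nested walk over the parsed response. This assumes the JSON body has been parsed with keyword keys; `image-urls` is a hypothetical name rather than the adapter's own function.

```clojure
(defn image-urls
  "Collects every URL found at choices[].message.images[].image_url.url,
  in response order. Choices without images contribute nothing."
  [response]
  (vec (for [choice (:choices response)
             image  (get-in choice [:message :images])
             :let   [url (get-in image [:image_url :url])]
             :when  url]
         url)))

(def sample-response
  {:choices [{:message {:content "Here you go."
                        :images [{:image_url {:url "data:image/png;base64,AAAA"}}
                                 {:image_url {:url "data:image/png;base64,BBBB"}}]}}
             {:message {:content "No image in this choice."}}]})
```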

raw docstring

llm.sdk.providers.perplexity

Compatibility shim. Implementation lives in llm.sdk.providers.perplexity.chat.

Compatibility shim. Implementation lives in llm.sdk.providers.perplexity.chat.

raw docstring

llm.sdk.providers.perplexity.chat

Perplexity transport — OpenAI-shape body + citation/search-results surfacing.

Request building is identical to openai-chat. Response parsing extends the OpenAI parser with two extractions:

- :search_results [{:url :title :snippet}, ...] → richer CitationPart per result
- :citations ["url", ...] → URL-only CitationPart when search_results isn't present

Usage normalization delegates to normalize-openai-usage, which already picks up Perplexity's :citation_tokens and :num_search_queries when present.

Streaming: the final SSE chunk on /chat/completions carries :citations alongside :usage and :finish_reason. parse-stream-event returns a vector of events in that case — sdk/complete flattens multi-event return values.

Perplexity transport — OpenAI-shape body + citation/search-results
 surfacing.

 Request building is identical to openai-chat. Response parsing
 extends the OpenAI parser with two extractions:

- :search_results [{:url :title :snippet}, ...] → richer
     CitationPart per result
- :citations ["url", ...]                       → URL-only
     CitationPart when search_results isn't present

 Usage normalization delegates to normalize-openai-usage, which
 already picks up Perplexity's :citation_tokens and
 :num_search_queries when present.

 Streaming: the final SSE chunk on /chat/completions carries
 :citations alongside :usage and :finish_reason. parse-stream-event
 returns a vector of events in that case — sdk/complete flattens
 multi-event return values.
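
The two-way extraction described above can be sketched as follows. The output map shape is a stand-in for the SDK's CitationPart, whose real keys are not shown in this docstring, and `citation-parts` is a hypothetical name.

```clojure
(defn citation-parts
  "Prefers :search_results (url + title + snippet); falls back to the
  URL-only :citations vector when search_results is absent or empty."
  [{:keys [search_results citations]}]
  (if (seq search_results)
    (mapv (fn [{:keys [url title snippet]}]
            {:type :citation :url url :title title :snippet snippet})
          search_results)
    (mapv (fn [url] {:type :citation :url url}) citations)))
```

Given a body that carries only `:citations ["https://example.com/a"]`, this returns a single URL-only part; given both keys, the richer :search_results entries win.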

raw docstring

llm.sdk.providers.vertex-gemini

Compatibility shim. Implementation lives in llm.sdk.providers.gemini.vertex.

Compatibility shim. Implementation lives in llm.sdk.providers.gemini.vertex.

raw docstring

llm.sdk.providers.vertex-imagen

Compatibility shim. Implementation lives in llm.sdk.providers.gemini.imagen.

Compatibility shim. Implementation lives in llm.sdk.providers.gemini.imagen.

raw docstring

llm.sdk.providers.voyage-rerank

Compatibility shim. Implementation lives in llm.sdk.providers.voyage.rerank.

Compatibility shim. Implementation lives in llm.sdk.providers.voyage.rerank.

raw docstring

llm.sdk.providers.voyage.rerank

Voyage /rerank transport. Differs from Cohere/Jina on field names only:
  request : top_k  (not top_n)
  response: data   (not results)
Document representation is also slightly different: Voyage returns :document as a plain string when :return_documents=true.

Voyage usage shape: {:usage {:total_tokens N}}.

Voyage /rerank transport. Differs from Cohere/Jina on field names
only:
  request : top_k  (not top_n)
  response: data   (not results)
Document representation is also slightly different — Voyage returns
:document as a plain string when :return_documents=true.

Voyage usage shape: {:usage {:total_tokens N}}.
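
A minimal sketch of the field renames listed above. The canonical-side keys (:top-n, :return-documents?, :score, :total-tokens) and both function names are hypothetical, and the per-item :index and :relevance_score keys are an assumption about Voyage's wire format that this docstring does not spell out.

```clojure
(defn voyage-request-body
  "Builds a Voyage /rerank body; note :top_k where Cohere/Jina use top_n."
  [{:keys [model query documents top-n return-documents?]}]
  (cond-> {:model model :query query :documents documents}
    top-n                     (assoc :top_k top-n)
    (some? return-documents?) (assoc :return_documents return-documents?)))

(defn parse-voyage-response
  "Reads results from :data (not :results). :document is a plain string."
  [{:keys [data usage]}]
  {:results      (mapv (fn [{:keys [index relevance_score document]}]
                         (cond-> {:index index :score relevance_score}
                           document (assoc :document document)))
                       data)
   :total-tokens (:total_tokens usage)})
```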

raw docstring

llm.sdk.rate-limit

Rate-limit header parsing and tracking.

Rate-limit header parsing and tracking.

raw docstring

llm.sdk.registry

Unified merged model + pricing registry.

Layered precedence (highest first):

1. Caller overrides: register-entry! lets the SDK consumer inject custom data for endpoints the public registries don't know.
2. Live per-provider /models fetch: populated lazily by refresh!. Authoritative for what the provider currently advertises.
3. LiteLLM snapshot: bundled at resources/litellm-snapshot.json from llm.sdk.litellm-snapshot. Refreshable via scripts/build_litellm_snapshot.py. Wide coverage of pricing + capabilities, especially strong on Bedrock variants and less-mainstream providers.
4. models.dev: breadth source via llm.sdk.models-dev. Includes the bundled offline snapshot as its own innermost fallback.

Lookups field-merge across all tiers: a field missing from a higher tier (such as context-length or pricing) is filled in from the next lower tier that has it. The :model/source of the returned entry is the highest-precedence tier that contributed.

All operations are by [provider-keyword, model-id].

Unified merged model + pricing registry.

Layered precedence (highest first):
  1. Caller overrides — register-entry! lets the SDK consumer inject
     custom data for endpoints the public registries don't know.
  2. Live per-provider /models fetch — populated lazily by refresh!.
     Authoritative for what the provider currently advertises.
  3. LiteLLM snapshot — bundled at resources/litellm-snapshot.json
     from llm.sdk.litellm-snapshot. Refreshable via
     scripts/build_litellm_snapshot.py. Wide coverage of pricing +
     capabilities, especially strong on Bedrock variants and
     less-mainstream providers.
  4. models.dev — breadth source via llm.sdk.models-dev. Includes
     the bundled offline snapshot as its own innermost fallback.

Lookups field-merge across all tiers: a field missing from a higher
tier (such as context-length or pricing) is filled in from the next
lower tier that has it. The :model/source of the returned entry is
the highest-precedence tier that contributed.

All operations are by [provider-keyword, model-id].
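
The field-merge rule can be sketched with `merge`, applied from lowest to highest precedence so that higher tiers overwrite lower ones field by field. The tier keywords, `merge-tiers`, and every entry key other than :model/source are hypothetical; the real registry's internals may differ.

```clojure
(def tier-order
  "Highest precedence first. The tier keywords are illustrative."
  [:override :live :litellm :models-dev])

(defn merge-tiers
  "`tiers` maps a tier keyword to that tier's entry for one
  [provider model-id] pair (absent or empty when the tier has no data).
  Returns the merged entry, or nil when no tier knows the model."
  [tiers]
  (let [present (for [t tier-order
                      :let [entry (get tiers t)]
                      :when (seq entry)]
                  [t entry])]
    (when (seq present)
      ;; merge lets later maps win, so feed it lowest precedence first.
      (assoc (apply merge (reverse (map second present)))
             :model/source (ffirst present)))))

(def merged
  (merge-tiers {:live       {:model/id "m" :model/context-length 1000}
                :models-dev {:model/context-length 500 :model/input-price 1.0}}))
```

In this example the context length comes from the live tier, the price is filled in from models.dev, and :model/source is :live.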

raw docstring

llm.sdk.request

Request preprocessing applied by llm.sdk/complete before the provider transport sees the request. Currently: drop+warn for canonical request fields the provider doesn't support.

The supported-params set lives on the profile as :profile/supported-params. When set, any canonical droppable field present in the request but absent from the set is removed before the transport builds the body, and one warning is emitted per call.

When :profile/supported-params is NOT set on a profile, requests pass through unchanged — providers opt in to the drop+warn behaviour by populating the set.

Request preprocessing applied by llm.sdk/complete before the provider
transport sees the request. Currently: drop+warn for canonical
request fields the provider doesn't support.

The supported-params set lives on the profile as
:profile/supported-params. When set, any canonical droppable field
present in the request but absent from the set is removed before
the transport builds the body, and one warning is emitted per call.

When :profile/supported-params is NOT set on a profile, requests
pass through unchanged — providers opt in to the drop+warn
behaviour by populating the set.
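
A minimal sketch of the drop+warn rule, under stated assumptions: `droppable-fields` is an invented stand-in for the SDK's list of canonical droppable fields, `drop-unsupported` is a hypothetical name, and the warning is simply printed to stderr.

```clojure
(require '[clojure.set :as set])

(def droppable-fields
  "Hypothetical set of canonical fields that may be dropped."
  #{:temperature :top-p :seed :stop})

(defn drop-unsupported
  "Removes droppable fields the profile does not list as supported and
  emits one warning per call. Profiles without :profile/supported-params
  leave the request untouched."
  [profile request]
  (if-let [supported (:profile/supported-params profile)]
    (let [dropped (set/difference
                   (set/intersection droppable-fields (set (keys request)))
                   supported)]
      (when (seq dropped)
        (binding [*out* *err*]
          (println "WARN: dropping unsupported params" (vec (sort dropped)))))
      (apply dissoc request dropped))
    request))
```

Note that an empty set still counts as "set": a profile with `#{}` drops every droppable field, whereas a profile with no :profile/supported-params key passes everything through.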

raw docstring

llm.sdk.rerank

Driver for rerank requests — the rerank counterpart to sdk/complete, sdk/embed, and sdk/moderate.

Resolves the profile, picks up its :profile/rerank-transport-constructor, builds and sends the request, returns a canonical RerankResponse. Providers without rerank support throw a clear ex-info.

Driver for rerank requests — the rerank counterpart to sdk/complete,
sdk/embed, and sdk/moderate.

Resolves the profile, picks up its
:profile/rerank-transport-constructor, builds and sends the request,
returns a canonical RerankResponse. Providers without rerank
support throw a clear ex-info.

raw docstring

rerank

llm.sdk.retry

Data-driven retry policy with jittered backoff.

Data-driven retry policy with jittered backoff.
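
One common way to express such a policy as data is "full jitter" exponential backoff, sketched below. The policy keys (:base-ms, :max-ms), the function name, and the choice of full jitter are all assumptions; this docstring does not say which jitter scheme the namespace uses.

```clojure
(defn backoff-ms
  "Delay before retry number `attempt` (0-based): a random fraction of
  min(max-ms, base-ms * 2^attempt). `rand-fn` returns a double in
  [0, 1) and is injectable so the function can be tested."
  [{:keys [base-ms max-ms]} attempt rand-fn]
  (let [ceiling (min max-ms (* base-ms (bit-shift-left 1 attempt)))]
    (long (* (rand-fn) ceiling))))

(def policy {:base-ms 100 :max-ms 1000})
```

In real use `rand-fn` would be `rand`; passing `(constantly 0.5)` makes the result deterministic, so `(backoff-ms policy 2 (constantly 0.5))` is half of 400 ms.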

raw docstring

llm.sdk.schema

Canonical request/response schemas for the LLM SDK. All provider adapters translate to/from these shapes.

Canonical request/response schemas for the LLM SDK.
All provider adapters translate to/from these shapes.

raw docstring

llm.sdk.speak

Driver for text-to-speech (TTS). The TTS counterpart to sdk/complete and sdk/transcribe.

Returns a SpeakResponse: {:audio/bytes byte-array :audio/content-type str :audio/model str? :response/usage Usage? :response/raw raw}.

Providers without a speak transport throw ex-info on call.

Driver for text-to-speech (TTS). The TTS counterpart to
sdk/complete and sdk/transcribe.

Returns a SpeakResponse: {:audio/bytes byte-array
                           :audio/content-type str
                           :audio/model str?
                           :response/usage Usage?
                           :response/raw raw}.

Providers without a speak transport throw ex-info on call.

raw docstring

speak

llm.sdk.sse

Small SSE helpers shared by provider adapters.

This namespace intentionally handles only the common line envelope: data: ..., [DONE], and JSON parsing. Provider-specific event semantics stay in each adapter.

Small SSE helpers shared by provider adapters.

This namespace intentionally handles only the common line envelope:
`data: ...`, `[DONE]`, and JSON parsing. Provider-specific event
semantics stay in each adapter.
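
The common line envelope can be sketched as a single classifier function. `parse-sse-line` is a hypothetical name, not necessarily what this namespace exports, and `parse-json` stands in for the caller's JSON parser (the usage below passes `identity`).

```clojure
(require '[clojure.string :as str])

(defn parse-sse-line
  "Classifies one line of an SSE body:
  - nil for anything that is not a data line (comments, blanks, event:)
  - :done for the [DONE] sentinel
  - (parse-json payload) for every other data line."
  [line parse-json]
  (when (str/starts-with? line "data:")
    (let [payload (str/trim (subs line 5))]
      (if (= payload "[DONE]")
        :done
        (parse-json payload)))))

(def parsed
  (mapv #(parse-sse-line % identity)
        ["data: {\"a\":1}" "" ": keep-alive" "data: [DONE]"]))
```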

raw docstring

llm.sdk.stream

Streaming event taxonomy and reducer. Stream events → final canonical response. Preserves event order in output parts.

Streaming event taxonomy and reducer.
Stream events → final canonical response.
Preserves event order in output parts.

raw docstring

llm.sdk.transcribe

Driver for audio transcription (speech-to-text). The STT counterpart to sdk/complete and sdk/embed.

Providers without a transcribe transport throw ex-info on call so the missing capability surfaces at the call site.

Driver for audio transcription (speech-to-text). The STT counterpart
to sdk/complete and sdk/embed.

Providers without a transcribe transport throw ex-info on call so
the missing capability surfaces at the call site.

raw docstring

transcribe

llm.sdk.transport

Transport protocol definition. A transport owns the translation between canonical SDK shapes and provider-native wire formats.

Transport protocol definition. A transport owns the translation between
canonical SDK shapes and provider-native wire formats.

raw docstring

llm.sdk.transport.embed

Sibling protocol to llm.sdk.transport/Transport, scoped to embedding endpoints. The first non-chat modality.

We keep this protocol narrow on purpose. Embeddings don't stream, don't take tool calls, and don't carry reasoning — bolting them onto the chat Transport protocol would dilute both. New modalities (image, audio) get their own narrow protocols too.

Sibling protocol to llm.sdk.transport/Transport, scoped to embedding
endpoints. The first non-chat modality.

We keep this protocol narrow on purpose. Embeddings don't stream,
don't take tool calls, and don't carry reasoning — bolting them onto
the chat Transport protocol would dilute both. New modalities
(image, audio) get their own narrow protocols too.

raw docstring

EmbedTransport

llm.sdk.transport.image

Sibling protocol to llm.sdk.transport/Transport, scoped to image generation endpoints.

Image generation is per-request: no streaming, no tools, no reasoning. The protocol is narrow on purpose, matching the embed/moderate/rerank pattern.

Sibling protocol to llm.sdk.transport/Transport, scoped to image
generation endpoints.

Image generation is per-request: no streaming, no tools, no
reasoning. The protocol is narrow on purpose, matching the
embed/moderate/rerank pattern.

raw docstring

ImageTransport

llm.sdk.transport.moderate

Sibling protocol to llm.sdk.transport/Transport, scoped to moderation endpoints.

Moderation doesn't stream, doesn't take tools, and returns boolean flags + per-category scores. We keep the protocol narrow on purpose, matching the embed-transport pattern.

Sibling protocol to llm.sdk.transport/Transport, scoped to
moderation endpoints.

Moderation doesn't stream, doesn't take tools, and returns boolean
flags + per-category scores. We keep the protocol narrow on
purpose, matching the embed-transport pattern.

raw docstring

ModerationTransport

llm.sdk.transport.rerank

Sibling protocol to llm.sdk.transport/Transport, scoped to rerank endpoints. Rerank is a natural pair-step to embeddings — search apps need both, and the three providers we ship adapters for (Cohere, Voyage, Jina) all share a similar wire shape with minor field-naming differences.

Sibling protocol to llm.sdk.transport/Transport, scoped to rerank
endpoints. Rerank is a natural pair-step to embeddings —
search apps need both, and the three providers we ship adapters
for (Cohere, Voyage, Jina) all share a similar wire shape with
minor field-naming differences.

raw docstring

RerankTransport

llm.sdk.transport.speak

Sibling protocol to llm.sdk.transport/Transport, scoped to text-to-speech. Seventh modality alongside chat / embed / moderate / rerank / image / transcribe.

TTS responses are raw audio bytes (mp3/wav/opus/aac/flac/pcm) rather than JSON, so the driver reads :body as a byte array, not parsed JSON. The transport provides the content-type → :audio/content-type mapping.

Sibling protocol to llm.sdk.transport/Transport, scoped to
text-to-speech. Seventh modality alongside chat / embed / moderate /
rerank / image / transcribe.

TTS responses are raw audio bytes (mp3/wav/opus/aac/flac/pcm) rather
than JSON, so the driver reads :body as a byte array, not parsed
JSON. The transport provides the content-type → :audio/content-type
mapping.

raw docstring

SpeakTransport

llm.sdk.transport.transcribe

Sibling protocol to llm.sdk.transport/Transport, scoped to audio transcription (speech-to-text). Sixth modality sibling alongside chat / embed / moderate / rerank / image.

Transcription has a different wire shape from the rest: requests are multipart/form-data (binary audio + form fields), responses carry text + optional segments / words / language detection.

Sibling protocol to llm.sdk.transport/Transport, scoped to audio
transcription (speech-to-text). Sixth modality sibling alongside
chat / embed / moderate / rerank / image.

Transcription has a different wire shape from the rest: requests
are multipart/form-data (binary audio + form fields), responses
carry text + optional segments / words / language detection.

raw docstring

TranscribeTransport

llm.sdk.usage

Usage normalization across providers.

Honesty rule: cache / reasoning / citation / search counters are present in the normalized map ONLY when the provider reported them. Absent != 0. Callers (and the response-stamping layer) use absence to distinguish 'provider was silent' from 'provider explicitly said 0', which matters for :cache/status surfaced on the canonical response.

Usage normalization across providers.

Honesty rule: cache / reasoning / citation / search counters are
present in the normalized map ONLY when the provider reported them.
Absent != 0. Callers (and the response-stamping layer) use absence
to distinguish 'provider was silent' from 'provider explicitly said 0',
which matters for :cache/status surfaced on the canonical response.
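
A sketch of the honesty rule for one counter. The input keys follow OpenAI's chat-completions usage object; the output keys and `normalize-usage` itself are hypothetical stand-ins for the SDK's canonical names.

```clojure
(defn normalize-usage
  "Copies the cache counter only when the provider reported it, so an
  explicit 0 survives and a missing value stays missing."
  [raw]
  (let [cached (get-in raw [:prompt_tokens_details :cached_tokens])]
    (cond-> {:input-tokens  (:prompt_tokens raw)
             :output-tokens (:completion_tokens raw)}
      (some? cached) (assoc :cache-read-tokens cached))))
```

Using `some?` rather than a truthiness check or an `(or cached 0)` default is the point: callers can then ask `(contains? usage :cache-read-tokens)` to tell a silent provider from one that reported zero.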

raw docstring
