Token counting utilities for LLM API interactions.

Based on JTokkit (https://github.com/knuddelsgmbh/jtokkit) - a Java implementation of OpenAI's TikToken tokenizer.

Provides:
- `count-tokens` - Count tokens for a string using a specific model's encoding
- `count-messages` - Count tokens for a chat completion message array
- `estimate-cost` - Estimate cost in USD based on model pricing
- `count-and-estimate` - Count tokens and estimate cost in one call
- `context-limit` - Get max context window for a model
- `max-input-tokens` - Get max input tokens (context minus output reserve)
- `truncate-text` - Token-aware text truncation
- `truncate-messages` - Smart message truncation with priority
- `check-context-limit` - Pre-flight check before API calls
- `format-cost` - Format USD cost for display
- `get-model-pricing` - Look up per-model pricing info

Note: Token counts are approximate. Chat completion API payloads have a ~25 token error margin due to internal OpenAI formatting that isn't publicly documented.
(check-context-limit model messages)
(check-context-limit model messages {:keys [output-reserve throw? context-limits] :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})
Checks if messages fit within model context limit.
Use this BEFORE making API calls to get clear error messages
instead of cryptic API errors.
Params:
`model` - String. Model name.
`messages` - Vector. Chat messages.
`opts` - Map, optional:
- :output-reserve - Integer. Tokens reserved for output (default: 0).
When 0, the full context window is available for input — the API
handles output allocation naturally.
- :throw? - Boolean. Throw exception if over limit (default: false).
- :context-limits - Map. Per-model context window overrides.
Returns:
Map with:
- :ok? - Boolean. True if messages fit.
- :input-tokens - Integer. Counted input tokens.
- :max-input-tokens - Integer. Maximum allowed.
- :context-limit - Integer. Model's total context.
- :output-reserve - Integer. Effective output reserve used.
- :overflow - Integer. How many tokens over limit (0 if ok).
- :error - String or nil. Error message if not ok.
Example:
(check-context-limit "gpt-4o" messages)
;; => {:ok? true :input-tokens 5000 :max-input-tokens 128000 ...}
(check-context-limit "gpt-4o" messages {:output-reserve 4096})
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}
(check-context-limit "gpt-4" huge-messages {:throw? true})
;; => throws ExceptionInfo with detailed error

(context-limit model)
(context-limit model context-limits)

Returns the maximum context window size for a model.
Params:
`model` - String. Model name.
`context-limits` - Map, optional. Override map (merged defaults from config).
Returns:
Integer. Maximum context tokens.
Example:
(context-limit "gpt-4o")
;; => 128000
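The pre-flight arithmetic behind `check-context-limit` can be sketched in a few lines. This is a self-contained illustration, not the library's implementation: the `*`-suffixed name is hypothetical, token counting is assumed to have happened already, and only two context limits are shown.

```clojure
;; Hypothetical sketch of the overflow check described above.
(def stub-context-limits {"gpt-4o" 128000 "gpt-4" 8192})

(defn check-context-limit*
  [model input-tokens {:keys [output-reserve] :or {output-reserve 0}}]
  (let [limit     (get stub-context-limits model)
        max-input (- limit output-reserve)          ; room left for input
        overflow  (max 0 (- input-tokens max-input))]
    {:ok?              (zero? overflow)
     :input-tokens     input-tokens
     :max-input-tokens max-input
     :context-limit    limit
     :output-reserve   output-reserve
     :overflow         overflow}))

(check-context-limit* "gpt-4o" 5000 {:output-reserve 4096})
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}
```

The same shape of map comes back whether the check passes or fails, so callers can branch on `:ok?` and report `:overflow` directly.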
(count-and-estimate model messages output-text)
(count-and-estimate model messages output-text {:keys [pricing input-tokens]})
Counts tokens and estimates cost in one call.
Params:
`model` - String. Model name.
`messages` - Vector. Input messages for the prompt.
`output-text` - String. The response text.
`opts` - Map, optional:
- :pricing - Map. Per-model pricing overrides.
- :input-tokens - Integer, optional. Pre-counted input tokens.
When provided, skips re-tokenizing messages (avoids duplicate work
when check-context-limit already counted them).
Returns:
Map with:
- :input-tokens - Integer. Number of input tokens.
- :output-tokens - Integer. Number of output tokens.
- :total-tokens - Integer. Total tokens used.
- :cost - Map with :input-cost, :output-cost, :total-cost in USD.
Example:
(count-and-estimate "gpt-4o"
[{:role "user" :content "Hello!"}]
"Hello! How can I help you today?")
;; => {:input-tokens 8
;; :output-tokens 9
;; :total-tokens 17
;; :cost {:input-cost 0.00002 :output-cost 0.00009 :total-cost 0.00011 ...}}

(count-messages model messages)
Counts tokens for a chat completion message array.
Accounts for:
- Message content tokens (text blocks tokenized via JTokkit)
- Role field overhead
- Per-message formatting overhead
- Reply priming (every reply is primed with <|start|>assistant<|message|>)
- Multimodal content (images sized via OpenAI's vision tile formula
when dimensions are available, conservative fallback otherwise)
Params:
`model` - String. Model name.
`messages` - Vector of maps with :role and :content keys.
Content can be string (text) or vector (multimodal).
Returns:
Integer. Total token count for the messages.
Example:
(count-messages "gpt-4o"
[{:role "system" :content "You are helpful."}
{:role "user" :content "Hello!"}])
;; => 15

(count-tokens model text)

Counts tokens for a given text string using the specified model's encoding.
Params:
`model` - String. Model name (e.g., 'gpt-4o', 'gpt-4', 'gpt-3.5-turbo').
`text` - String. The text to count tokens for.
Returns:
Integer. Number of tokens.
Example:
(count-tokens "gpt-4o" "Hello, world!")
;; => 4
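The per-message overhead that `count-messages` accounts for can be sketched with a stub tokenizer. A whitespace split stands in for real BPE tokenization, and the overhead constants (3 tokens per message, 3 for reply priming) follow OpenAI's published cookbook guidance; the helper name is hypothetical.

```clojure
;; Word count as a crude stand-in for BPE token counting.
(defn stub-count-tokens [s]
  (count (re-seq #"\S+" s)))

(def per-message-overhead 3) ; role + formatting, per OpenAI cookbook guidance
(def reply-priming 3)        ; <|start|>assistant<|message|>

(defn count-messages* [messages]
  (+ reply-priming
     (reduce (fn [acc {:keys [content]}]
               (+ acc per-message-overhead (stub-count-tokens content)))
             0
             messages)))

(count-messages* [{:role "system" :content "You are helpful."}
                  {:role "user" :content "Hello!"}])
;; => 13 with the word-count stub (the real tokenizer reports 15 for this input)
```

The gap between 13 and 15 is exactly why the stub is only illustrative: real BPE splits "helpful." and "Hello!" differently than whitespace does.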
Default maximum context window sizes for LLM models (in tokens).
These are the TOTAL context limits (input + output). Override per-model via :context-limits in make-config.
Sources:
- OpenAI: https://platform.openai.com/docs/models
- Anthropic: https://docs.anthropic.com/en/docs/about-claude/models
- Google: https://cloud.google.com/vertex-ai/generative-ai/docs/models
- Zhipu: https://docs.z.ai/guides/overview/pricing
- DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
- Mistral: https://docs.mistral.ai/models
Last updated: February 2026
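The override mechanic is a plain map merge: entries supplied via `:context-limits` win over the defaults. A minimal sketch (the values and helper name here are illustrative, not the library's actual default table):

```clojure
;; Illustrative defaults - the real table covers many more models.
(def default-context-limits {"gpt-4o" 128000 "gpt-4" 8192})

(defn context-limit* [model overrides]
  ;; overrides are merged over defaults, so a user-supplied entry wins
  (get (merge default-context-limits overrides) model))

(context-limit* "gpt-4o" {})               ;; => 128000
(context-limit* "gpt-4o" {"gpt-4o" 64000}) ;; => 64000 (override wins)
```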
Default pricing per 1M tokens in USD as of February 2026.
Format: {:input price-per-1M :output price-per-1M}
Override per-model via :pricing in make-config.
Sources:
- OpenAI: https://developers.openai.com/api/docs/pricing/
- Anthropic: https://docs.claude.com/en/about-claude/pricing
- Google: https://cloud.google.com/vertex-ai/generative-ai/pricing
- Zhipu: https://docs.z.ai/guides/overview/pricing
- DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
- Mistral: https://docs.mistral.ai/models
Note: These are approximate and may change. Update periodically.

Default number of tokens to reserve for model output. 0 means no reservation — let the API handle overflow naturally. Override per-call via :output-reserve in check-context-limit or ask! opts.
Default ratio of context to use (leaving room for output). 0.75 means use 75% for input, reserve 25% for output.
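The two reservation knobs reduce the input budget in different ways; the arithmetic for gpt-4o's documented 128000-token window (illustrative only):

```clojure
;; Explicit reserve: subtract a fixed token count from the context window.
(- 128000 4096)        ;; => 123904 input tokens with :output-reserve 4096

;; Ratio: keep a fraction of the window for input.
(long (* 128000 0.75)) ;; => 96000 input tokens with :trim-ratio 0.75
```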
(estimate-cost model input-tokens output-tokens)
(estimate-cost model input-tokens output-tokens pricing-map)
Estimates the cost in USD for a given token count.
Params:
`model` - String. Model name.
`input-tokens` - Integer. Number of input (prompt) tokens.
`output-tokens` - Integer. Number of output (completion) tokens.
Returns:
Map with:
- :input-cost - Float. Cost for input tokens in USD.
- :output-cost - Float. Cost for output tokens in USD.
- :total-cost - Float. Total cost in USD.
- :model - String. The model used for pricing.
- :pricing - Map. The pricing rates used.
Example:
(estimate-cost "gpt-4o" 1000 500)
;; => {:input-cost 0.0025
;; :output-cost 0.005
;; :total-cost 0.0075
;; :model "gpt-4o"
;; :pricing {:input 2.50 :output 10.00}}

(format-cost cost)

Formats a cost value as a human-readable USD string.
Params:
`cost` - Number. Cost in USD.
Returns:
String. Formatted cost (e.g., "$0.0025" or "<$0.0001").
Example:
(format-cost 0.0025)
;; => "$0.0025"
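The per-1M-token cost arithmetic and the display formatting above can be sketched together. The `*`-suffixed names are hypothetical stand-ins, and the pricing values match the gpt-4o example shown earlier:

```clojure
;; Pricing is quoted per 1M tokens, so cost = tokens / 1e6 * rate.
(defn estimate-cost* [{:keys [input output]} input-tokens output-tokens]
  (let [in-cost  (* (/ input-tokens 1e6) input)
        out-cost (* (/ output-tokens 1e6) output)]
    {:input-cost  in-cost
     :output-cost out-cost
     :total-cost  (+ in-cost out-cost)}))

;; Tiny amounts collapse to a floor marker instead of "$0.0000".
(defn format-cost* [cost]
  (if (< cost 0.0001)
    "<$0.0001"
    (format "$%.4f" cost)))

(let [{:keys [total-cost]} (estimate-cost* {:input 2.50 :output 10.00} 1000 500)]
  (format-cost* total-cost))
;; => "$0.0075"
```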
(max-input-tokens model)
(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})
Calculates maximum input tokens for a model, reserving space for output.
Params:
`model` - String. Model name.
`opts` - Map, optional:
- :output-reserve - Integer. Tokens to reserve for output
(default: DEFAULT_OUTPUT_RESERVE, i.e. 0 - no reservation).
- :trim-ratio - Float. Alternative: use ratio of context (default: nil).
When set, overrides :output-reserve.
Returns:
Integer. Maximum input tokens.
Example:
(max-input-tokens "gpt-4o")
;; => 128000 (128000 - 0, default reserve is 0)
(max-input-tokens "gpt-4o" {:output-reserve 4096})
;; => 123904 (128000 - 4096)
(max-input-tokens "gpt-4o" {:trim-ratio 0.75})
;; => 96000 (128000 * 0.75)

(truncate-messages model messages max-tokens)
Truncates a message array to fit within a token limit.
Strategy (priority-based):
1. ALWAYS preserve system message (index 0) if present
2. ALWAYS preserve the most recent user message
3. Trim from the MIDDLE (oldest conversation turns)
4. This respects LLM primacy/recency bias
Params:
`model` - String. Model name.
`messages` - Vector. Chat messages [{:role :content}].
`max-tokens` - Integer. Maximum total tokens allowed.
Returns:
Vector. Truncated messages that fit within limit.
Example:
(truncate-messages "gpt-4o" messages 4000)

(truncate-text model text max-tokens)
(truncate-text model text max-tokens {:keys [truncation-marker from] :or {from :end}})
Truncates text to fit within a token limit.
Uses proper tokenization to ensure accurate truncation.
Does NOT cut in the middle of multi-token words.
Params:
`model` - String. Model name for tokenization.
`text` - String. Text to truncate.
`max-tokens` - Integer. Maximum tokens allowed.
`opts` - Map, optional:
- :truncation-marker - String. Appended when truncated (default: nil).
- :from - Keyword. Where to truncate: :end (default), :start, or :middle.
Returns:
String. Truncated text, or original if within limit.
Example:
(truncate-text "gpt-4o" "Hello world, this is a test" 5)
;; => "Hello world,"
(truncate-text "gpt-4o" long-text 1000 {:truncation-marker "..."})
;; => "First part of text..."
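The priority-based middle-trim strategy described under `truncate-messages` can be sketched with a stub token counter. A whitespace word count stands in for real tokenization, the `*`-suffixed names are hypothetical, and this simplified version keeps newest turns that fit rather than enforcing every guarantee of the real function:

```clojure
;; Word count as a crude stand-in for token counting.
(defn stub-count [msgs]
  (reduce + 0 (map #(count (re-seq #"\S+" (:content %))) msgs)))

(defn truncate-messages* [messages max-tokens]
  (let [system (when (= "system" (:role (first messages))) (first messages))
        turns  (if system (vec (rest messages)) (vec messages))
        ;; 1. the system message is always kept, so it spends budget first
        budget (- max-tokens (if system (stub-count [system]) 0))
        ;; 2. keep the newest turns that still fit, so the oldest
        ;;    middle turns are the ones dropped (primacy/recency bias)
        kept   (reduce (fn [acc m]
                         (if (<= (stub-count (conj acc m)) budget)
                           (conj acc m)
                           (reduced acc)))
                       []
                       (rseq turns))]
    (vec (concat (when system [system]) (reverse kept)))))

(mapv :content
      (truncate-messages*
       [{:role "system" :content "Be brief."}
        {:role "user" :content "first question here"}
        {:role "assistant" :content "an old answer"}
        {:role "user" :content "latest question"}]
       7))
;; => ["Be brief." "an old answer" "latest question"]
```

With a 7-word budget, the system message (2 words) and the two newest turns (2 + 3 words) fit, so the oldest user turn is the one trimmed from the middle.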