
com.blockether.svar.internal.tokens

Token counting utilities for LLM API interactions.

Based on JTokkit (https://github.com/knuddelsgmbh/jtokkit) - a Java implementation of OpenAI's TikToken tokenizer.

Provides:

  • count-tokens - Count tokens for a string using a specific model's encoding
  • count-messages - Count tokens for a chat completion message array
  • estimate-cost - Estimate cost in USD based on model pricing
  • count-and-estimate - Count tokens and estimate cost in one call
  • context-limit - Get max context window for a model
  • max-input-tokens - Get max input tokens (context minus output reserve)
  • truncate-text - Token-aware text truncation
  • truncate-messages - Smart message truncation with priority
  • check-context-limit - Pre-flight check before API calls
  • format-cost - Format USD cost for display
  • get-model-pricing - Look up per-model pricing info

Note: Token counts are approximate. Chat completion API payloads have ~25 token error margin due to internal OpenAI formatting that isn't publicly documented.


check-context-limit (clj)

(check-context-limit model messages)
(check-context-limit model
                     messages
                     {:keys [output-reserve throw? context-limits]
                      :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})

Checks if messages fit within a model's context limit.

Use this BEFORE making API calls to get clear error messages instead of cryptic API errors.

Params:
model - String. Model name.
messages - Vector. Chat messages.
opts - Map, optional:

  • :output-reserve - Integer. Tokens reserved for output (default: 0). When 0, the full context window is available for input — the API handles output allocation naturally.
  • :throw? - Boolean. Throw an exception if over the limit (default: false).
  • :context-limits - Map. Per-model context window overrides.

Returns:
Map with:

  • :ok? - Boolean. True if the messages fit.
  • :input-tokens - Integer. Counted input tokens.
  • :max-input-tokens - Integer. Maximum allowed.
  • :context-limit - Integer. Model's total context.
  • :output-reserve - Integer. Effective output reserve used.
  • :overflow - Integer. How many tokens over the limit (0 if ok).
  • :error - String or nil. Error message if not ok.

Example:
(check-context-limit "gpt-4o" messages)
;; => {:ok? true :input-tokens 5000 :max-input-tokens 128000 ...}

(check-context-limit "gpt-4o" messages {:output-reserve 4096})
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}

(check-context-limit "gpt-4" huge-messages {:throw? true})
;; => throws ExceptionInfo with detailed error
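The relationship between the returned keys can be sketched with plain arithmetic. This is not the library's implementation — `check-fit` is a hypothetical helper that assumes `input-tokens` has already been counted (e.g. via count-messages):

```clojure
;; Minimal sketch of the pre-flight check arithmetic. The key names
;; mirror the map described in the docstring above.
(defn check-fit
  [context-limit output-reserve input-tokens]
  (let [max-input (- context-limit output-reserve)
        overflow  (max 0 (- input-tokens max-input))]
    {:ok? (zero? overflow)
     :input-tokens input-tokens
     :max-input-tokens max-input
     :context-limit context-limit
     :output-reserve output-reserve
     :overflow overflow
     :error (when (pos? overflow)
              (str "Input exceeds limit by " overflow " tokens"))}))

(check-fit 128000 4096 5000)
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}
```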

context-limit (clj)

(context-limit model)
(context-limit model context-limits)

Returns the maximum context window size for a model.

Params:
model - String. Model name.
context-limits - Map, optional. Override map (merged with defaults from config).

Returns:
Integer. Maximum context tokens.

Example:
(context-limit "gpt-4o")
;; => 128000

count-and-estimate (clj)

(count-and-estimate model messages output-text)
(count-and-estimate model messages output-text {:keys [pricing input-tokens]})

Counts tokens and estimates cost in one call.

Params:
model - String. Model name.
messages - Vector. Input messages for the prompt.
output-text - String. The response text.
opts - Map, optional:

  • :pricing - Map. Per-model pricing overrides.
  • :input-tokens - Integer, optional. Pre-counted input tokens. When provided, skips re-tokenizing messages (avoids duplicate work when check-context-limit already counted them).

Returns:
Map with:

  • :input-tokens - Integer. Number of input tokens.
  • :output-tokens - Integer. Number of output tokens.
  • :total-tokens - Integer. Total tokens used.
  • :cost - Map with :input-cost, :output-cost, :total-cost in USD.

Example:
(count-and-estimate "gpt-4o"
                    [{:role "user" :content "Hello!"}]
                    "Hello! How can I help you today?")
;; => {:input-tokens 8
;;     :output-tokens 9
;;     :total-tokens 17
;;     :cost {:input-cost 0.00002 :output-cost 0.00009 :total-cost 0.00011 ...}}

count-messages (clj)

(count-messages model messages)

Counts tokens for a chat completion message array.

Accounts for:

  • Message content tokens (text blocks tokenized via JTokkit)
  • Role field overhead
  • Per-message formatting overhead
  • Reply priming (every reply is primed with <|start|>assistant<|message|>)
  • Multimodal content (images sized via OpenAI's vision tile formula when dimensions are available, conservative fallback otherwise)

Params:
model - String. Model name.
messages - Vector of maps with :role and :content keys. Content can be a string (text) or a vector (multimodal).

Returns:
Integer. Total token count for the messages.

Example:
(count-messages "gpt-4o"
                [{:role "system" :content "You are helpful."}
                 {:role "user" :content "Hello!"}])
;; => 15
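The overhead accounting above can be sketched using the widely published approximation from OpenAI's token-counting guide (roughly 3 tokens of formatting per message, plus 3 for reply priming; exact per-model constants vary). `count-str` is a hypothetical stand-in for the real JTokkit-based tokenizer — here it just counts whitespace-separated words:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stub tokenizer: word count instead of real BPE tokens.
(defn count-str [s]
  (count (str/split s #"\s+")))

;; Sketch of the per-message overhead structure: content tokens + role
;; tokens + per-message formatting, plus a constant for reply priming.
(defn approx-count-messages [messages]
  (+ 3 ; every reply is primed with <|start|>assistant<|message|>
     (reduce (fn [acc {:keys [role content]}]
               (+ acc 3 (count-str role) (count-str content)))
             0
             messages)))
```

With a real tokenizer substituted for `count-str`, this mirrors the structure described above; totals still carry the ~25-token margin noted in the namespace docstring.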

count-tokens (clj)

(count-tokens model text)

Counts tokens for a given text string using the specified model's encoding.

Params:
model - String. Model name (e.g., "gpt-4o", "gpt-4", "gpt-3.5-turbo").
text - String. The text to count tokens for.

Returns:
Integer. Number of tokens.

Example:
(count-tokens "gpt-4o" "Hello, world!")
;; => 4

DEFAULT_CONTEXT_LIMITS (clj)

Default maximum context window sizes for LLM models (in tokens).

These are the TOTAL context limits (input + output). Override per-model via :context-limits in make-config.

Sources:

  • OpenAI: https://platform.openai.com/docs/models
  • Anthropic: https://docs.anthropic.com/en/docs/about-claude/models
  • Google: https://cloud.google.com/vertex-ai/generative-ai/docs/models
  • Zhipu: https://docs.z.ai/guides/overview/pricing
  • DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
  • Mistral: https://docs.mistral.ai/models

Last updated: February 2026

DEFAULT_MODEL_PRICING (clj)

Default pricing per 1M tokens in USD as of February 2026.
Format: {:input price-per-1M :output price-per-1M}
Override per-model via :pricing in make-config.

Sources:

  • OpenAI: https://developers.openai.com/api/docs/pricing/
  • Anthropic: https://docs.claude.com/en/about-claude/pricing
  • Google: https://cloud.google.com/vertex-ai/generative-ai/pricing
  • Zhipu: https://docs.z.ai/guides/overview/pricing
  • DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
  • Mistral: https://docs.mistral.ai/models

Note: These are approximate and may change. Update periodically.

DEFAULT_OUTPUT_RESERVE (clj)

Default number of tokens to reserve for model output.
0 means no reservation — let the API handle overflow naturally.
Override per-call via :output-reserve in check-context-limit or ask! opts.

DEFAULT_TRIM_RATIO (clj)

Default ratio of the context window to use for input (leaving room for output).
0.75 means use 75% for input, reserving 25% for output.

estimate-cost (clj)

(estimate-cost model input-tokens output-tokens)
(estimate-cost model input-tokens output-tokens pricing-map)

Estimates the cost in USD for a given token count.

Params:
model - String. Model name.
input-tokens - Integer. Number of input (prompt) tokens.
output-tokens - Integer. Number of output (completion) tokens.
pricing-map - Map, optional. Per-model pricing overrides.

Returns:
Map with:

  • :input-cost - Float. Cost for input tokens in USD.
  • :output-cost - Float. Cost for output tokens in USD.
  • :total-cost - Float. Total cost in USD.
  • :model - String. The model used for pricing.
  • :pricing - Map. The pricing rates used.

Example:
(estimate-cost "gpt-4o" 1000 500)
;; => {:input-cost 0.0025
;;     :output-cost 0.005
;;     :total-cost 0.0075
;;     :model "gpt-4o"
;;     :pricing {:input 2.50 :output 10.00}}
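The underlying arithmetic is simple: rates are quoted per 1M tokens, so each cost is tokens / 1,000,000 × rate. A minimal sketch (not the library's implementation; the pricing map here is an assumption for illustration):

```clojure
;; Sketch of per-1M-token cost arithmetic. `rates` is a map like
;; {:input 2.50 :output 10.00} — USD per 1M tokens.
(defn estimate [{:keys [input output] :as rates} input-tokens output-tokens]
  (let [in-cost  (* (/ input-tokens 1e6) input)
        out-cost (* (/ output-tokens 1e6) output)]
    {:input-cost  in-cost
     :output-cost out-cost
     :total-cost  (+ in-cost out-cost)}))

(estimate {:input 2.50 :output 10.00} 1000 500)
```

With those rates, 1000 input tokens cost about $0.0025 and 500 output tokens about $0.005, totaling roughly $0.0075 — consistent with the docstring example above.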

format-cost (clj)

(format-cost cost)

Formats a cost value as a human-readable USD string.

Params:
cost - Number. Cost in USD.

Returns:
String. Formatted cost (e.g., "$0.0025" or "<$0.0001").

Example:
(format-cost 0.0025)
;; => "$0.0025"

max-input-tokens (clj)

(max-input-tokens model)
(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})

Calculates maximum input tokens for a model, reserving space for output.

Params:
model - String. Model name.
opts - Map, optional:

  • :output-reserve - Integer. Tokens to reserve for output (default: DEFAULT_OUTPUT_RESERVE, i.e. 0).
  • :trim-ratio - Float. Alternative: use a ratio of the context window (default: nil). When set, overrides :output-reserve.
  • :context-limits - Map. Per-model context window overrides.

Returns:
Integer. Maximum input tokens.

Example:
(max-input-tokens "gpt-4o")
;; => 128000 (128000 - 0, default reserve is 0)

(max-input-tokens "gpt-4o" {:output-reserve 4096})
;; => 123904 (128000 - 4096)

(max-input-tokens "gpt-4o" {:trim-ratio 0.75})
;; => 96000 (128000 * 0.75)
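The two reduction modes can be sketched as follows. `max-input` is a hypothetical helper, not the library function — it takes the context limit directly rather than looking it up by model name:

```clojure
;; Sketch of the two modes: a fixed :output-reserve subtracts tokens,
;; while :trim-ratio keeps a fraction of the context window (and, when
;; present, takes precedence).
(defn max-input [context-limit {:keys [output-reserve trim-ratio]
                                :or {output-reserve 0}}]
  (if trim-ratio
    (long (* context-limit trim-ratio))
    (- context-limit output-reserve)))

(max-input 128000 {})                     ;; => 128000
(max-input 128000 {:output-reserve 4096}) ;; => 123904
(max-input 128000 {:trim-ratio 0.75})     ;; => 96000
```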

truncate-messages (clj)

(truncate-messages model messages max-tokens)

Truncates a message array to fit within a token limit.

Strategy (priority-based):

  1. ALWAYS preserve the system message (index 0) if present
  2. ALWAYS preserve the most recent user message
  3. Trim from the MIDDLE (oldest conversation turns)
  4. This respects LLM primacy/recency bias

Params:
model - String. Model name.
messages - Vector. Chat messages [{:role :content}].
max-tokens - Integer. Maximum total tokens allowed.

Returns:
Vector. Truncated messages that fit within the limit.

Example:
(truncate-messages "gpt-4o" messages 4000)
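The middle-trimming strategy can be sketched as below. This is a simplified illustration, not the library's implementation: it pins the final message rather than specifically the most recent user message, assumes at least two messages, and `msg-tokens` is a hypothetical word-count stand-in for the real per-message token counter:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stub: word count instead of real token count.
(defn msg-tokens [{:keys [content]}]
  (count (str/split content #"\s+")))

;; Pin the system message (if any) and the newest message, then drop the
;; oldest middle turns until the budget fits.
(defn truncate-middle [messages max-tokens]
  (let [system? (= "system" (:role (first messages)))
        head    (if system? [(first messages)] [])
        tail    [(last messages)]]
    (loop [middle (vec (drop (count head) (butlast messages)))]
      (let [kept (into (into head middle) tail)]
        (if (or (empty? middle)
                (<= (reduce + (map msg-tokens kept)) max-tokens))
          kept
          (recur (subvec middle 1))))))) ; drop the oldest middle turn
```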

truncate-text (clj)

(truncate-text model text max-tokens)
(truncate-text model
               text
               max-tokens
               {:keys [truncation-marker from] :or {from :end}})

Truncates text to fit within a token limit.

Uses proper tokenization to ensure accurate truncation. Does NOT cut in the middle of multi-token words.

Params:
model - String. Model name for tokenization.
text - String. Text to truncate.
max-tokens - Integer. Maximum tokens allowed.
opts - Map, optional:

  • :truncation-marker - String. Appended when truncated (default: nil).
  • :from - Keyword. Where to truncate: :end (default), :start, or :middle.

Returns:
String. Truncated text, or the original if within the limit.

Example:
(truncate-text "gpt-4o" "Hello world, this is a test" 5)
;; => "Hello world,"

(truncate-text "gpt-4o" long-text 1000 {:truncation-marker "..."})
;; => "First part of text..."
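The encode/keep/decode shape of token-aware truncation can be sketched with a hypothetical whitespace tokenizer standing in for the model encoding (the real function uses JTokkit; this sketch also omits :middle for brevity):

```clojure
(require '[clojure.string :as str])

;; Encode (stub: split on whitespace), keep max-tokens from the side
;; opposite the cut, decode (rejoin), then attach the marker at the cut.
(defn truncate* [text max-tokens {:keys [truncation-marker from]
                                  :or {from :end}}]
  (let [toks (str/split text #"\s+")]
    (if (<= (count toks) max-tokens)
      text
      (case from
        :end   (str (str/join " " (take max-tokens toks)) truncation-marker)
        :start (str truncation-marker
                    (str/join " " (take-last max-tokens toks)))))))

(truncate* "a b c d" 2 {:truncation-marker "..."})          ;; => "a b..."
(truncate* "a b c d" 2 {:truncation-marker "..." :from :start}) ;; => "...c d"
```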
