Token counting utilities for LLM API interactions.

Based on JTokkit (https://github.com/knuddelsgmbh/jtokkit) - a Java implementation of OpenAI's TikToken tokenizer.

Provides:
- `count-tokens` - Count tokens for a string using a specific model's encoding
- `count-messages` - Count tokens for a chat completion message array
- `estimate-cost` - Estimate cost in USD based on model pricing
- `count-and-estimate` - Count tokens and estimate cost in one call
- `context-limit` - Get max context window for a model
- `max-input-tokens` - Get max input tokens (context minus output reserve)
- `truncate-text` - Token-aware text truncation
- `truncate-messages` - Smart message truncation with priority
- `check-context-limit` - Pre-flight check before API calls
- `format-cost` - Format USD cost for display
- `get-model-pricing` - Look up per-model pricing info

Note: Token counts are approximate. Chat completion API payloads have a ~25 token error margin due to internal OpenAI formatting that isn't publicly documented.
(check-context-limit model messages)
(check-context-limit model messages {:keys [output-reserve throw? context-limits] :or {output-reserve DEFAULT_OUTPUT_RESERVE throw? false}})
Checks if messages fit within model context limit.
Use this BEFORE making API calls to get clear error messages
instead of cryptic API errors.
Params:
`model` - String. Model name.
`messages` - Vector. Chat messages.
`opts` - Map, optional:
- :output-reserve - Integer. Tokens reserved for output (default: 0).
When 0, the full context window is available for input — the API
handles output allocation naturally.
- :throw? - Boolean. Throw exception if over limit (default: false).
- :context-limits - Map. Per-model context window overrides.
Returns:
Map with:
- :ok? - Boolean. True if messages fit.
- :input-tokens - Integer. Counted input tokens.
- :max-input-tokens - Integer. Maximum allowed.
- :context-limit - Integer. Model's total context.
- :output-reserve - Integer. Effective output reserve used.
- :overflow - Integer. How many tokens over limit (0 if ok).
- :error - String or nil. Error message if not ok.
Example:
(check-context-limit "gpt-4o" messages)
;; => {:ok? true :input-tokens 5000 :max-input-tokens 128000 ...}
(check-context-limit "gpt-4o" messages {:output-reserve 4096})
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}
(check-context-limit "gpt-4" huge-messages {:throw? true})
;; => throws ExceptionInfo with detailed error

(context-limit model)
(context-limit model context-limits)

Returns the maximum context window size for a model.
Params:
`model` - String. Model name.
`context-limits` - Map, optional. Override map (merged defaults from config).
Returns:
Integer. Maximum context tokens.
Example:
(context-limit "gpt-4o")
;; => 128000
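The pre-flight arithmetic behind `check-context-limit` can be sketched in a few lines. This is a self-contained illustration, not the library's implementation: the `*`-suffixed name is hypothetical, token counting is assumed to have happened already, and only two context limits are shown.

```clojure
;; Hypothetical sketch of the overflow check described above.
(def stub-context-limits {"gpt-4o" 128000 "gpt-4" 8192})

(defn check-context-limit*
  [model input-tokens {:keys [output-reserve] :or {output-reserve 0}}]
  (let [limit     (get stub-context-limits model)
        max-input (- limit output-reserve)          ; room left for input
        overflow  (max 0 (- input-tokens max-input))]
    {:ok?              (zero? overflow)
     :input-tokens     input-tokens
     :max-input-tokens max-input
     :context-limit    limit
     :output-reserve   output-reserve
     :overflow         overflow}))

(check-context-limit* "gpt-4o" 5000 {:output-reserve 4096})
;; => {:ok? true :input-tokens 5000 :max-input-tokens 123904 ...}
```

The same shape of map comes back whether the check passes or fails, so callers can branch on `:ok?` and report `:overflow` directly.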
(count-and-estimate model messages output-text)
(count-and-estimate model messages output-text {:keys [pricing input-tokens]})
Counts tokens and estimates cost in one call.
Params:
`model` - String. Model name.
`messages` - Vector. Input messages for the prompt.
`output-text` - String. The response text.
`opts` - Map, optional:
- :pricing - Map. Per-model pricing overrides.
- :input-tokens - Integer, optional. Pre-counted input tokens.
When provided, skips re-tokenizing messages (avoids duplicate work
when check-context-limit already counted them).
Returns:
Map with:
- :input-tokens - Integer. Number of input tokens.
- :output-tokens - Integer. Number of output tokens.
- :total-tokens - Integer. Total tokens used.
- :cost - Map with :input-cost, :output-cost, :total-cost in USD.
Example:
(count-and-estimate "gpt-4o"
[{:role "user" :content "Hello!"}]
"Hello! How can I help you today?")
;; => {:input-tokens 8
;; :output-tokens 9
;; :total-tokens 17
;; :cost {:input-cost 0.00002 :output-cost 0.00009 :total-cost 0.00011 ...}}

(count-messages model messages)
Counts tokens for a chat completion message array.
Accounts for:
- Message content tokens (text blocks tokenized via JTokkit)
- Role field overhead
- Per-message formatting overhead
- Reply priming (every reply is primed with <|start|>assistant<|message|>)
- Multimodal content (images sized via OpenAI's vision tile formula
when dimensions are available, conservative fallback otherwise)
Params:
`model` - String. Model name.
`messages` - Vector of maps with :role and :content keys.
Content can be string (text) or vector (multimodal).
Returns:
Integer. Total token count for the messages.
Example:
(count-messages "gpt-4o"
[{:role "system" :content "You are helpful."}
{:role "user" :content "Hello!"}])
;; => 15

(count-tokens model text)

Counts tokens for a given text string using the specified model's encoding.
Params:
`model` - String. Model name (e.g., 'gpt-4o', 'gpt-4', 'gpt-3.5-turbo').
`text` - String. The text to count tokens for.
Returns:
Integer. Number of tokens.
Example:
(count-tokens "gpt-4o" "Hello, world!")
;; => 4
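The per-message overhead that `count-messages` accounts for can be sketched with a stub tokenizer. A whitespace split stands in for real BPE tokenization, and the overhead constants (3 tokens per message, 3 for reply priming) follow OpenAI's published cookbook guidance; the helper name is hypothetical.

```clojure
;; Word count as a crude stand-in for BPE token counting.
(defn stub-count-tokens [s]
  (count (re-seq #"\S+" s)))

(def per-message-overhead 3) ; role + formatting, per OpenAI cookbook guidance
(def reply-priming 3)        ; <|start|>assistant<|message|>

(defn count-messages* [messages]
  (+ reply-priming
     (reduce (fn [acc {:keys [content]}]
               (+ acc per-message-overhead (stub-count-tokens content)))
             0
             messages)))

(count-messages* [{:role "system" :content "You are helpful."}
                  {:role "user" :content "Hello!"}])
;; => 13 with the word-count stub (the real tokenizer reports 15 for this input)
```

The gap between 13 and 15 is exactly why the stub is only illustrative: real BPE splits "helpful." and "Hello!" differently than whitespace does.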
Default maximum context window sizes for LLM models (in tokens).
These are the TOTAL context limits (input + output). Override per-model via :context-limits in make-config.
Sources:
- OpenAI: https://platform.openai.com/docs/models
- Anthropic: https://docs.anthropic.com/en/docs/about-claude/models
- Google: https://cloud.google.com/vertex-ai/generative-ai/docs/models
- Zhipu: https://docs.z.ai/guides/overview/pricing
- DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
- Mistral: https://docs.mistral.ai/models
Last updated: February 2026
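The override mechanic is a plain map merge: entries supplied via `:context-limits` win over the defaults. A minimal sketch (the values and helper name here are illustrative, not the library's actual default table):

```clojure
;; Illustrative defaults - the real table covers many more models.
(def default-context-limits {"gpt-4o" 128000 "gpt-4" 8192})

(defn context-limit* [model overrides]
  ;; overrides are merged over defaults, so a user-supplied entry wins
  (get (merge default-context-limits overrides) model))

(context-limit* "gpt-4o" {})               ;; => 128000
(context-limit* "gpt-4o" {"gpt-4o" 64000}) ;; => 64000 (override wins)
```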
Default pricing per 1M tokens in USD as of February 2026.
Format: {:input price-per-1M :output price-per-1M}
Override per-model via :pricing in make-config.
Sources:
- OpenAI: https://developers.openai.com/api/docs/pricing/
- Anthropic: https://docs.claude.com/en/about-claude/pricing
- Google: https://cloud.google.com/vertex-ai/generative-ai/pricing
- Zhipu: https://docs.z.ai/guides/overview/pricing
- DeepSeek: https://api-docs.deepseek.com/quick_start/pricing
- Mistral: https://docs.mistral.ai/models
Note: These are approximate and may change. Update periodically.

Default number of tokens to reserve for model output. 0 means no reservation — let the API handle overflow naturally. Override per-call via :output-reserve in check-context-limit or ask! opts.
Default ratio of context to use (leaving room for output). 0.75 means use 75% for input, reserve 25% for output.
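The two reservation knobs reduce the input budget in different ways; the arithmetic for gpt-4o's documented 128000-token window (illustrative only):

```clojure
;; Explicit reserve: subtract a fixed token count from the context window.
(- 128000 4096)        ;; => 123904 input tokens with :output-reserve 4096

;; Ratio: keep a fraction of the window for input.
(long (* 128000 0.75)) ;; => 96000 input tokens with :trim-ratio 0.75
```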
(estimate-cost model input-tokens output-tokens)
(estimate-cost model input-tokens output-tokens pricing-map)
Estimates the cost in USD for a given token count.
Params:
`model` - String. Model name.
`input-tokens` - Integer. Number of input (prompt) tokens.
`output-tokens` - Integer. Number of output (completion) tokens.
Returns:
Map with:
- :input-cost - Float. Cost for input tokens in USD.
- :output-cost - Float. Cost for output tokens in USD.
- :total-cost - Float. Total cost in USD.
- :model - String. The model used for pricing.
- :pricing - Map. The pricing rates used.
Example:
(estimate-cost "gpt-4o" 1000 500)
;; => {:input-cost 0.0025
;; :output-cost 0.005
;; :total-cost 0.0075
;; :model "gpt-4o"
;; :pricing {:input 2.50 :output 10.00}}

(format-cost cost)

Formats a cost value as a human-readable USD string.
Params:
`cost` - Number. Cost in USD.
Returns:
String. Formatted cost (e.g., "$0.0025" or "<$0.0001").
Example:
(format-cost 0.0025)
;; => "$0.0025"
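The per-1M-token cost arithmetic and the display formatting above can be sketched together. The `*`-suffixed names are hypothetical stand-ins, and the pricing values match the gpt-4o example shown earlier:

```clojure
;; Pricing is quoted per 1M tokens, so cost = tokens / 1e6 * rate.
(defn estimate-cost* [{:keys [input output]} input-tokens output-tokens]
  (let [in-cost  (* (/ input-tokens 1e6) input)
        out-cost (* (/ output-tokens 1e6) output)]
    {:input-cost  in-cost
     :output-cost out-cost
     :total-cost  (+ in-cost out-cost)}))

;; Tiny amounts collapse to a floor marker instead of "$0.0000".
(defn format-cost* [cost]
  (if (< cost 0.0001)
    "<$0.0001"
    (format "$%.4f" cost)))

(let [{:keys [total-cost]} (estimate-cost* {:input 2.50 :output 10.00} 1000 500)]
  (format-cost* total-cost))
;; => "$0.0075"
```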
(max-input-tokens model)
(max-input-tokens model {:keys [output-reserve trim-ratio context-limits]})
Calculates maximum input tokens for a model, reserving space for output.
Params:
`model` - String. Model name.
`opts` - Map, optional:
- :output-reserve - Integer. Tokens to reserve for output
(default: DEFAULT_OUTPUT_RESERVE, i.e. 0 - no reservation).
- :trim-ratio - Float. Alternative: use ratio of context (default: nil).
When set, overrides :output-reserve.
Returns:
Integer. Maximum input tokens.
Example:
(max-input-tokens "gpt-4o")
;; => 128000 (128000 - 0, default reserve is 0)
(max-input-tokens "gpt-4o" {:output-reserve 4096})
;; => 123904 (128000 - 4096)
(max-input-tokens "gpt-4o" {:trim-ratio 0.75})
;; => 96000 (128000 * 0.75)

(truncate-messages model messages max-tokens)
Truncates a message array to fit within a token limit.
Strategy (priority-based):
1. ALWAYS preserve system message (index 0) if present
2. ALWAYS preserve the most recent user message
3. Trim from the MIDDLE (oldest conversation turns)
4. This respects LLM primacy/recency bias
Params:
`model` - String. Model name.
`messages` - Vector. Chat messages [{:role :content}].
`max-tokens` - Integer. Maximum total tokens allowed.
Returns:
Vector. Truncated messages that fit within limit.
Example:
(truncate-messages "gpt-4o" messages 4000)

(truncate-text model text max-tokens)
(truncate-text model text max-tokens {:keys [truncation-marker from] :or {from :end}})
Truncates text to fit within a token limit.
Uses proper tokenization to ensure accurate truncation.
Does NOT cut in the middle of multi-token words.
Params:
`model` - String. Model name for tokenization.
`text` - String. Text to truncate.
`max-tokens` - Integer. Maximum tokens allowed.
`opts` - Map, optional:
- :truncation-marker - String. Appended when truncated (default: nil).
- :from - Keyword. Where to truncate: :end (default), :start, or :middle.
Returns:
String. Truncated text, or original if within limit.
Example:
(truncate-text "gpt-4o" "Hello world, this is a test" 5)
;; => "Hello world,"
(truncate-text "gpt-4o" long-text 1000 {:truncation-marker "..."})
;; => "First part of text..."
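The priority-based middle-trim strategy described under `truncate-messages` can be sketched with a stub token counter. A whitespace word count stands in for real tokenization, the `*`-suffixed names are hypothetical, and this simplified version keeps newest turns that fit rather than enforcing every guarantee of the real function:

```clojure
;; Word count as a crude stand-in for token counting.
(defn stub-count [msgs]
  (reduce + 0 (map #(count (re-seq #"\S+" (:content %))) msgs)))

(defn truncate-messages* [messages max-tokens]
  (let [system (when (= "system" (:role (first messages))) (first messages))
        turns  (if system (vec (rest messages)) (vec messages))
        ;; 1. the system message is always kept, so it spends budget first
        budget (- max-tokens (if system (stub-count [system]) 0))
        ;; 2. keep the newest turns that still fit, so the oldest
        ;;    middle turns are the ones dropped (primacy/recency bias)
        kept   (reduce (fn [acc m]
                         (if (<= (stub-count (conj acc m)) budget)
                           (conj acc m)
                           (reduced acc)))
                       []
                       (rseq turns))]
    (vec (concat (when system [system]) (reverse kept)))))

(mapv :content
      (truncate-messages*
       [{:role "system" :content "Be brief."}
        {:role "user" :content "first question here"}
        {:role "assistant" :content "an old answer"}
        {:role "user" :content "latest question"}]
       7))
;; => ["Be brief." "an old answer" "latest question"]
```

With a 7-word budget, the system message (2 words) and the two newest turns (2 + 3 words) fit, so the oldest user turn is the one trimmed from the middle.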