mohitkhare

Clojure utilities for content analysis and LLM workflow support. This library provides token estimation, word counting, reading time calculation, and slug generation for content management and publishing pipelines.

Built for the content tooling behind mohitkhare.me, these functions handle the common text analysis tasks that come up when building article pipelines, CMS integrations, and AI-assisted writing workflows.

Installation

Add to your project.clj dependencies:

[mohitkhare "0.1.0"]

Or in deps.edn:

{:deps {mohitkhare/mohitkhare {:mvn/version "0.1.0"}}}

Usage

(require '[mohitkhare.core :as mk])

;; Count words in a text
(mk/word-count "Clojure is a functional language on the JVM")
;; => 8

;; Estimate LLM token count using the ~4 chars/token heuristic
(mk/estimate-tokens "This sentence has roughly 12 tokens by estimation.")
;; => 13

;; Use a custom characters-per-token ratio
(mk/estimate-tokens "Dense technical text" 3.5)
;; => 6

;; Calculate reading time with formatted output
(mk/reading-time "A long article body goes here with many paragraphs...")
;; => {:words 9, :minutes 1, :display "1 min read"}

;; Process multiple articles with threading
(->> ["Short post" "A somewhat longer article with more content to read"]
     (map mk/reading-time)
     (map :display))
;; => ("1 min read" "1 min read")

;; Generate URL slugs from article titles
(mk/slugify "How to Build a REST API in Clojure")
;; => "how-to-build-a-rest-api-in-clojure"

;; Combine functions for a content analysis pipeline
(let [text "Your article body goes here with several paragraphs of content..."]
  {:slug   (mk/slugify "My Article Title")
   :words  (mk/word-count text)
   :tokens (mk/estimate-tokens text)
   :read   (:display (mk/reading-time text))})
;; => {:slug "my-article-title", :words 10, :tokens 17, :read "1 min read"}
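
The slug output shown above can be reproduced with a few string operations. The following is a minimal sketch, not the library's source: slugify-sketch is a hypothetical name, and the rules it applies (lower-case, collapse runs of non-alphanumeric characters into a single hyphen, trim hyphens at the ends) are assumptions chosen to match the documented example. How the real slugify treats accented or non-Latin characters is not stated here.

```clojure
(require '[clojure.string :as str])

;; Hypothetical re-implementation for illustration only.
;; Assumption: ASCII-only slugs; anything outside a-z and 0-9 becomes a separator.
(defn slugify-sketch
  [title]
  (-> title
      str/lower-case
      (str/replace #"[^a-z0-9]+" "-")   ; collapse each run of other characters into one hyphen
      (str/replace #"^-+|-+$" "")))     ; drop leading and trailing hyphens

(slugify-sketch "How to Build a REST API in Clojure")
;; => "how-to-build-a-rest-api-in-clojure"
```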

Token Estimation

The token estimator uses the common rule of thumb of roughly 4 characters per token for English text, which approximates, but does not reproduce, the output of GPT-style tokenizers. For multilingual or code-heavy content, where tokens are typically shorter, pass a lower characters-per-token ratio as the second argument to estimate-tokens. The result is still an estimate, not an exact count.
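
The arithmetic behind the heuristic is a single division. The sketch below is a hypothetical re-implementation (estimate-tokens-sketch is not part of the library). It assumes the result is rounded up, because that reproduces the three outputs shown in Usage (13, 6, and 17); the library's actual rounding is not documented here.

```clojure
;; Hypothetical sketch of the characters-per-token heuristic.
;; Assumption: character count divided by the ratio, rounded up.
(defn estimate-tokens-sketch
  ([text] (estimate-tokens-sketch text 4))
  ([text chars-per-token]
   (long (Math/ceil (/ (count text) (double chars-per-token))))))

;; 50 characters / 4 = 12.5, rounded up
(estimate-tokens-sketch "This sentence has roughly 12 tokens by estimation.")
;; => 13

;; 20 characters / 3.5 = 5.71..., rounded up
(estimate-tokens-sketch "Dense technical text" 3.5)
;; => 6
```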

Reading Time

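Reading time defaults to 238 words per minute, an average adult silent-reading speed for English non-fiction reported in published reading research. The function always returns at least 1 minute, so very short texts still get a sensible label, and the result map contains the word count (:words), the raw minutes (:minutes), and a pre-formatted string (:display) suitable for article metadata.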
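
As a rough illustration of the calculation, the sketch below builds the same result map from a word count. It is a hypothetical re-implementation (reading-time-sketch is not the library's function). Splitting on whitespace, rounding up, and the optional words-per-minute argument are all assumptions; the text above only guarantees the 238 wpm default and the 1-minute floor.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch of the reading-time calculation.
;; Assumptions: words are whitespace-separated, minutes are rounded up.
(defn reading-time-sketch
  ([text] (reading-time-sketch text 238))
  ([text wpm]
   (let [trimmed (str/trim text)
         words   (if (str/blank? trimmed)
                   0
                   (count (str/split trimmed #"\s+")))
         ;; never report less than 1 minute
         minutes (max 1 (long (Math/ceil (/ words (double wpm)))))]
     {:words   words
      :minutes minutes
      :display (str minutes " min read")})))

(reading-time-sketch "A long article body goes here with many paragraphs...")
;; => {:words 9, :minutes 1, :display "1 min read"}
```

Under these assumptions a 500-word text yields 3 minutes (500 / 238 is about 2.1, rounded up); an implementation that rounds to the nearest minute would report 2.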

License

Distributed under the MIT License.
