pdfplumber-clj

Plain-data PDF extraction for Clojure. Pull text, words, characters, geometric objects, and tables out of digitally generated PDFs as EDN/JSON-friendly maps and vectors — the Clojure counterpart to Python's pdfplumber, built on Apache PDFBox.

Stack

Clojure, Apache PDFBox, Kaocha, GitHub Actions

Status

Early release (0.1.1). The extraction API (text, words, chars, objects, tables, crop) is in place and validated against the Python pdfplumber corpus; it may still evolve before 1.0.

Install

deps.edn

net.clojars.savya/pdfplumber-clj {:mvn/version "0.1.1"}

Leiningen

[net.clojars.savya/pdfplumber-clj "0.1.1"]

Requires JDK 17+.

Quickstart

(require '[pdfplumber.core :as pdf])

(pdf/with-pdf [doc "statement.pdf"]
  (pdf/text doc {:page 1}))           ; => "Account statement\n..."

(pdf/with-pdf [doc "statement.pdf"]
  (pdf/words doc {:page 1}))          ; => [{:text "Account" :x0 .. :top .. :x1 .. :bottom ..} ...]

(pdf/with-pdf [doc "invoice.pdf"]
  (pdf/extract-table doc {:page 1 :strategy :lines}))
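
Because words come back as plain maps with `:x0`, `:top`, `:x1`, and `:bottom` keys, ordinary sequence functions can post-process them. The following is a minimal sketch in plain Clojure that does not call the library at all: `within-bbox?` is a hypothetical helper, and `sample-words` is made-up data shaped like the `pdf/words` output above.

```clojure
(defn within-bbox?
  "True when the word's box lies entirely inside [x0 top x1 bottom].
   Hypothetical helper, not part of pdfplumber-clj."
  [[bx0 btop bx1 bbottom] {:keys [x0 top x1 bottom]}]
  (and (>= x0 bx0)
       (>= top btop)
       (<= x1 bx1)
       (<= bottom bbottom)))

;; Illustrative data only; real values come from (pdf/words doc {:page 1}).
(def sample-words
  [{:text "Account" :x0 72.0 :top 72.0  :x1 120.0 :bottom 84.0}
   {:text "Footer"  :x0 72.0 :top 750.0 :x1 110.0 :bottom 762.0}])

;; Keep only the words in the top part of a 612 x 792 point page.
(->> sample-words
     (filter #(within-bbox? [0 0 612 400] %))
     (mapv :text))
;; => ["Account"]
```

The same pattern works with `group-by`, `sort-by`, or any other core function, since nothing in the data is tied to a PDFBox object.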

Coordinate system

Public coordinates use a top-left origin (matching pdfplumber), with bounding boxes as [x0 top x1 bottom] in PDF user-space points. PDFBox's native bottom-left coordinates are converted internally.
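
The origin flip can be sketched in a few lines of plain Clojure. This is an illustration of the arithmetic, not the library's internal code: `flip-bbox` and its `page-height` argument are hypothetical names, and the sketch assumes an unrotated page with no crop-box offset.

```clojure
(defn flip-bbox
  "Convert a bottom-left-origin box [x0 y0 x1 y1] (y grows upward)
   into a top-left-origin box [x0 top x1 bottom] (y grows downward).
   Hypothetical helper; assumes an unrotated page of the given height."
  [page-height [x0 y0 x1 y1]]
  ;; The upper edge (y1) becomes :top, the lower edge (y0) becomes :bottom.
  [x0 (- page-height y1) x1 (- page-height y0)])

;; A US Letter page is 792 points tall.
(flip-bbox 792 [72 700 200 720])
;; => [72 72 200 92]
```

Horizontal coordinates are unchanged; only the vertical axis is mirrored, so `top` is always less than `bottom` in the converted box.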

Scope

In: text/word/char extraction, page geometry, crop/filter, ruling-line and text-aligned table extraction, deterministic plain-data output.

Out (v1): PDF generation, OCR, scanned/image PDFs, AcroForm extraction, layout ML. Table :text strategy is heuristic and intended for digitally generated PDFs.

License

Copyright © 2026 Savyasachi.

Distributed under the Eclipse Public License 2.0.
