Clojure only.

pdfplumber.text

Character, word, and text extraction over PDFBox's PDFTextStripper.

PDFTextStripper already reports direction-adjusted coordinates relative to a top-left origin, so char maps are built directly from `getXDirAdj`/`getYDirAdj` without a page-height flip. Words are formed by clustering chars into lines (within `:y-tolerance`) and splitting on horizontal gaps wider than `:x-tolerance`.
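The clustering step described above can be sketched on plain char maps. This is a minimal illustration, not the namespace's actual source: the function names `group-into-lines`, `split-line-on-gaps` and `cluster-words` are hypothetical, and comparing each char against the first char of its line is an assumed grouping rule.

```clojure
;; Hypothetical sketch of line clustering + gap splitting; not pdfplumber.text's code.

(defn group-into-lines
  "Sorts chars by :top and starts a new line whenever a char's :top differs
  from the first char of the current line by more than y-tol (assumed rule)."
  [chars y-tol]
  (reduce (fn [lines c]
            (let [current (peek lines)]
              (if (and current
                       (<= (Math/abs (double (- (:top c) (:top (first current))))) y-tol))
                (conj (pop lines) (conj current c))
                (conj lines [c]))))
          []
          (sort-by :top chars)))

(defn split-line-on-gaps
  "Sorts one line's chars by :x0 and starts a new word whenever the gap between
  a char's :x0 and the previous char's :x1 is wider than x-tol."
  [line x-tol]
  (reduce (fn [words c]
            (let [current (peek words)]
              (if (and current
                       (<= (- (:x0 c) (:x1 (peek current))) x-tol))
                (conj (pop words) (conj current c))
                (conj words [c]))))
          []
          (sort-by :x0 line)))

(defn cluster-words
  "Returns words (each a vector of char maps), top-to-bottom, left-to-right."
  [chars x-tol y-tol]
  (into []
        (mapcat #(split-line-on-gaps % x-tol))
        (group-into-lines chars y-tol)))
```

With both tolerances at 3.0, chars at `:top` 100 and 100.5 share a line, and a char whose `:x0` sits 11 units past the previous `:x1` starts a new word.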

chars

(chars doc)
(chars doc {:keys [page bbox]})

Vector of character maps `{:text :x0 :top :x1 :bottom :font-name :font-size :page-number}`. Options: `:page` (1-based; limits results to that page) and `:bbox` (keeps chars whose center falls inside `[x0 top x1 bottom]`).
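The `:bbox` rule (keep a char when its center lies inside the box) can be sketched as a predicate over char maps. The helper names `center-inside?` and `filter-bbox` are hypothetical, and treating the box edges as inclusive is an assumption; the library may handle boundaries differently.

```clojure
;; Hypothetical sketch of the :bbox center test; edge inclusivity is assumed.

(defn center-inside?
  "True when the center of char map c lies inside bbox [x0 top x1 bottom]."
  [[bx0 btop bx1 bbottom] {:keys [x0 top x1 bottom]}]
  (let [cx (/ (+ x0 x1) 2.0)
        cy (/ (+ top bottom) 2.0)]
    (and (<= bx0 cx bx1)
         (<= btop cy bbottom))))

(defn filter-bbox
  "Keeps only the chars whose center falls inside bbox."
  [chars bbox]
  (filterv #(center-inside? bbox %) chars))
```

Filtering on the center rather than requiring full containment means a glyph that only slightly overhangs the box edge is still kept, while one that is mostly outside is dropped.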

text

(text doc)
(text doc
      {:keys [x-tolerance y-tolerance]
       :or {x-tolerance default-tolerance y-tolerance default-tolerance}
       :as opts})

Reconstructed text: words joined by spaces within a line, lines joined by newlines. Accepts the same options as `words`.
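Joining word maps back into text can be sketched as follows. The function `words->text` is hypothetical, and regrouping words into lines by `:top` within a tolerance is an assumption about how lines are delimited; the library may instead reuse the line grouping it built while forming words.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch: regroup word maps into lines, then join.
(defn words->text
  "Groups word maps into lines by :top (within y-tol), orders each line by :x0,
  joins words with spaces and lines with newlines."
  [words y-tol]
  (->> (sort-by :top words)
       (reduce (fn [lines w]
                 (let [current (peek lines)]
                   (if (and current
                            (<= (Math/abs (double (- (:top w) (:top (first current))))) y-tol))
                     (conj (pop lines) (conj current w))
                     (conj lines [w]))))
               [])
       (map (fn [line] (str/join " " (map :text (sort-by :x0 line)))))
       (str/join "\n")))
```

Sorting each line by `:x0` before joining restores left-to-right order even when words within a line arrive slightly out of vertical order.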

words

(words doc)
(words doc
       {:keys [x-tolerance y-tolerance]
        :or {x-tolerance default-tolerance y-tolerance default-tolerance}
        :as opts})

Vector of word maps `{:text :x0 :top :x1 :bottom :page-number}` in reading order. Options: `:page`, `:bbox`, `:x-tolerance` (default 3.0), `:y-tolerance` (default 3.0).
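Once chars have been clustered, each cluster has to be collapsed into one word map with the documented keys. A plausible sketch, assuming the word's box is the union of its chars' boxes and its text is their concatenation; `chars->word` is a hypothetical name, not part of the namespace.

```clojure
;; Hypothetical sketch: collapse a non-empty run of char maps into a word map.
(defn chars->word
  "Concatenates :text in order and takes the union of the chars' bounding boxes."
  [cs]
  {:text        (apply str (map :text cs))
   :x0          (apply min (map :x0 cs))
   :top         (apply min (map :top cs))
   :x1          (apply max (map :x1 cs))
   :bottom      (apply max (map :bottom cs))
   :page-number (:page-number (first cs))})
```

With the documented signature, a call such as `(words doc {:page 1 :x-tolerance 1.5})` would tighten word splitting on the first page only, assuming `doc` is a document already opened in the form these functions accept.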
