com.blockether.svar.internal.rlm.internal.pageindex.pdf

Clojure only.

PDF to images conversion and metadata extraction using Apache PDFBox.

Provides:

- `pdf->images` - Convert PDF file to vector of BufferedImage objects
- `page-count` - Get total page count of a PDF file
- `pdf-metadata` - Extract PDF metadata (author, title, dates, etc.)
- `detect-text-rotation` - Detect content rotation per page using text position heuristics

Uses PDFBox to render pages at a configurable DPI. `pdf->images`, `page-count`, and `pdf-metadata` throw ex-info for missing, corrupted, or encrypted PDFs.

detect-text-rotation

(detect-text-rotation pdf-path)

(detect-text-rotation pdf-path {:keys [page-set]})

Detects content rotation for each page of a PDF using text position heuristics.

Analyzes the direction of text characters on each page via PDFBox's TextPosition. If the majority of characters flow in a non-standard direction, the page content is treated as rotated (e.g., a landscape table on a portrait page).
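The majority rule described above can be sketched in plain Clojure. This is only an illustration: `majority-direction` is a hypothetical helper, not part of this namespace, and it assumes a strict majority (more than half of the characters) is required, which the docstring does not spell out.

```clojure
(defn majority-direction
  "Hypothetical helper. Given a seq of per-character directions
  (0, 90, 180 or 270), returns the direction shared by more than
  half of them, or 0 when there is no such direction."
  [dirs]
  (let [total   (count dirs)
        ;; most frequent direction and its count, nil for an empty page
        [dir n] (when (pos? total)
                  (apply max-key val (frequencies dirs)))]
    (if (and dir (> (* 2 n) total))
      dir
      0)))

(majority-direction [0 90 90])      ; 90 holds 2 of 3 characters
(majority-direction [0 0 90 180])   ; no strict majority, falls back to 0
```

Falling back to 0 keeps pages with little or mixed text unrotated, which seems the safer default for a heuristic.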

TextPosition directions:

- 0°: Normal left-to-right text → no correction needed
- 90°: Text flows bottom-to-top → correct by rotating image 270° CW
- 180°: Upside-down text → correct by rotating image 180°
- 270°: Text flows top-to-bottom → correct by rotating image 90° CW
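The table above amounts to "rotate by whatever is left of a full turn". A minimal sketch, where `direction->correction` is a hypothetical name used only for illustration:

```clojure
(defn direction->correction
  "Hypothetical helper. Maps a text direction in degrees to the
  clockwise rotation that brings the text back to 0 degrees."
  [direction]
  (mod (- 360 direction) 360))

(mapv direction->correction [0 90 180 270])
```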

Params:
`pdf-path` - String. Path to the PDF file.
`opts` - Optional map with:
  `:page-set` - Set of 0-indexed page numbers to analyze, or nil for all pages.

Returns: Vector of integers, one per selected page. Each value is the clockwise rotation in degrees (0, 90, 180, or 270) needed to correct the rendered image.

Example: [0 0 90] means pages 0-1 are normal, page 2 needs 90° CW rotation.
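The returned values are meant to be applied to the rendered images. The sketch below shows one way to do that with plain Java interop. `rotate-cw` is a hypothetical helper, not something this namespace provides, and it copies pixels one by one for clarity rather than speed.

```clojure
(import '(java.awt.image BufferedImage))

(defn rotate-cw
  "Hypothetical helper. Returns a new RGB image equal to `img`
  rotated clockwise by `degrees` (0, 90, 180 or 270)."
  [^BufferedImage img degrees]
  (let [w     (.getWidth img)
        h     (.getHeight img)
        swap? (contains? #{90 270} degrees)
        ;; width and height trade places for quarter turns
        out   (BufferedImage. (if swap? h w) (if swap? w h)
                              BufferedImage/TYPE_INT_RGB)]
    (doseq [x (range w)
            y (range h)]
      (let [[nx ny] (case degrees
                      0   [x y]
                      90  [(- h 1 y) x]
                      180 [(- w 1 x) (- h 1 y)]
                      270 [y (- w 1 x)])]
        (.setRGB out nx ny (.getRGB img x y))))
    out))

;; Pairing images with rotations, assuming both vectors were produced
;; for the same :page-set:
;; (mapv rotate-cw images rotations)
```

For large pages at high DPI a Graphics2D transform would likely be faster; the per-pixel loop is used here only because its behavior is easy to verify.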

page-count

(page-count pdf-path)

Returns the number of pages in a PDF file.

Params: pdf-path - String. Path to the PDF file.

Returns: Integer. Number of pages.

Throws: Same exceptions as pdf->images.
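A page count is often used to build a valid `:page-set` for the other functions. The sketch below converts a 1-indexed, inclusive page range (as a person would write it) into a 0-indexed set clamped to the document length. `page-range->page-set` is a hypothetical helper, and `total-pages` stands in for a value obtained from `page-count`.

```clojure
(defn page-range->page-set
  "Hypothetical helper. Converts the 1-indexed inclusive range
  first-page..last-page into a set of 0-indexed page numbers,
  dropping pages beyond total-pages."
  [first-page last-page total-pages]
  (set (range (max 0 (dec first-page))
              (min last-page total-pages))))

(page-range->page-set 2 4 10)   ; pages 2-4 of a 10-page document
(page-range->page-set 9 12 10)  ; range runs past the end and is clamped
```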

pdf->images

(pdf->images pdf-path)

(pdf->images pdf-path {:keys [dpi page-set] :or {dpi DEFAULT_DPI}})

Converts a PDF file to a vector of BufferedImage objects.

Params:
`pdf-path` - String. Path to the PDF file.
`opts` - Optional map with:
  `:dpi` - Integer. Rendering DPI (default 150).
  `:page-set` - Set of 0-indexed page numbers to render, or nil for all pages.
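To get a feel for what `:dpi` means for image size: PDF page dimensions are expressed in points (1/72 inch), so the pixel size scales by dpi/72. The sketch below estimates the rendered size. `rendered-size` is a hypothetical helper, and truncating to whole pixels is an assumption about rounding.

```clojure
(defn rendered-size
  "Hypothetical helper. Estimates [width height] in pixels for a page
  of width-pt x height-pt points rendered at dpi."
  [width-pt height-pt dpi]
  (letfn [(px [pt] (long (Math/floor (/ (* pt dpi) 72.0))))]
    [(px width-pt) (px height-pt)]))

(rendered-size 612 792 150)  ; US Letter at the default 150 DPI
(rendered-size 612 792 300)  ; doubling DPI doubles each dimension
```

Memory use grows with the square of the DPI, so restricting `:page-set` is worth considering when rendering large documents at high resolution.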

Returns: Vector of BufferedImage objects, one per selected page.

Throws: ex-info for not-found, corrupted, or encrypted PDFs.
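Since failures surface as ex-info, callers can catch `clojure.lang.ExceptionInfo` and inspect the message and data map. The sketch below wraps any function call this way. `safe-call` is a hypothetical helper, and the shape of the ex-data shown in the usage line is invented for illustration; the keys actually attached by this namespace are not documented here.

```clojure
(defn safe-call
  "Hypothetical helper. Calls (apply f args) and returns {:ok result},
  or {:error message :data ex-data} when an ex-info is thrown."
  [f & args]
  (try
    {:ok (apply f args)}
    (catch clojure.lang.ExceptionInfo e
      {:error (ex-message e)
       :data  (ex-data e)})))

;; Stand-in for a failing PDF call; the message and data are made up.
(safe-call (fn [path] (throw (ex-info "PDF not found" {:path path})))
           "missing.pdf")
```

With the namespace loaded, the same wrapper could be used as `(safe-call pdf->images "report.pdf" {:dpi 200})`.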

pdf-metadata

(pdf-metadata pdf-path)

Extracts metadata from a PDF file.

Params: pdf-path - String. Path to the PDF file.

Returns:
Map with:
  `:author` - String or nil. Document author.
  `:title` - String or nil. Document title.
  `:subject` - String or nil. Document subject.
  `:creator` - String or nil. Creating application.
  `:producer` - String or nil. PDF producer.
  `:created-at` - Instant or nil. Creation date.
  `:updated-at` - Instant or nil. Modification date.
  `:keywords` - String or nil. Document keywords.

Throws: Same exceptions as pdf->images.
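Because every field may be nil, consumers should supply fallbacks. The sketch below builds a one-line summary from a map of the documented shape. `metadata-summary` is a hypothetical helper and the sample values are invented.

```clojure
(defn metadata-summary
  "Hypothetical helper. Builds a short description from a metadata map,
  substituting placeholders for missing fields."
  [{:keys [title author created-at]}]
  (str (or title "(untitled)")
       " by "
       (or author "(unknown author)")
       ;; an Instant prints in ISO-8601 form; omitted entirely when nil
       (when created-at (str ", created " created-at))))

(metadata-summary {:title      "Annual Report"
                   :author     nil
                   :created-at (java.time.Instant/parse "2024-01-15T10:00:00Z")})
```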
