com.blockether.svar.internal.rlm.internal.pageindex.pdf

PDF to images conversion and metadata extraction using Apache PDFBox.

Provides:

  • pdf->images - Convert PDF file to vector of BufferedImage objects
  • page-count - Get total page count of a PDF file
  • pdf-metadata - Extract PDF metadata (author, title, dates, etc.)
  • detect-text-rotation - Detect content rotation per page using text position heuristics

Uses PDFBox for reliable PDF rendering at configurable DPI. Handles error cases: encrypted PDFs, corrupted files, file not found.

detect-text-rotation

(detect-text-rotation pdf-path)
(detect-text-rotation pdf-path {:keys [page-set]})

Detects content rotation for each page of a PDF using text position heuristics.

Analyzes the direction of text characters on each page via PDFBox's TextPosition. If the majority of characters flow in a non-standard direction, the page content is rotated (e.g., landscape table on a portrait page).

TextPosition directions:

  • 0°: Normal left-to-right text → no correction needed
  • 90°: Text flows bottom-to-top → correct by rotating image 270° CW
  • 180°: Upside-down text → correct by rotating image 180°
  • 270°: Text flows top-to-bottom → correct by rotating image 90° CW
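The mapping above can be sketched in a few lines of plain Clojure. This is an illustration only, not this namespace's implementation: `dominant-direction` and `correction-for` are hypothetical names, the input is assumed to be a sequence of per-character directions already reduced to the integers 0, 90, 180, or 270 (for example from PDFBox's `TextPosition.getDir()`), and "majority" is approximated here by taking the most frequent direction.

```clojure
(defn dominant-direction
  "Returns the most frequent direction in `directions`.
  Assumption: a page without any text is treated as unrotated (0)."
  [directions]
  (if (empty? directions)
    0
    (key (apply max-key val (frequencies directions)))))

(defn correction-for
  "Maps a text direction to the clockwise rotation that undoes it:
  0 -> 0, 90 -> 270, 180 -> 180, 270 -> 90."
  [direction]
  (mod (- 360 direction) 360))

;; A page where most characters report a direction of 90
(correction-for (dominant-direction [90 90 0 90]))
```

The correction is simply the direction's complement to a full turn, which is why 90° and 270° swap while 0° and 180° map to themselves.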

Params:

  • pdf-path - String. Path to the PDF file.
  • opts - Optional map with:
      • :page-set - Set of 0-indexed page numbers to analyze, or nil for all pages.

Returns: Vector of integers, one per selected page. Each value is the clockwise rotation in degrees (0, 90, 180, or 270) needed to correct the rendered image.

Example: [0 0 90] means pages 0-1 are normal, page 2 needs 90° CW rotation.
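Once a correction value is known, it still has to be applied to the rendered image. The sketch below shows one way to do that with only JDK classes. `rotate-cw` is a hypothetical helper, not a function of this namespace, and it copies pixel by pixel for clarity rather than speed. It also assumes an RGB image without transparency.

```clojure
(import '(java.awt.image BufferedImage))

(defn rotate-cw
  "Returns a copy of `img` rotated clockwise by 0, 90, 180, or 270 degrees.
  Throws IllegalArgumentException (from `case`) for any other value."
  [^BufferedImage img degrees]
  (let [w     (.getWidth img)
        h     (.getHeight img)
        swap? (contains? #{90 270} degrees)
        ;; Width and height trade places for quarter turns.
        out   (BufferedImage. (int (if swap? h w))
                              (int (if swap? w h))
                              BufferedImage/TYPE_INT_RGB)]
    (doseq [y (range h)
            x (range w)]
      (let [[nx ny] (case (long degrees)
                      0   [x y]
                      90  [(- h 1 y) x]
                      180 [(- w 1 x) (- h 1 y)]
                      270 [y (- w 1 x)])]
        (.setRGB out (int nx) (int ny) (.getRGB img (int x) (int y)))))
    out))
```

Assuming both calls use the same page selection, the images and rotations line up by index, so something like `(mapv rotate-cw images rotations)` would correct every page in one pass.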

page-count

(page-count pdf-path)

Returns the number of pages in a PDF file.

Params: pdf-path - String. Path to the PDF file.

Returns: Integer. Number of pages.

Throws: Same exceptions as pdf->images.

pdf->images

(pdf->images pdf-path)
(pdf->images pdf-path {:keys [dpi page-set] :or {dpi DEFAULT_DPI}})

Converts a PDF file to a vector of BufferedImage objects.

Params:

  • pdf-path - String. Path to the PDF file.
  • opts - Optional map with:
      • :dpi - Integer. Rendering DPI (default 150).
      • :page-set - Set of 0-indexed page numbers to render, or nil for all pages.

Returns: Vector of BufferedImage objects, one per selected page.
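Because the result holds one entry per selected page, the position of an image in the vector need not equal its page number when `:page-set` is given. The sketch below shows how such a selection could be resolved. `selected-pages` is a hypothetical helper, and it assumes that results are ordered by page index and that out-of-range entries are ignored. The docstring confirms neither detail.

```clojure
(defn selected-pages
  "Returns the 0-indexed pages to process for a document of `n` pages.
  A nil `page-set` selects every page."
  [n page-set]
  (vec (if page-set
         (filter page-set (range n))
         (range n))))

(selected-pages 4 #{2 0}) ; pages 0 and 2, so result index 1 refers to page 2
(selected-pages 3 nil)    ; every page
```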

Throws: ex-info for not-found, corrupted, or encrypted PDFs.
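The `:dpi` option determines how large each rendered image is. PDF page dimensions are expressed in points (1/72 inch), so the pixel size scales with `dpi / 72`. The sketch below estimates the output size for a given page. `rendered-size` is a hypothetical helper, and the exact rounding PDFBox applies may differ by a pixel.

```clojure
(defn rendered-size
  "Estimates [width height] in pixels for a page measured in points
  when rendered at `dpi`."
  [width-pt height-pt dpi]
  (letfn [(px [pt] (Math/round (double (/ (* pt dpi) 72.0))))]
    [(px width-pt) (px height-pt)]))

;; US Letter (612 x 792 pt) at the default 150 DPI
(rendered-size 612 792 150)
```

Since memory use grows with the square of the DPI, rendering a subset of pages via `:page-set` is worth considering for large documents at high resolution.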

pdf-metadata

(pdf-metadata pdf-path)

Extracts metadata from a PDF file.

Params: pdf-path - String. Path to the PDF file.

Returns: Map with:

  • :author - String or nil. Document author.
  • :title - String or nil. Document title.
  • :subject - String or nil. Document subject.
  • :creator - String or nil. Creating application.
  • :producer - String or nil. PDF producer.
  • :created-at - Instant or nil. Creation date.
  • :updated-at - Instant or nil. Modification date.
  • :keywords - String or nil. Document keywords.
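The date fields are documented as `Instant or nil`, whereas PDFBox's `PDDocumentInformation` exposes dates as `java.util.Calendar`. A nil-safe conversion along the following lines would produce that shape. `calendar->instant` is a hypothetical helper shown for illustration, and this namespace may do the conversion differently.

```clojure
(import '(java.util Calendar GregorianCalendar TimeZone))

(defn calendar->instant
  "Converts a java.util.Calendar to a java.time.Instant.
  Returns nil when the PDF carries no such date."
  [^Calendar cal]
  (some-> cal .toInstant))

(calendar->instant nil) ; nil, the date is missing from the document
(calendar->instant (doto (GregorianCalendar. (TimeZone/getTimeZone "UTC"))
                     (.setTimeInMillis 0))) ; the Unix epoch as an Instant
```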

Throws: Same exceptions as pdf->images.
