PDF to images conversion and metadata extraction using Apache PDFBox.
Provides:
- pdf->images - Convert a PDF file to a vector of BufferedImage objects
- page-count - Get the total page count of a PDF file
- pdf-metadata - Extract PDF metadata (author, title, dates, etc.)
- detect-text-rotation - Detect content rotation per page using text position heuristics

Uses PDFBox for reliable PDF rendering at configurable DPI. Handles error cases: encrypted PDFs, corrupted files, file not found.
(detect-text-rotation pdf-path)
(detect-text-rotation pdf-path {:keys [page-set]})

Detects content rotation for each page of a PDF using text position heuristics.

Analyzes the direction of text characters on each page via PDFBox's TextPosition. If the majority of characters flow in a non-standard direction, the page content is rotated (e.g., landscape table on a portrait page).

TextPosition directions:
- 0°: Normal left-to-right text → no correction needed
- 90°: Text flows bottom-to-top → correct by rotating image 270° CW
- 180°: Upside-down text → correct by rotating image 180°
- 270°: Text flows top-to-bottom → correct by rotating image 90° CW

Params:
pdf-path - String. Path to the PDF file.
opts - Optional map with:
:page-set - Set of 0-indexed page numbers to analyze, or nil for all pages.

Returns: Vector of integers, one per selected page. Each value is the clockwise rotation in degrees (0, 90, 180, or 270) needed to correct the rendered image.

Example: [0 0 90] means pages 0 and 1 are normal and page 2 needs a 90° CW rotation.
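The direction-to-correction mapping described above can be sketched in a few lines. This is a minimal illustration, not the library's confirmed implementation: it assumes per-character directions are already available as numbers (such as the values PDFBox's `TextPosition.getDir` reports), the helper names `direction->correction` and `page-correction` are hypothetical, and treating the most frequent direction as the "majority" is an assumption.

```clojure
(defn direction->correction
  "Clockwise rotation (degrees) that undoes a text direction of `dir` degrees.
  Hypothetical helper: 0 -> 0, 90 -> 270, 180 -> 180, 270 -> 90."
  [dir]
  (mod (- 360 (long dir)) 360))

(defn page-correction
  "Picks the most frequent character direction on a page and returns the
  correction for it. An empty page is assumed to need no correction."
  [char-dirs]
  (if (empty? char-dirs)
    0
    (let [[dominant _] (apply max-key val (frequencies (map long char-dirs)))]
      (direction->correction dominant))))

;; A page where most characters flow bottom-to-top (90°):
(page-correction [90.0 90.0 0.0])
;; => 270
```

The modular arithmetic reproduces the four cases listed for TextPosition directions, so no lookup table is needed.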
(page-count pdf-path)

Returns the number of pages in a PDF file.
Params:
pdf-path - String. Path to the PDF file.
Returns: Integer. Number of pages.
Throws:
Same exceptions as pdf->images.
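Because failures are reported as ex-info, callers can catch `clojure.lang.ExceptionInfo` and inspect the attached data. The sketch below is a generic pattern, not this library's API: `fake-page-count` is a stand-in that always throws, `safe-page-count` is a hypothetical wrapper, and the `:path` key in the ex-data is invented for illustration, since the actual ex-data shape is not documented here.

```clojure
(defn fake-page-count
  "Stand-in for a PDF call; always fails the way a missing file might.
  The message and ex-data keys are invented for this example."
  [pdf-path]
  (throw (ex-info "PDF file not found" {:path pdf-path})))

(defn safe-page-count
  "Calls `count-fn` on `pdf-path` and returns [:ok n] on success or
  [:error message data] when an ex-info is thrown."
  [count-fn pdf-path]
  (try
    [:ok (count-fn pdf-path)]
    (catch clojure.lang.ExceptionInfo e
      [:error (ex-message e) (ex-data e)])))

(safe-page-count fake-page-count "missing.pdf")
;; => [:error "PDF file not found" {:path "missing.pdf"}]
```

Passing the counting function as an argument keeps the wrapper independent of any particular namespace, so the real `page-count` could be substituted once it is required.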
(pdf->images pdf-path)
(pdf->images pdf-path {:keys [dpi page-set] :or {dpi DEFAULT_DPI}})

Converts a PDF file to a vector of BufferedImage objects.
Params:
pdf-path - String. Path to the PDF file.
opts - Optional map with:
:dpi - Integer. Rendering DPI (default 150).
:page-set - Set of 0-indexed page numbers to render, or nil for all pages.
Returns: Vector of BufferedImage objects, one per selected page.
Throws: ex-info for not-found, corrupted, or encrypted PDFs.
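To get a feel for what the `:dpi` option means for image size, the arithmetic can be sketched as follows. PDF page dimensions are expressed in points of 1/72 inch by default, so the scale factor is roughly `dpi / 72`. The helper name `pixel-size` is hypothetical, and the exact rounding the renderer applies may differ, so treat the result as an estimate.

```clojure
(defn pixel-size
  "Approximate pixel dimensions of a page rendered at `dpi`.
  `width-pt` and `height-pt` are the page size in PDF points (1/72 inch)."
  [width-pt height-pt dpi]
  (let [scale (/ dpi 72.0)]
    {:width  (Math/round (double (* width-pt scale)))
     :height (Math/round (double (* height-pt scale)))}))

;; US Letter (612 x 792 points) at the documented default of 150 DPI:
(pixel-size 612 792 150)
;; => {:width 1275, :height 1650}
```

Doubling the DPI roughly quadruples the pixel count per page, which is worth keeping in mind when rendering many pages into memory at once.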
(pdf-metadata pdf-path)

Extracts metadata from a PDF file.
Params:
pdf-path - String. Path to the PDF file.
Returns:
Map with:
:author - String or nil. Document author.
:title - String or nil. Document title.
:subject - String or nil. Document subject.
:creator - String or nil. Creating application.
:producer - String or nil. PDF producer.
:created-at - Instant or nil. Creation date.
:updated-at - Instant or nil. Modification date.
:keywords - String or nil. Document keywords.
Throws:
Same exceptions as pdf->images.
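Since every field may be nil, code consuming the metadata map usually needs to guard against missing values. The sketch below works on a hand-written map shaped like the documented return value; the sample values are invented, and `present-fields` and `edit-span` are hypothetical helpers, not part of the library.

```clojure
(import '(java.time Instant Duration))

;; Hypothetical sample with the documented keys; all values are invented.
(def sample-metadata
  {:author     "Jane Doe"
   :title      nil
   :subject    nil
   :creator    "LibreOffice Writer"
   :producer   nil
   :created-at (Instant/parse "2024-01-15T10:00:00Z")
   :updated-at (Instant/parse "2024-01-16T12:30:00Z")
   :keywords   nil})

(defn present-fields
  "Drops nil entries so only populated metadata fields remain."
  [metadata]
  (into {} (remove (comp nil? val)) metadata))

(defn edit-span
  "Duration between creation and last modification, or nil if either is missing."
  [{:keys [created-at updated-at]}]
  (when (and created-at updated-at)
    (Duration/between created-at updated-at)))

(keys (present-fields sample-metadata))
(.toMinutes (edit-span sample-metadata))
;; => 1590
```

Because the dates are `Instant` values, they can be compared and subtracted directly with `java.time` without any string parsing.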