Table extraction. The `:lines` strategy reconstructs a grid from ruling lines (explicit line objects plus rectangle edges): near-collinear edges are snapped together, grid intersections are found, and cells are the rectangles whose four corners are all intersections. Words are assigned to cells by center.
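The `:lines` pipeline described above can be sketched in plain Clojure. This is a minimal illustration under stated assumptions, not the library's implementation. Edges are assumed to be pre-split into hypothetical horizontal `{:y :x0 :x1}` and vertical `{:x :y0 :y1}` maps, with y growing downward. Snapping clusters coordinates whose consecutive gap is within the tolerance and replaces each with the cluster mean. Cells are searched as the smallest rectangle from each top-left intersection. Words are hypothetical `{:text :bbox}` maps assumed to arrive in reading order. Every function name here is hypothetical.

```clojure
;; Hypothetical sketch of ruling-line table reconstruction (not the library's code).
;; Assumed shapes: horizontal edge {:y :x0 :x1}, vertical edge {:x :y0 :y1},
;; word {:text :bbox [x0 y0 x1 y1]}, with y growing downward.
(require '[clojure.string :as str])

(defn snap-values
  "Cluster sorted coordinates whose gap to the previous value is <= tol and
  map every original value to its cluster's mean."
  [values tol]
  (let [clusters (reduce (fn [acc v]
                           (let [cur (peek acc)]
                             (if (and cur (<= (- v (peek cur)) tol))
                               (conj (pop acc) (conj cur v))
                               (conj acc [v]))))
                         []
                         (sort (distinct values)))]
    (into {} (for [c clusters
                   :let [mean (double (/ (reduce + c) (count c)))]
                   v c]
               [v mean]))))

(defn snap-edges
  "Snap near-collinear horizontal edges to shared y values and vertical
  edges to shared x values."
  [h-edges v-edges tol]
  (let [ys (snap-values (map :y h-edges) tol)
        xs (snap-values (map :x v-edges) tol)]
    [(mapv #(update % :y ys) h-edges)
     (mapv #(update % :x xs) v-edges)]))

(defn intersections
  "Set of [x y] points where a vertical edge meets a horizontal edge,
  allowing tol of slack at segment ends."
  [h-edges v-edges tol]
  (set (for [h h-edges
             v v-edges
             :when (and (<= (- (:x0 h) tol) (:x v) (+ (:x1 h) tol))
                        (<= (- (:y0 v) tol) (:y h) (+ (:y1 v) tol)))]
         [(:x v) (:y h)])))

(defn find-cells
  "From each intersection taken as a top-left corner, return the first
  [x0 y0 x1 y1] (nearest bottom edge first, then nearest right edge) whose
  bottom-right corner is also an intersection."
  [pts]
  (vec (for [[x0 y0] (sort pts)
             :let [right (sort (for [[x y] pts :when (and (= y y0) (> x x0))] x))
                   below (sort (for [[x y] pts :when (and (= x x0) (> y y0))] y))
                   corner (first (for [y1 below
                                       x1 right
                                       :when (contains? pts [x1 y1])]
                                   [x1 y1]))]
             :when corner]
         (into [x0 y0] corner))))

(defn assign-words
  "Put each word into the cell containing its bbox center (half-open bounds,
  so a center on a shared rule lands in one cell). Texts keep input order."
  [cells words]
  (mapv (fn [[x0 y0 x1 y1 :as cell]]
          {:bbox cell
           :text (->> words
                      (filter (fn [{[wx0 wy0 wx1 wy1] :bbox}]
                                (let [cx (/ (+ wx0 wx1) 2.0)
                                      cy (/ (+ wy0 wy1) 2.0)]
                                  (and (<= x0 cx) (< cx x1)
                                       (<= y0 cy) (< cy y1)))))
                      (map :text)
                      (str/join " "))})
        cells))

(defn cells->rows
  "Group cells by top edge, top to bottom, each row sorted left to right."
  [cells]
  (->> cells
       (group-by #(second (:bbox %)))
       (sort-by key)
       (mapv (fn [[_ row]] (vec (sort-by #(first (:bbox %)) row))))))

(defn lines->rows
  "End-to-end sketch: snap, intersect, find cells, fill them, group into rows."
  [h-edges v-edges words tol]
  (let [[h v] (snap-edges h-edges v-edges tol)]
    (-> (intersections h v tol)
        find-cells
        (assign-words words)
        cells->rows)))
```

With vertical rules at x = 9.5 and x = 10.5 and a tolerance of 3.0, both snap to x = 10.0, so three horizontal and three distinct vertical rules give nine intersections and four cells. Grouping rows by each cell's top edge is a simplification; merged cells would need more care.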
`(extract-table doc)`

`(extract-table doc {:keys [strategy snap-tolerance] :or {strategy :lines snap-tolerance default-tolerance} :as opts})`
Extract a single table as `{:page-number :strategy :bbox :rows :cells :debug}`.
`:rows` is a vector of rows, each a vector of `{:text :bbox}` cells. Options:

- `:page`: the page to extract from.
- `:strategy`: `:lines` (default) or `:text`.
- `:snap-tolerance`: used by `:lines`; default 3.0.
- `:text-x-tolerance`, `:text-y-tolerance`: used by `:text`.
- `:min-words-vertical` (default 3), `:min-words-horizontal` (default 1): used by `:text`.

The `:text` strategy is heuristic and intended for digitally generated PDFs.

`(extract-tables doc)`

`(extract-tables doc opts)`

Extract the tables on the page as a vector. In v1 it returns at most one table (the bounding grid). Takes the same options as `extract-table`.
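Because `extract-tables` returns a vector of table maps whose `:rows` follow the documented `{:text :bbox}` shape, downstream code can stay independent of the extraction strategy. A minimal sketch, assuming only that documented shape: the helper names (`table->grid`, `csv-escape`, `grid->csv`, `tables->csv`) are hypothetical, the sample table in the usage is made-up input, and the library call itself is omitted because its namespace is not shown here. Quoting follows the common RFC 4180 convention of wrapping fields that contain commas, quotes, or newlines and doubling embedded quotes.

```clojure
;; Hypothetical helpers that consume the documented table shape:
;; {:rows [[{:text ... :bbox ...} ...] ...]}. Not part of the library.
(require '[clojure.string :as str])

(defn table->grid
  "Turn one table map into a vector of rows of trimmed cell strings."
  [table]
  (mapv (fn [row] (mapv #(str/trim (or (:text %) "")) row))
        (:rows table)))

(defn csv-escape
  "Quote a field if it contains a comma, quote, or line break (RFC 4180 style)."
  [s]
  (if (re-find #"[\",\r\n]" s)
    (str "\"" (str/replace s "\"" "\"\"") "\"")
    s))

(defn grid->csv
  "Join a grid of strings into CSV text, one line per row."
  [grid]
  (str/join "\n" (map #(str/join "," (map csv-escape %)) grid)))

(defn tables->csv
  "Render every table in the vector returned by extract-tables as a CSV string."
  [tables]
  (mapv (comp grid->csv table->grid) tables))
```

Under v1 the input vector holds at most one table, so the result holds at most one CSV string.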