Document reader wrapping the Apache Tika parser (https://tika.apache.org/).
Tika supports many file formats; the full list is at
https://tika.apache.org/2.9.1/formats.html

The `parse` function uses Tika to convert the provided document data
into a map with `:text` and `:metadata` keys.
```clojure
(parse (clojure.java.io/input-stream "data/memory.pdf"))
;; =>
{:text "Memory, reasoning, and categorization: parallels and
common mechanisms
Brett K. ..."
 :metadata {:dc:creator "Brett K. Hayes"
            :dc:description "Traditionally, memory, reasoning, and categorization have been "}}
```

`(parse stream-or-file-name)`

Extract text and metadata from `doc-input-stream`. The stream can contain data in any file format supported by Tika. File format detection is done automatically by Tika.

Returns a map with

- `text`: the document content in plain text format
- `metadata`: Dublin Core metadata fields, if the document defines them
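The behaviour described above can be approximated directly with Tika's Java API. The following is only a minimal sketch, not this library's actual implementation: the namespace `example.tika-sketch` and the function name `parse-sketch` are hypothetical, and it assumes `tika-core` plus a Tika parsers package are on the classpath. Unlike the documented `parse`, it returns every metadata field Tika reports rather than only the Dublin Core ones.

```clojure
(ns example.tika-sketch
  (:require [clojure.java.io :as io])
  (:import (org.apache.tika.parser AutoDetectParser ParseContext)
           (org.apache.tika.metadata Metadata)
           (org.apache.tika.sax BodyContentHandler)))

(defn parse-sketch
  "Hypothetical stand-in for `parse`: accepts anything that
  clojure.java.io/input-stream can open (a stream, a file, a file name)."
  [input]
  (with-open [in (io/input-stream input)]
    (let [parser   (AutoDetectParser.)               ; detects the file format automatically
          handler  (BodyContentHandler. (int -1))    ; -1 disables the default write limit
          metadata (Metadata.)]
      (.parse parser in handler metadata (ParseContext.))
      {:text     (.toString handler)
       ;; Assumption: keep all metadata names, keywordized as-is (e.g. :dc:creator)
       :metadata (into {}
                       (map (fn [n] [(keyword n) (.get metadata n)]))
                       (.names metadata))})))
```

Calling `(parse-sketch "data/memory.pdf")` would then return a map of the same shape as the example above; the exact metadata keys depend on the document and on the Tika version in use.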