Fetching and storing datasets from HuggingFace.
HF Datasets provides rich functionality: https://huggingface.co/docs/datasets/index
Replicating all of it would be a sizeable effort, so only the functionality needed by Bosquet is implemented here. TODO: extract it into a separate OSS library (even if it covers only the basics of HF Datasets functionality).
JTokkit wrapper for encoding and decoding text and for getting token counts, plus a price estimator for model-produced tokens.
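As a rough illustration of what a price estimator over token counts might look like, here is a minimal sketch. It is not Bosquet's actual API: the names `example-prices` and `estimate-price`, the `:example-model` key, the per-1000-token price values, and the split into input and output tokens are all hypothetical, and the token counts are assumed to have already been obtained from a tokenizer such as JTokkit.

```clojure
;; Hypothetical per-1000-token prices in USD.
;; These are placeholder numbers, not real rates for any model.
(def example-prices
  {:example-model {:input 0.5 :output 1.5}})

(defn estimate-price
  "Returns the estimated cost in USD for the given token counts,
  or nil when `model` is missing from the `prices` table."
  [prices model {:keys [input-tokens output-tokens]}]
  (when-let [{:keys [input output]} (get prices model)]
    ;; Prices are quoted per 1000 tokens, so scale the counts first.
    (+ (* (/ input-tokens 1000.0) input)
       (* (/ output-tokens 1000.0) output))))

(estimate-price example-prices
                :example-model
                {:input-tokens 2000 :output-tokens 1000})
;; => 2.5
```

Passing the price table as an argument keeps the function pure, so rates can be swapped without touching the arithmetic.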
No vars found in this namespace.
Document reader wrapping the Apache Tika parser: https://tika.apache.org/
Tika supports many different file formats: https://tika.apache.org/2.9.1/formats.html
The `parse` function uses Tika to convert the provided document data into a map containing `text` and `metadata` fields.
```clojure
(parse (clojure.java.io/input-stream "data/memory.pdf"))
;; =>
{:text "Memory, reasoning, and categorization: parallels and
common mechanisms
Brett K. ..."
 :metadata {:dc:creator "Brett K. Hayes"
            :dc:description "Traditionally, memory, reasoning, and categorization have been "}}
```