[scicloj/tablecloth "5.00-beta-27"]
[scicloj/tablecloth "4.04"]
tech.ml.dataset is a great and fast library that brings columnar datasets to Clojure. Chris Nuernberger has been working on this library for the last year as part of the bigger tech.ml stack.
I started testing the library and helping to fix the bugs this uncovered. My main goal was to compare its functionality with established tools from other platforms. I focused on R solutions: dplyr, tidyr and data.table.
While converting the examples, I came up with a way to reorganize the existing tech.ml.dataset functions into a simple-to-use API. The main goals were:

- … tech.ml, like pipelines, datatypes, readers, ML, etc.
- group-by results are represented as a special kind of dataset: a dataset containing the subsets created by grouping as a column.

Important! This library is neither a replacement for tech.ml.dataset nor a separate library. It should be considered an addition on top of tech.ml.dataset.
If you want to know more about tech.ml.dataset and dtype-next, please refer to their documentation.
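Join the discussion on Zulip.

Please refer to the detailed documentation for more examples.

A short example: load the stocks CSV, group the rows by symbol and year, compute the mean price of each group, then sort and show the first 10 rows.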
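(require '[tablecloth.api :as api]
         '[tech.v3.datatype.datetime]
         '[tech.v3.datatype.functional])

(-> "https://raw.githubusercontent.com/techascent/tech.ml.dataset/master/test/data/stocks.csv"
    ;; read the CSV, turning column names into keywords
    (api/dataset {:key-fn keyword})
    ;; group rows by symbol and by the year extracted from :date
    (api/group-by (fn [row]
                    {:symbol (:symbol row)
                     :year (tech.v3.datatype.datetime/long-temporal-field :years (:date row))}))
    ;; mean price per group; a single function yields a :summary column
    (api/aggregate #(tech.v3.datatype.functional/mean (% :price)))
    (api/order-by [:symbol :year])
    (api/head 10))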
_unnamed [10 3]:

| :symbol | :year | :summary |
|---|---:|---:|
| AAPL | 2000 | 21.74833333 |
| AAPL | 2001 | 10.17583333 |
| AAPL | 2002 | 9.40833333 |
| AAPL | 2003 | 9.34750000 |
| AAPL | 2004 | 18.72333333 |
| AAPL | 2005 | 48.17166667 |
| AAPL | 2006 | 72.04333333 |
| AAPL | 2007 | 133.35333333 |
| AAPL | 2008 | 138.48083333 |
| AAPL | 2009 | 150.39333333 |
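To make the grouped-dataset idea concrete without downloading a file, here is a minimal sketch on a tiny in-memory dataset. The values and the names `prices`, `grouped` and `summary` are hypothetical, chosen only for illustration. Unlike the stocks example, it groups by a column name instead of a row function, and it passes a map of functions to `api/aggregate` so the output columns get explicit names instead of the default `:summary`. Treat it as a sketch of typical usage, not as an exhaustive description of these functions' options.

```clojure
(require '[tablecloth.api :as api]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical in-memory data, made up for illustration (not from stocks.csv)
(def prices
  (api/dataset {:symbol ["A" "A" "B" "B" "B"]
                :price  [10.0 20.0 30.0 40.0 50.0]}))

;; Grouping by a column name returns a grouped dataset: one row per group,
;; with each group's subset of rows stored as a dataset in a column.
(def grouped (api/group-by prices :symbol))

;; A map of functions names the aggregated columns explicitly;
;; each function receives one group's dataset.
(def summary
  (-> grouped
      (api/aggregate {:mean-price #(dfn/mean (% :price))
                      :n          api/row-count})
      (api/order-by [:symbol])))

summary
```

Choosing a column name for `group-by` keeps simple cases short, while the row-function form used in the stocks example remains available when the group key has to be computed.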
Tablecloth is open to contributions. The best way to start is a discussion on Zulip.
The documentation is written in RMarkdown, which means you need R to generate the HTML/MD/PDF files. It contains around 600 code snippets that are run during the build. There are two files:
- README.Rmd
- docs/index.Rmd
Prepare the following software:

- pandoc
- R with the rmarkdown and knitr packages:

install.packages(c("rmarkdown","knitr"), dependencies=T)

Then render both files from R:

library(rmarkdown)
render("README.Rmd","md_document")
render("docs/index.Rmd","all")
- Run lein do clean, check, test and build the documentation as described above (which also tests the whole library).
- … parallel? argument then (if applied).
- Use the potemkin pattern and import functions into the API namespace using the tech.v3.datatype.export-symbols/export-symbols function.
- … tablecloth.utils namespace.
- … README.Rmd, CHANGELOG.md, docs/index.Rmd, tests and function docs are highly welcome.

Copyright (c) 2020 Scicloj
The MIT Licence