`reorder-columns` can work on a grouped dataset now
Deps updated
Documentation changed to be generated by Clay instead of RMarkdown
Deps updated to fix `j/left-join` issue.
`nil` as missing value only, discussion
`:nil-missing?` in more places needed (group-by operations), discussion
`group-by` documentation PR115, thanks to Marshall
`Collections/shuffle` removed
`dataset` (copied from TMD), #112
`rows` accepts `:nil-missing?` (default: true) and `copying?` (default: false) options.
Deps updated
`:hashing` is available for single column joins too
`:hashing` option determines method of creating an index for multicolumn joins (was `hash`, is `identity`)
Deps updated
`map-rows` to map each row and produce new columns
`rows` can return sequence of vectors (`:as-vecs`)
Updated to TMD v7
Differences:
Clojure upgraded to 1.11.1
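The `map-rows` and `rows` entries above can be illustrated with a minimal sketch. It assumes a recent Tablecloth on the classpath, aliased as `tc`; the dataset and the column names `:a`, `:b` and `:sum` are made up for illustration.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical two-column dataset used only for this sketch.
(def ds (tc/dataset {:a [1 2 3] :b [10 20 30]}))

;; map-rows calls the function with each row as a map (column name -> value)
;; and merges the returned map back into the dataset as new columns.
(def with-sum
  (tc/map-rows ds (fn [{:keys [a b]}] {:sum (+ a b)})))

;; rows with :as-vecs returns each row as a vector of values.
(def row-vectors (tc/rows with-sum :as-vecs))
```

Returning a map from the row function keeps the new column names explicit, which is convenient when several columns are derived at once.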
`separate-column` infers column names when function is used and `target-columns` is `nil`, #78
`separate-column` replaces source column with target in every case
`clojure.core/pmap` with `dtype-next` version (related to #325)
`get-entry` introduced
`anti-join` and `semi-join` bugs when tables contain missing values
`crosstab` - cross tabulation
`pivot->longer` `:coerce-to-number` option added
`pivot->wider` no longer coerces column names to strings, it's up to user
TMD version bump
[breaking] `replace-missing` up/down strategies clarified. `:down` is replaced by `:downup` and `:up` is replaced by `:updown`. `:down` and `:up` work only in one direction now.
https://github.com/techascent/tech.ml.dataset/issues/305
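A minimal sketch of the difference between the one-directional and the combined strategies, assuming a Tablecloth version that includes this change and the alias `tc`. The dataset and the column name `:a` are hypothetical.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical column with missing values at the start, middle and end.
(def ds (tc/dataset {:a [nil 1 nil 3 nil]}))

;; :down only carries the previous value forward, so the leading
;; missing value has nothing to copy from and stays missing.
(def filled-down (tc/replace-missing ds :a :down))

;; :downup takes the previous value where possible and falls back to
;; the next one, which also covers the leading missing value.
(def filled-downup (tc/replace-missing ds :a :downup))

;; Count the rows that still contain a missing value after each strategy.
(println (tc/row-count (tc/select-missing filled-down)))
(println (tc/row-count (tc/select-missing filled-downup)))
```

Code that relied on the old `:down` or `:up` to fill border values would need to switch to `:downup` or `:updown` to keep that behaviour.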
`data frame` term in the title of docs (discussion)
`cross-join`, `expand` and `complete` introduced
`*warn-on-reflection*`
Version bump
`unroll` and `fold-by` by @holyjak (#60 and #61)
`select-rows` accepts `IFn` for row selection.
`pipeline` namespace is stripped, all functions are moved to metamorph library. This is a temporary solution before removing this namespace completely. Pipelined versions of functions will be moved to metamorph as well later.
`add-column`
`api`, is: `tc`)
`replace-missing` on grouped dataset has swapped arguments
`update-columns` on grouped dataset
`:as-rows` now
`add-column` default strategy is `:strict` now.
TMD upgrade, no changes in TC
TMD upgrade
`reorder-columns` on empty dataset returns nil
`aggregate-columns` didn't keep column order (#35)
`pipeline` functions have `doc` copied from original ones
`split` can turn off shuffling now (`:shuffle?` option)
`split :holdouts` - sequence of consecutive holdouts
`tech.ml.dataset` version bump, this introduces the change of the order of the groups after `group-by` operation
`split :holdout` supports any number of splits (minimum 2) [#28]
`split` supports `split-names` to provide custom names for subdatasets
`concat` and `concat-copying` are working with grouped datasets
`kfold` split failed on small number of rows (due to `partition-all` behaviour)
`split->seq` to return train/test splits as a sequence of datasets, or as a map of sequences for grouped datasets
`tablecloth.pipeline` returns a map with dataset under `:metamorph/data` key (see metamorph)
`split` now returns a dataset or grouped dataset with two new columns indicating train/test and split id. See `split->seq` for previous behaviour.
`without-grouping->` threading macro which allows operations on a grouped dataset treated as a regular one.
`group-by` accepts any java.util.Map for a collection of indexes (use LinkedHashMap to persist an order)
`tablecloth.api.group-by` functions moved to `tablecloth.api.utils`, no changes to API
`add-or-replace-column(s)` replaced by `add-column(s)` (`add-or-replace-column(s)` is marked as deprecated) (#16)
`mark-as-group` wasn't visible in API (#18)
`map-columns` didn't propagate `new-type` for grouped case (#20)
`let-dataset` - to simulate `tibble` from R
`rows` and `columns` new result: `:as-double-arrays` - convert rows to 2d double array
`tablecloth.pipeline` for pipeline operations
`concat-copying` exposed.
`split` function for splitting into train-test pairs with `:kfold`, `:bootstrap`, `:loo` and `holdout` strategies + stratified versions
`replace-missing` with new strategy `:midpoint`
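The `split` and `split->seq` entries above can be sketched as follows. This assumes a Tablecloth version where `split` already returns a single dataset (the later behaviour described in these notes) and the alias `tc`; the dataset, the column name `:x`, and the chosen `:k` and `:seed` values are illustrative only.

```clojure
(require '[tablecloth.api :as tc])

;; Hypothetical dataset with nine rows.
(def ds (tc/dataset {:x (range 9)}))

;; split returns one dataset holding every fold, with extra columns
;; marking train/test membership and the split id of each row.
(def folds-ds (tc/split ds :kfold {:k 3 :seed 42}))

;; split->seq returns one {:train ... :test ...} map per fold instead.
(def folds (tc/split->seq ds :kfold {:k 3 :seed 42}))

(doseq [{:keys [train test]} folds]
  (println (tc/row-count train) "train rows," (tc/row-count test) "test rows"))
```

The single-dataset form is handy for further dataset operations such as grouping by split id, while `split->seq` suits a plain loop over train/test pairs.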
t.m.d update
t.m.d update
t.m.d update
`write-nippy!` and `read-nippy` are deprecated, replaced by `write!` and `dataset`
`tech.ml.dataset` version 5.0-alpha*
`map-columns` accepts optional target datatype
`ds/column->dataset` functionality introduced in `separate-column`
`:text` among others)
`write-csv!` replaced by `write!` (`write-csv!` is marked as deprecated)
`info` field `:size` is replaced by `:n-elems`
`separate-column` 3-arity version accepts `separator` instead of `target-columns` now
`tech.ml.dataset` version 4.04
`tech.ml.dataset` version 4.03
`parallel?` option set to `true`). These are: `aggregate`, `unique-by`, `order-by`, `join-columns`, `separate-columns`, `ungroup`
`aggregation` now uses in-place ungrouping which is much faster
`tech.ml.dataset` version 3.06
`fill-range-replace` to inject data to make continuous sequence in column
`write-nippy!` and `read-nippy`
`tech.ml.dataset` version 2.13
`replace-missing` new strategies: `:mid` and `:lerp`, working also for dates.
`replace-missing` has different contract and default strategy `:mid`. `value` argument is the last argument now.
`replace-missing` `:up` and `:down` strategies, when `value` is `nil`, fill border missing values with nearest value.
`tech.ml.dataset` version 2.06
`asof-join` added
`reshape` tests
`pivot->wider` accepts `:drop-missing?` option (default: `true`)
`pivot->wider` drops missing rows by default
`pivot->wider` order of concatenated column names is reversed (first: colnames, last: value), was opposite.
`pivot->longer` `:splitter` accepts string used for splitting column name