(split ds)
(split ds split-type)
(split ds split-type {:keys [seed parallel?] :as opts})
Split given dataset into train and test.
As the result two new columns are added:
:$split-name
- with :train
or :test
values:$split-id
- id of train/test pairsplit-type
can be one of the following:
:kfold
- k-fold strategy, :k
defines number of folds (defaults to 5
), produces k
splits:bootstrap
- :ratio
defines ratio of observations put into result (defaults to 1.0
), produces 1
split:holdout
- split into two parts with given ratio (defaults to 2/3
), produces 1
split:loo
- leave one out, produces the same number of splits as number of observationsAdditionally you can provide:
:seed
- for random number generator:repeats
- repeat procedure :repeats
times:partition-selector
- same as in group-by
for stratified splitting to reflect dataset structure in splits.:split-col-name
- a column where name of split is stored, either :train
or :test
values (default: :$split-name
):split-id-col-name
- a column where id of the train/test pair is stored (default: :$split-id
)Rows are shuffled before splitting.
In case of grouped dataset each group is processed separately.
See more
Split given dataset into train and test. As the result two new columns are added: * `:$split-name` - with `:train` or `:test` values * `:$split-id` - id of train/test pair `split-type` can be one of the following: * `:kfold` - k-fold strategy, `:k` defines number of folds (defaults to `5`), produces `k` splits * `:bootstrap` - `:ratio` defines ratio of observations put into result (defaults to `1.0`), produces `1` split * `:holdout` - split into two parts with given ratio (defaults to `2/3`), produces `1` split * `:loo` - leave one out, produces the same number of splits as number of observations Additionally you can provide: * `:seed` - for random number generator * `:repeats` - repeat procedure `:repeats` times * `:partition-selector` - same as in `group-by` for stratified splitting to reflect dataset structure in splits. * `:split-col-name` - a column where name of split is stored, either `:train` or `:test` values (default: `:$split-name`) * `:split-id-col-name` - a column where id of the train/test pair is stored (default: `:$split-id`) Rows are shuffled before splitting. In case of grouped dataset each group is processed separately. See [more](https://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00069)
(split->seq ds)
(split->seq ds split-type)
(split->seq ds
split-type
{:keys [split-col-name split-id-col-name]
:or {split-col-name :$split-name split-id-col-name :$split-id}
:as opts})
Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)
Returns split as a sequence of train/test datasets or map of sequences (grouped dataset)
cljdoc is a website building & hosting documentation for Clojure/Script libraries
× close