(create-design-matrix ds targets-specs features-specs)
Converts the given dataset into a full numeric dataset.
* `ds` is the tech.v3.dataset to transform
* `targets-specs` are the specifications for how to transform the target variables
* `features-specs` are the specifications for how to transform the features
The 'spec' can express several types of dataset transformations in a compact way:
- add new derived columns
- remove columns
- rename columns
- convert columns to categorical
- set inference target
Column specs are generally given as pairs of `[colname function]`.
The function must be given as a list (quoted with `'`) and can refer to column names.
Specs are evaluated from top to bottom and can refer to each other.
Columns that are not listed are removed.
Special syntax:
- `:a-column` keeps the column as-is (calls the `identity` fn)
- `[nil '(+ a b)]` or `['(+ a b)]` uses an autogenerated column name
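The evaluation rules above (top-to-bottom evaluation, specs referring to earlier results, unlisted columns dropped, a bare keyword keeping a column as-is) can be illustrated with a toy model in plain Clojure. This is only a sketch of the semantics on a single row map; it is not how the library is implemented, and `apply-specs` is a hypothetical helper invented for this illustration.

```clojure
(require '[clojure.walk :as walk])

;; Toy model only: operates on one row (a map), not on a dataset.
;; `apply-specs` is a hypothetical name, not part of the library.
(defn apply-specs
  "Evaluates specs top to bottom against `row`; returns only the listed columns."
  [row specs]
  (:out
   (reduce
    (fn [{:keys [env] :as acc} spec]
      (if (keyword? spec)
        ;; bare keyword: keep the column as-is
        (update acc :out assoc spec (get env spec))
        (let [[colname form] spec
              ;; replace known column keywords with their current values
              bound (walk/postwalk
                     #(if (and (keyword? %) (contains? env %)) (get env %) %)
                     form)
              v     (eval bound)]
          (-> acc
              (assoc-in [:env colname] v)     ; visible to later specs
              (assoc-in [:out colname] v)))))
    {:env row :out {}}
    specs)))

(apply-specs {:a 1 :b 2 :c 3 :y 10}
             [:y
              [:sum '(+ :a :b :c)]
              [:double-sum '(* 2 :sum)]])
;; => {:y 10, :sum 6, :double-sum 12}
```

Note that `:double-sum` refers to `:sum`, which only exists because the earlier spec was evaluated first, and that `:a`, `:b`, and `:c` are absent from the result because they were not listed.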
The following aliases can be used as part of the spec.
(Other functions need to be fully qualified.)
Symbols from `clojure.core` can be used without fully qualifying them.
- ds (tech.v3.dataset)
- tc (tablecloth.api)
- tcc (tablecloth.column.api)
Example:
(dm/create-design-matrix
ds
[:y]
[
[:sum '(+ :a :b :c)]
])
This will:
- set the inference target to :y
- create a new derived column :sum, the sum of :a, :b, and :c
- remove all columns except :y and :sum
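To try the example end to end, a dataset with the referenced columns is needed. The following is a minimal sketch: the sample values are made up, and the namespace behind the `dm` alias is assumed to be `scicloj.metamorph.ml.design-matrix`, so adjust it if your version of the library differs.

```clojure
(require '[tablecloth.api :as tc]
         ;; assumed namespace for the `dm` alias used in the example above
         '[scicloj.metamorph.ml.design-matrix :as dm])

;; Hypothetical sample data; only the column names matter here.
(def ds
  (tc/dataset {:a [1 2 3]
               :b [10 20 30]
               :c [100 200 300]
               :y [0 1 0]}))

(def design-matrix
  (dm/create-design-matrix
   ds
   [:y]                       ; target specs
   [[:sum '(+ :a :b :c)]]))   ; feature specs

;; Inspect which columns remain; per the description above,
;; these should be :y and :sum.
(tc/column-names design-matrix)
```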
This covers a range of cases, but is not as complete as `R formulae`.
In particular, it does not automatically expand categorical variables,
but these can be specified manually.
See `design_matrix_test.clj` for more examples.
(For model type `:fastmath/ols` (linear regression), a different way
of expressing arbitrary 'row transformations' is supported via the `:transformer` option;
see the `fastmath.ml/lm` documentation.)