
scicloj.metamorph.ml.design-matrix


create-design-matrix (clj)

(create-design-matrix ds targets-specs features-specs)

Converts the given dataset into a fully numeric dataset.

  • ds is the tech.v3.dataset to transform
  • targets-specs are the specifications for how to transform the target variables
  • features-specs are the specifications for how to transform the features

The 'spec' can express several types of dataset transformations in a compact way:

  • add new derived columns
  • remove columns
  • rename columns
  • convert columns to categorical
  • set inference target

Column specs are in general given as pairs of [colname function].

The function needs to be given as a list (quoted with '), and can refer to column names.

Specs are evaluated from top to bottom, and can refer to each other.

Columns that are not listed get removed.

Special syntax:

  • :a-column keeps column as-is (calls identity fn)
  • [nil '(+ a b)] or ['(+ a b)] auto-generates the column name
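
The spec shapes described above can be sketched as plain Clojure data. This is only an illustration of the syntax: nothing below calls the library or evaluates the quoted forms, and the column names :a, :b, :c and :total are hypothetical.

```clojure
;; Plain Clojure data only: no library call, no evaluation of the forms.
;; Column names :a, :b, :c and :total are hypothetical.
(def features-specs
  [:a                        ; bare keyword: keep column :a as-is
   [:sum '(+ :a :b :c)]      ; [colname function]: named derived column
   [:total '(+ :sum :a)]     ; later spec referring to the earlier :sum
   [nil '(+ a b)]            ; nil name: column name is auto-generated
   ['(+ a b)]])              ; function only: name is auto-generated too

;; Because of the quote, each function is an unevaluated list.
(def spec-functions
  (map last (filter vector? features-specs)))

(every? list? spec-functions)
```

Keeping the functions quoted means the spec stays inert data until it is handed to create-design-matrix, which is what lets a later entry such as :total mention a column that only exists once an earlier entry has been processed.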

Symbols from clojure.core can be used without qualification. All other functions need to be fully qualified, unless they are reached through one of the following aliases, which can be used as part of the spec:

  • ds (tech.v3.dataset)
  • tc (tablecloth.api)
  • tcc (tablecloth.column.api)

Example:

(dm/create-design-matrix ds
                         [:y]
                         [[:sum '(+ :a :b :c)]])

This will:

  • set the inference target to :y
  • create a new derived variable :sum, being the sum of :a, :b and :c
  • remove all columns except :y and :sum

This covers a range of cases, but is not as complete as R formulae. In particular, it does not handle automatic expansion of categorical variables, but these can be specified manually.

See design_matrix_test.clj for more examples.

For model type :fastmath/ols (linear regression), a different way of expressing arbitrary 'row transformations' is supported through the :transformer option; see the fastmath.ml/lm documentation.

