
clj-ml-dev.filters

This namespace defines a set of functions that can be applied to datasets to modify them in some way: transforming nominal attributes into binary attributes, removing attributes, etc.

There are a number of ways to use the filtering API. The most straightforward and idiomatic Clojure way is to use the provided filter fns:

  ;; ds is the dataset
  (def ds (make-dataset :test [:a :b {:c [:g :m]}]
                        [[1 2 :g]
                         [2 3 :m]
                         [4 5 :g]]))
  (def filtered-ds
    (-> ds
        (add-attribute {:type :nominal, :column 1, :name "pet", :labels ["dog" "cat"]})
        (remove-attributes {:attributes [:a :c]})))

The above functions rely on lower-level fns that create and apply the filters, which you may also use if you need more control over the actual filter objects:

  ;; Named remove-filter so that clojure.core/filter is not shadowed
  (def remove-filter (make-filter :remove-attributes {:dataset-format ds :attributes [:a :c]}))

  ;; We apply the filter to the original data set and obtain the new one
  (def filtered-ds (filter-apply remove-filter ds))

The previous sample of code could be rewritten with the make-apply-filter function:

(def filtered-ds (make-apply-filter :remove-attributes {:attributes [:a :c]} ds))


add-attribute (clj)

(add-attribute ds)
(add-attribute ds attributes)

attribute-types (clj)

Mapping of Weka's attribute types from clj-ml-dev keywords to the -T flag's representation.


clj-batch (clj)

(clj-batch ds)
(clj-batch ds attributes)

clj-streamable (clj)

(clj-streamable ds)
(clj-streamable ds attributes)

deffilter (clj, macro)

(deffilter filter-name)

Defines a fn, named after the filter, that makes the filter and applies it to a dataset.
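
A minimal sketch of how a macro like this could work, assuming it expands to a two-arity fn that delegates to make-apply-filter. The macro name deffilter-sketch and the stand-in make-apply-filter below are hypothetical; the stand-in only records its arguments so that the example runs without Weka.

```clojure
;; Hypothetical stand-in for make-apply-filter: it just records its
;; arguments, so no Weka objects are needed to run the sketch.
(defn make-apply-filter [kind options dataset]
  {:kind kind, :options options, :dataset dataset})

(defmacro deffilter-sketch
  "Defines a fn named filter-name that makes and applies the filter
  identified by the keyword of the same name."
  [filter-name]
  (let [filter-keyword (keyword filter-name)]
    `(defn ~filter-name
       ;; one-arg arity: apply the filter with an empty options map
       ([ds#] (make-apply-filter ~filter-keyword {} ds#))
       ;; two-arg arity: pass the options map through
       ([ds# attributes#] (make-apply-filter ~filter-keyword attributes# ds#)))))

;; Expands to (defn remove-attributes ([ds] ...) ([ds attributes] ...))
(deffilter-sketch remove-attributes)

(remove-attributes :some-dataset {:attributes [:a :c]})
```

The auto-gensym suffix (ds#) keeps the generated argument names from clashing with names at the call site.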


filter-aliases (clj)

Mapping of clj-ml keywords to actual Weka classes.


filter-apply (clj)

(filter-apply filter dataset)

Filters an input dataset using the provided filter and generates an output dataset. The first argument is the filter and the second is the dataset to which the filter should be applied.


make-apply-filter (clj)

(make-apply-filter kind options dataset)

Creates a new filter with the provided options and applies it to the provided dataset. The :dataset-format option used to make the filter is set to the dataset passed as an argument if no other value is provided.

Applying this fn is equivalent to calling make-filter and then filter-apply.
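
The equivalence described above can be sketched as follows. The fns stub-make-filter and stub-filter-apply are hypothetical stand-ins for make-filter and filter-apply (they work on plain maps instead of Weka objects), and filling in :dataset-format with merge is an assumption about how the default is applied.

```clojure
;; Hypothetical stand-ins: a "filter" is a plain map and a "dataset" is a
;; vector of row maps, so the sketch runs without Weka.
(defn stub-make-filter [kind options]
  {:kind kind, :options options})

(defn stub-filter-apply [filter-map dataset]
  (case (:kind filter-map)
    :remove-attributes
    (mapv #(apply dissoc % (get-in filter-map [:options :attributes])) dataset)))

(defn make-apply-filter-sketch [kind options dataset]
  ;; merge keeps a caller-supplied :dataset-format and only falls back to
  ;; the dataset itself when the option is absent
  (let [options (merge {:dataset-format dataset} options)]
    (stub-filter-apply (stub-make-filter kind options) dataset)))

(def rows [{:a 1, :b 2, :c :g}
           {:a 2, :b 3, :c :m}])

(make-apply-filter-sketch :remove-attributes {:attributes [:a :c]} rows)
;; => [{:b 2} {:b 3}]
```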


make-apply-filters (clj)

(make-apply-filters filter-options dataset)

Creates new filters with the provided options and applies them to the provided dataset. The :dataset-format option used to make each filter is set to the dataset passed as an argument if no other value is provided.
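
One way to picture this is as a reduce over the filter specifications. The sketch below assumes that filter-options is a sequence of [kind options] pairs, which the docstring does not spell out; apply-one is a hypothetical stand-in for make-apply-filter that works on plain row maps rather than Weka datasets.

```clojure
;; Hypothetical stand-in for make-apply-filter on vectors of row maps.
;; A nil value plays the role of a missing value.
(defn apply-one [kind options dataset]
  (case kind
    :remove-attributes (mapv #(apply dissoc % (:attributes options)) dataset)
    :add-attribute     (mapv #(assoc % (keyword (:name options)) nil) dataset)))

(defn make-apply-filters-sketch [filter-options dataset]
  ;; each filter receives the dataset produced by the previous one
  (reduce (fn [ds [kind options]] (apply-one kind options ds))
          dataset
          filter-options))

(make-apply-filters-sketch
  [[:add-attribute {:name "pet"}]
   [:remove-attributes {:attributes [:a :c]}]]
  [{:a 1, :b 2, :c :g}])
;; => [{:b 2, :pet nil}]
```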


make-filter (clj)

(make-filter kind options)

Creates a filter for the provided attributes format. The first argument must be a keyword identifying the kind of filter to create. Currently the following filters are supported:

  • :supervised-discretize
  • :unsupervised-discretize
  • :supervised-nominal-to-binary
  • :unsupervised-nominal-to-binary
  • :numeric-to-nominal
  • :add-attribute
  • :remove-attributes
  • :remove-percentage
  • :remove-range
  • :remove-useless-attributes
  • :select-append-attributes
  • :project-attributes
  • :clj-streamable
  • :clj-batch

The second parameter is a map of attributes for the filter. All filters require a :dataset-format parameter:

 - :dataset-format
     The dataset where the filter is going to be applied or a
     description of the format of its attributes. Sample value:
     dataset, (dataset-format dataset)

An example of usage:

(make-filter :remove-attributes {:attributes [0 1] :dataset-format dataset})

Documentation for the different filters:

  • :supervised-discretize

    An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the default).

    Parameters:

    • :attributes Indices of the attributes to be discretized. Sample value: [0,4,6]. Attributes may also be specified by name: [:some-name, "another-name"]
    • :invert Invert the matching sense of the columns. Sample value: true
    • :kononenko Use Kononenko's MDL criterion. Sample value: true
  • :unsupervised-discretize

    Unsupervised version of the discretize filter. Discretization is by simple binning.

    Parameters:

    • :attributes Indices of the attributes to be discretized. Sample value: [0,4,6]. Attributes may also be specified by name: [:some-name, "another-name"]
    • :unset-class Do not take the class attribute into account when applying the filter. Sample value: true
    • :binary
    • :equal-frequency Use equal-frequency instead of equal-width discretization. Sample value: true
    • :optimize Optimize the number of bins using a leave-one-out estimate of the estimated entropy. Ignores the :binary attribute. Sample value: true
    • :number-bins Defines the number of bins to divide the numeric attributes into. Sample value: 3
  • :supervised-nominal-to-binary

    Converts nominal attributes into binary numeric attributes. An attribute with k values is transformed into k binary attributes if the class is nominal.

    Parameters:

    • :also-binary Sets whether binary attributes are to be coded as nominal ones. Sample value: true
    • :for-each-nominal One binary attribute is created for each nominal value, not only when the nominal attribute has more than two values.
  • :unsupervised-nominal-to-binary

    Unsupervised version of the nominal-to-binary filter.

    Parameters:

    • :attributes Indices of the attributes to be binarized. Sample value: [0 1 2]. Attributes may also be specified by name: [:some-name, "another-name"]
    • :also-binary Sets whether binary attributes are to be coded as nominal ones. Sample value: true
    • :for-each-nominal One binary attribute is created for each nominal value, not only when the nominal attribute has more than two values. Sample value: true
  • :numeric-to-nominal

    Transforms numeric attributes into nominal ones.

    Parameters:

    • :attributes Indices of the attributes to be transformed. Sample value: [0 1 2]. Attributes may also be specified by name: [:some-name, "another-name"]
    • :invert Invert the selection of the columns. Sample value: true
  • :add-attribute

    Adds a new attribute to the dataset. Every value of the new attribute is initially missing.

    Parameters:

    • :type Type of the new attribute. Valid options: :numeric, :nominal, :string, :date. Defaults to :numeric.
    • :name Name of the new attribute.
    • :column Zero-based index at which to insert the attribute. You may also pass in "first" and "last". Sample values: "first", 0, 1, "last". The default is "last".
    • :labels Vector of valid nominal values. This only applies when the type is :nominal.
    • :format The format of the date values (see ISO-8601). This only applies when the type is :date. The default is: "yyyy-MM-dd'T'HH:mm:ss"
  • :remove-attributes

    Removes the columns identified by the provided attributes from the dataset.

    Parameters:

    • :attributes Indices of the attributes to remove. Sample value: [0 1 2]. Attributes may also be specified by name: [:some-name, "another-name"]
  • :remove-useless-attributes

    Remove attributes that do not vary at all or that vary too much. All constant attributes are deleted automatically, along with any that exceed the maximum percentage of variance parameter. The maximum variance test is only applied to nominal attributes.

    Parameters:

    • :max-variance Maximum variance percentage allowed (default 99). Note: this is a percentage, not a decimal, e.g. 89 not 0.89. If you pass in a decimal Weka silently sets it to 0.0.
  • :select-append-attributes

    Append a copy of the selected columns at the end of the dataset.

    Parameters:

    • :attributes Indices of the attributes. Sample value: [1 2 3]. Attributes may also be specified by name: [:some-name, "another-name"]
    • :invert Invert the selection of the columns. Sample value: true
  • :project-attributes

    Projects some columns from the provided dataset.

    Parameters:

    • :invert Invert the selection of columns. Sample value: true
  • :clj-streamable

    Allows you to create a custom streamable filter with clojure functions. A streamable filter is appropriate when you don't need to iterate over the entire dataset before processing it.

    Parameters:

    • :process This function will receive individual weka.core.Instance objects (rows of the dataset) and should return a newly processed Instance. The actual Instance is passed in and you may change it directly. However, a better approach is to copy the Instance with the copy method or Instance constructor and return a modified version of the copy.
    • :determine-dataset-format This function will receive the dataset's weka.core.Instances object with no actual Instance objects (i.e. just the format encoded in the attributes). You must return an Instances object that contains the new format of the filtered dataset. Passing this fn is optional: if you are not changing the format of the dataset, omit it and the current format will be used.
  • :clj-batch

    Allows you to create a custom batch filter with clojure functions. A batch filter is appropriate when you need to iterate over the entire dataset before processing it.

    Parameters:

    • :process This function will receive the entire dataset as a weka.core.Instances object. A processed Instances object should be returned with the new Instance objects added to it. The format of the dataset (Instances) returned from this fn will be the format returned from the filter (see below).
    • :determine-dataset-format This function will receive the dataset's weka.core.Instances object with no actual Instance objects (i.e. just the format encoded in the attributes). You must return an Instances object that contains the new format of the filtered dataset. Passing this fn is optional. For many batch filters you need to process the entire dataset to determine the correct format (e.g. filters that operate on nominal attributes). For this reason the clj-batch filter will always use the format of the dataset that the process fn outputs. In other words, if you need to operate on the entire dataset before determining the format, do so in the process fn; nothing needs to be passed for this fn.
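
The difference between the streamable and batch styles of :process can be illustrated in plain Clojure. This is only a sketch: rows are ordinary maps standing in for Weka Instance objects, and process-row and process-all are hypothetical names, not part of the library.

```clojure
;; Rows are plain maps here; in the real filters they are Weka Instance objects.
(def rows [{:a 1} {:a 2} {:a 4}])

;; Streamable style: each row is transformed without looking at any other row.
(defn process-row [row]
  (update row :a inc))

;; Batch style: the column maximum must be known first, so the whole
;; dataset is needed before any single row can be rescaled.
(defn process-all [all-rows]
  (let [max-a (apply max (map :a all-rows))]
    (mapv #(update % :a / max-a) all-rows)))

(mapv process-row rows) ;; => [{:a 2} {:a 3} {:a 5}]
(process-all rows)      ;; => [{:a 1/4} {:a 1/2} {:a 1}]
```

Both fns return new maps and leave their input untouched, which mirrors the advice above to return a modified copy rather than change the Instance that was passed in.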

For examples of how to use the filters, especially the Clojure filters, you may refer to filters_test.clj of clj-ml-dev.
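
As a conceptual aid for the :equal-frequency and :number-bins options of :unsupervised-discretize, the sketch below contrasts equal-width and equal-frequency binning in plain Clojure. It is a simplification for illustration, not Weka's algorithm; the fn names are hypothetical and the equal-frequency version assumes distinct values.

```clojure
(defn equal-width-bins
  "Bin index (0-based) for each x, using n bins of equal width."
  [xs n]
  (let [lo    (apply min xs)
        hi    (apply max xs)
        width (/ (- hi lo) n)]
    (mapv (fn [x]
            (if (zero? width)
              0                                        ; all values identical
              (min (dec n) (int (/ (- x lo) width))))) ; clamp the maximum into the last bin
          xs)))

(defn equal-frequency-bins
  "Bin index (0-based) for each x, giving each bin roughly the same
  number of values. Assumes the values in xs are distinct."
  [xs n]
  (let [rank (zipmap (sort xs) (range)) ; value -> position in sorted order
        cnt  (count xs)]
    (mapv #(min (dec n) (quot (* (rank %) n) cnt)) xs)))

(equal-width-bins     [1 2 3 4 100] 2) ;; => [0 0 0 0 1]
(equal-frequency-bins [1 2 3 4 100] 2) ;; => [0 0 0 1 1]
```

With an outlier such as 100, equal-width binning puts almost every value in the first bin, while equal-frequency binning keeps the bins balanced.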


numeric-to-nominal (clj)

(numeric-to-nominal ds)
(numeric-to-nominal ds attributes)

project-attributes (clj)

(project-attributes ds)
(project-attributes ds attributes)

remove-attributes (clj)

(remove-attributes ds)
(remove-attributes ds attributes)

remove-percentage (clj)

(remove-percentage ds)
(remove-percentage ds attributes)

remove-range (clj)

(remove-range ds)
(remove-range ds attributes)

remove-useless-attributes (clj)

(remove-useless-attributes ds)
(remove-useless-attributes ds attributes)

select-append-attributes (clj)

(select-append-attributes ds)
(select-append-attributes ds attributes)

supervised-discretize (clj)

(supervised-discretize ds)
(supervised-discretize ds attributes)

supervised-nominal-to-binary (clj)

(supervised-nominal-to-binary ds)
(supervised-nominal-to-binary ds attributes)

unsupervised-discretize (clj)

(unsupervised-discretize ds)
(unsupervised-discretize ds attributes)

unsupervised-nominal-to-binary (clj)

(unsupervised-nominal-to-binary ds)
(unsupervised-nominal-to-binary ds attributes)
