This namespace defines a set of functions that can be applied to data sets to modify the dataset in some way: transforming nominal attributes into binary attributes, removing attributes etc.
There are a number of ways to use the filtering API. The most straightforward and idiomatic Clojure way is to use the provided filter fns:
;; ds is the dataset
(def ds (make-dataset :test [:a :b {:c [:g :m]}]
                      [[1 2 :g]
                       [2 3 :m]
                       [4 5 :g]]))

(def filtered-ds
  (-> ds
      (add-attribute {:type :nominal, :column 1, :name "pet", :labels ["dog" "cat"]})
      (remove-attributes {:attributes [:a :c]})))
The above functions rely on lower-level fns that create and apply the filters, which you may also use if you need more control over the actual filter objects:
(def filter (make-filter :remove-attributes {:dataset-format ds :attributes [:a :c]}))
;; We apply the filter to the original data set and obtain the new one
(def filtered-ds (filter-apply filter ds))
The previous sample of code could be rewritten with the make-apply-filter function:
(def filtered-ds (make-apply-filter :remove-attributes {:attributes [:a :c]} ds))
(add-attribute ds)
(add-attribute ds attributes)
Mapping of Weka's attribute types from clj-ml-dev keywords to the -T flag's representation.
(clj-batch ds)
(clj-batch ds attributes)
(clj-streamable ds)
(clj-streamable ds attributes)
(deffilter filter-name)
Defines the filter's fn that creates a fn to make and apply the filter.
Mapping of clj-ml keywords to actual Weka classes.
(filter-apply filter dataset)
Filters an input dataset using the provided filter and generates an output dataset. The first argument is the filter and the second is the dataset to which the filter should be applied.
(make-apply-filter kind options dataset)
Creates a new filter with the provided options and applies it to the provided dataset. If no :dataset-format value is provided in the options, it is set to the dataset passed as an argument.
Applying this function is equivalent to applying make-filter and then filter-apply.
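The defaulting of :dataset-format described above can be sketched in plain Clojure. This is only an illustration under stated assumptions: make-filter-stub and filter-apply-stub are hypothetical stand-ins that record their arguments, not the clj-ml functions, and the use of merge is one plausible way to express "use the dataset unless another value is provided", not necessarily how the library does it.

```clojure
;; Stand-ins for illustration only: these are NOT the clj-ml implementations.
(defn make-filter-stub
  "Hypothetical stand-in for make-filter: just records kind and options."
  [kind options]
  {:kind kind :options options})

(defn filter-apply-stub
  "Hypothetical stand-in for filter-apply: reports what would be applied."
  [flt dataset]
  {:applied (:kind flt)
   :format  (get-in flt [:options :dataset-format])
   :to      dataset})

(defn make-apply-filter-sketch
  "Defaults :dataset-format to the dataset, then makes and applies the filter."
  [kind options dataset]
  ;; Keys in options win over the default because they come last in merge.
  (let [opts (merge {:dataset-format dataset} options)]
    (filter-apply-stub (make-filter-stub kind opts) dataset)))

;; No :dataset-format given: the dataset itself is used.
(def defaulted
  (make-apply-filter-sketch :remove-attributes {:attributes [:a :c]} :my-dataset))

;; An explicit :dataset-format is left untouched.
(def overridden
  (make-apply-filter-sketch :remove-attributes
                            {:attributes [:a :c] :dataset-format :other-format}
                            :my-dataset))
```

Here the keywords :my-dataset and :other-format merely stand in for real dataset objects.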
(make-apply-filters filter-options dataset)
Creates new filters with the provided options and applies them to the provided dataset. If no :dataset-format value is provided for a filter, it is set to the dataset passed as an argument.
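Applying several filters in sequence can be pictured as a reduce over the dataset. The sketch below assumes that filter-options is a sequence of [kind options] pairs; the docstring does not state this shape, so treat it as an assumption. apply-filters-sketch and toy-apply are hypothetical names, and toy-apply works on a vector of row maps rather than on a Weka dataset.

```clojure
(defn apply-filters-sketch
  "Threads dataset through each [kind options] pair using apply-one,
  a function of (kind options dataset) such as make-apply-filter."
  [apply-one filter-options dataset]
  (reduce (fn [ds [kind options]]
            (apply-one kind options ds))
          dataset
          filter-options))

;; Toy stand-in for a filter: drops the named keys from every row map.
(defn toy-apply
  [kind {:keys [attributes]} rows]
  (case kind
    :remove-attributes (mapv #(apply dissoc % attributes) rows)))

(def rows [{:a 1 :b 2 :c :g}
           {:a 2 :b 3 :c :m}])

;; Each filter sees the output of the previous one.
(def result
  (apply-filters-sketch toy-apply
                        [[:remove-attributes {:attributes [:c]}]
                         [:remove-attributes {:attributes [:a]}]]
                        rows))
```

Passing the single-filter function in as an argument keeps the sketch independent of the library; with clj-ml loaded, make-apply-filter would presumably take the place of toy-apply.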
(make-filter kind options)
Creates a filter for the provided attributes format. The first argument must be a keyword identifying the kind of filter to generate. Currently the following filters are supported:
- :supervised-discretize
- :unsupervised-discretize
- :supervised-nominal-to-binary
- :unsupervised-nominal-to-binary
- :numeric-to-nominal
- :add-attribute
- :remove-attributes
- :remove-percentage
- :remove-range
- :remove-useless-attributes
- :select-append-attributes
- :project-attributes
- :clj-streamable
- :clj-batch
The second parameter is a map of attributes for the filter. All filters require a :dataset-format parameter:
- :dataset-format
  The dataset where the filter is going to be applied, or a description of the format of its attributes. Sample value: dataset, (dataset-format dataset)
An example of usage:
(make-filter :remove-attributes {:attributes [0 1] :dataset-format dataset})
Documentation for the different filters:

:supervised-discretize
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the default).
Parameters:
- :attributes
  Index of the attributes to be discretized. Sample value: [0,4,6]
  The attributes may also be specified by name: [:some-name, "another-name"]
- :invert
  Invert the matching sense of the columns. Sample value: true
- :kononenko
  Use Kononenko's MDL criterion. Sample value: true

:unsupervised-discretize
Unsupervised version of the discretize filter. Discretization is by simple binning.
Parameters:
- :attributes
  Index of the attributes to be discretized. Sample value: [0,4,6]
  The attributes may also be specified by name: [:some-name, "another-name"]
- :unset-class
  Does not take the class attribute into account when applying the filter. Sample value: true
- :binary
- :equal-frequency
  Use equal-frequency instead of equal-width discretization. Sample value: true
- :optimize
  Optimize the number of bins using a leave-one-out estimate of estimated entropy. Ignores the :binary attribute. Sample value: true
- :number-bins
  Defines the number of bins to divide the numeric attributes into. Sample value: 3

:supervised-nominal-to-binary
Converts nominal attributes into binary numeric attributes. An attribute with k values is transformed into k binary attributes if the class is nominal.
Parameters:
- :also-binary
  Sets if binary attributes are to be coded as nominal ones. Sample value: true
- :for-each-nominal
  For each nominal value one binary attribute is created, not only if the nominal attribute has more than two values.

:unsupervised-nominal-to-binary
Unsupervised version of the :nominal-to-binary filter.
Parameters:
- :attributes
  Index of the attributes to be binarized. Sample value: [0 1 2]
  The attributes may also be specified by name: [:some-name, "another-name"]
- :also-binary
  Sets if binary attributes are to be coded as nominal ones. Sample value: true
- :for-each-nominal
  For each nominal value one binary attribute is created, not only if the nominal attribute has more than two values. Sample value: true

:numeric-to-nominal
Transforms numeric attributes into nominal ones.
Parameters:
- :attributes
  Index of the attributes to be transformed. Sample value: [0 1 2]
  The attributes may also be specified by name: [:some-name, "another-name"]
- :invert
  Invert the selection of the columns. Sample value: true

:add-attribute
Adds a new attribute to the dataset. The new attribute will contain all missing values.
Parameters:
- :type
  Type of the new attribute. Valid options: :numeric, :nominal, :string, :date. Defaults to :numeric.
- :name
  Name of the new attribute.
- :column
  Index of where to insert the attribute, indexed from 0. You may also pass in "first" and "last". Sample values: "first", 0, 1, "last"
  The default is: "last"
- :labels
  Vector of valid nominal values. This only applies when the type is :nominal.
- :format
  The format of the date values (see ISO-8601). This only applies when the type is :date.
  The default is: "yyyy-MM-dd'T'HH:mm:ss"

:remove-attributes
Removes the columns identified by the provided attributes from the dataset.
Parameters:
- :attributes
  Index of the attributes to remove. Sample value: [0 1 2]
  The attributes may also be specified by name: [:some-name, "another-name"]

:remove-useless-attributes
Remove attributes that do not vary at all or that vary too much. All constant attributes are deleted automatically, along with any that exceed the maximum percentage of variance parameter. The maximum variance test is only applied to nominal attributes.
Parameters:
- :max-variance
  Maximum variance percentage allowed (default 99).
  Note: percentage, not decimal, e.g. 89 not 0.89.
  If you pass in a decimal, Weka silently sets it to 0.0.

:select-append-attributes
Append a copy of the selected columns at the end of the dataset.
Parameters:
- :attributes
  Index of the attributes. Sample value: [1 2 3]
  The attributes may also be specified by name: [:some-name, "another-name"]
- :invert
  Invert the selection of the columns. Sample value: true

:project-attributes
Project some columns from the provided dataset.
Parameters:
- :invert
  Invert the selection of columns. Sample value: true

:clj-streamable
Allows you to create a custom streamable filter with Clojure functions. A streamable filter is appropriate when you don't need to iterate over the entire dataset before processing it.
Parameters:
- :process
  This function will receive individual weka.core.Instance objects (rows of the dataset) and should return a newly processed Instance. The actual Instance is passed in and you may change it directly. However, a better approach is to copy the Instance with the copy method or Instance constructor and return a modified version of the copy.
- :determine-dataset-format
  This function will receive the dataset's weka.core.Instances object with no actual Instance objects (i.e. just the format encoded in the attributes). You must return an Instances object that contains the new format of the filtered dataset. Passing this fn is optional. If you are not changing the format of the dataset, omitting this function will use the current format.

:clj-batch
Allows you to create a custom batch filter with Clojure functions. A batch filter is appropriate when you need to iterate over the entire dataset before processing it.
Parameters:
- :process
  This function will receive the entire dataset as a weka.core.Instances object. A processed Instances object should be returned with the new Instance objects added to it. The format of the dataset (Instances) that is returned from this will be returned from the filter (see below).
- :determine-dataset-format
  This function will receive the dataset's weka.core.Instances object with no actual Instance objects (i.e. just the format encoded in the attributes). You must return an Instances object that contains the new format of the filtered dataset. Passing this fn is optional. For many batch filters you need to process the entire dataset to determine the correct format (e.g. filters that operate on nominal attributes). For this reason the clj-batch filter will *always* use the format of the dataset that the process fn outputs. In other words, if you need to operate on the entire dataset before determining the format, then this should be done in the process fn and nothing needs to be passed for this fn.

For examples on how to use the filters, especially the Clojure filters, you may refer to filters_test.clj of clj-ml-dev.
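The difference between a streamable and a batch :process function, as described for :clj-streamable and :clj-batch, can be illustrated without Weka. The sketch below is an analogy only: rows are plain Clojure vectors instead of weka.core.Instance objects, and double-first and scale-first-by-max are hypothetical example functions, not part of clj-ml.

```clojure
;; Rows are plain vectors here; real filters receive weka.core.Instance(s) objects.
(def rows [[1.0 10.0] [2.0 20.0] [4.0 40.0]])

;; Streamable-style: each row can be processed on its own,
;; without looking at any other row.
(defn double-first [row]
  (update row 0 * 2))

(def streamed (mapv double-first rows))

;; Batch-style: the column maximum must be known before any row
;; can be scaled, so the function needs the whole dataset at once.
(defn scale-first-by-max [all-rows]
  (let [mx (apply max (map first all-rows))]
    (mapv #(update % 0 / mx) all-rows)))

(def batched (scale-first-by-max rows))
```

A transformation shaped like double-first fits a streamable filter, while one shaped like scale-first-by-max needs a batch filter because its output for a row depends on every other row.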
(numeric-to-nominal ds)
(numeric-to-nominal ds attributes)
(project-attributes ds)
(project-attributes ds attributes)
(remove-attributes ds)
(remove-attributes ds attributes)
(remove-percentage ds)
(remove-percentage ds attributes)
(remove-range ds)
(remove-range ds attributes)
(remove-useless-attributes ds)
(remove-useless-attributes ds attributes)
(select-append-attributes ds)
(select-append-attributes ds attributes)
(supervised-discretize ds)
(supervised-discretize ds attributes)
(supervised-nominal-to-binary ds)
(supervised-nominal-to-binary ds attributes)
(unsupervised-discretize ds)
(unsupervised-discretize ds attributes)
(unsupervised-nominal-to-binary ds)
(unsupervised-nominal-to-binary ds attributes)