This namespace contains functions for creating and manipulating data sets and instances. The formats of these data sets, as well as their classes, can be modified and assigned to the instances. Finally, data sets can be converted into Clojure sequences that can be transformed using the usual Clojure functions such as map and reduce.
(attribute-at dataset-or-instance index-or-name)
Returns the attribute at the provided position or with the provided name.
(attribute-labels attr)
Returns the labels (possible values) for the given nominal attribute as keywords.
(attribute-labels-as-strings attr)
Returns the labels (possible values) for the given nominal attribute as strings.
(attribute-labels-indexes attr)
Returns a map with the labels (possible values) of the given nominal attribute as the keys and their indexes as the values.
(attribute-name-at dataset-or-instance index-or-name)
Returns the name of the attribute at the provided position in the attributes definition of a dataset or instance.
(attribute-names dataset-or-instance)
Returns the attribute names, as keywords, of the dataset or instance
(attribute-value-fn ds attr-name)
Takes a dataset and an attribute name and returns a function that selects the value of that attribute from a given instance of the dataset.
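As a rough illustration of how the returned function might be used, the sketch below maps it over the instances of a small dataset. It assumes the attribute name can be passed as a keyword and that make-dataset accepts the attribute vector shown; neither detail is stated on this page, and the dataset itself is made up.

```clojure
(require '[clj-ml.data :as data])

;; Made-up dataset. Assumption: a keyword denotes a numeric attribute and a
;; one-entry map {:name [labels]} denotes a nominal attribute.
(def ds
  (data/make-dataset "measurements"
                     [:length :width {:kind [:good :bad]}]
                     [[12 34 :good]
                      [24 53 :bad]]))

;; Assumption: the attribute name is given as a keyword.
(def length-of (data/attribute-value-fn ds :length))

;; Apply the selector to every instance of the dataset.
(doall (map length-of (data/dataset-seq ds)))
```

Building the selector once and reusing it avoids resolving the attribute by name for every instance.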
(attributes dataset-or-instance)
Returns the attributes (weka.core.Attribute) of the dataset or instance
(copy-dataset ds)
Uses the Instances constructor to copy a given dataset. Each Instance (row) is shallow copied, so while not all of the data is copied, n new Instance objects are created, where n is the number of training examples.
(dataset-add dataset vector)
(dataset-add dataset weight vector)
Adds a new instance to a dataset. A Clojure vector, a map, or an Instance can be passed as the argument.
(dataset-append-name dataset name-addition)
Appends name-addition to the dataset's name.
(dataset-as-lists dataset)
Returns a lazy sequence of the dataset represented as lists. The values are the actual values (i.e. the string values) and not weka's internal double representation or clj-ml's keyword representation.
(dataset-as-maps dataset)
Returns a lazy sequence of the dataset represented as maps. This fn is preferable to mapping over a seq yourself with instance-to-map because it avoids redundant string interning of the attribute names.
(dataset-as-vecs dataset)
Returns a lazy sequence of the dataset represented as vectors. The values are the actual values (i.e. the string values) and not weka's internal double representation or clj-ml's keyword representation.
(dataset-at dataset pos)
Returns the instance at a certain position from the dataset
(dataset-attributes dataset)
Returns the attributes (weka.core.Attribute) of the dataset or instance
(dataset-class-index dataset)
Returns the index of the class attribute for this dataset
(dataset-class-labels dataset)
Returns the possible labels for the class attribute
(dataset-class-name dataset)
Returns the name of the class attribute in keyword form. Returns nil if not set.
(dataset-class-values dataset)
Returns a lazy-seq of the values for the dataset's class attribute. If the class is nominal then the string value (not keyword) is returned.
(dataset-count dataset)
Returns the number of elements in a dataset
(dataset-extract-at dataset pos)
Removes and returns the instance at a certain position from the dataset
(dataset-format dataset)
Returns the definition of the attributes of this dataset
(dataset-index-attr dataset attr)
Returns the index of an attribute in the attributes definition of a dataset.
(dataset-name dataset)
Returns the name of this dataset
(dataset-nominal? dataset)
Returns a boolean indicating whether the class attribute is nominal.
(dataset-pop dataset)
Removes and returns the first instance in the dataset
(dataset-remove-attribute-at dataset index)
Removes the attribute at the specified index
(dataset-remove-class dataset)
Removes the class attribute from the dataset
(dataset-replace-attribute! dataset attr-name new-attr)
Replaces the specified attribute with the given one, which should be a weka.core.Attribute. This function only modifies the format of the dataset and does not touch any instances. It is intended for use on data formats, not on datasets that already contain data.
(dataset-seq dataset)
Builds a new Clojure sequence from this dataset.
(dataset-set-class dataset index-or-name)
Sets which attribute of the dataset, given by index or name, is the class attribute.
(dataset-set-name dataset new-name)
Sets the dataset's name
(dataset-weights dataset)
Returns a lazy-seq of the weights of the dataset instances.
(do-split-dataset ds & options)
The same as split-dataset but actual datasets are returned and not Delay objects that need dereffing.
(docs-to-dataset docs model-prefix model-dir & opts)
Docs are expected to be maps with this structure: {:id [any], :has-class? [true/false], :title [string], :fulltext [string]}. Either title or fulltext may be nil. model-prefix is a filename prefix for saving/loading the model (necessary to initialize the string-to-wordvec filters), and model-dir is the folder in which the model is saved/loaded.
opts are optional parameters: :keep-n [int], :lowercase [true/false], :words-to-keep [int], :normalize [int], :transform-tf [true/false], :transform-idf [true/false], :stemmer [true/false], :resample [true/false], :training [true/false], :testing [true/false].
A map is returned with structure {:dataset [the dataset], :docids [seq of docids as ordered in dataset]}.
(headers-only ds)
Returns a new weka dataset (Instances) with the same headers as the given one
(instance-attributes instance)
Returns the attributes (weka.core.Attribute) of the dataset or instance
(instance-get-class instance)
Get the class attribute for this instance; returns nil if the class is "missing"
(instance-index-attr instance attr)
Returns the index of an attribute in the attributes definition of an instance or dataset
(instance-set-class instance val)
Sets the value (label) of the class attribute for this instance
(instance-set-class-missing instance)
Sets the class to "missing"
(instance-to-list instance)
Builds a list with the values of the instance
(instance-to-map instance)
Builds a map with the attribute names and values of the instance.
(instance-to-vector instance)
Builds a vector with the values of the instance
(instance-value-at instance pos)
Returns the value of an instance attribute. A string, not a keyword, is returned.
(is-dataset? dataset)
Checks if the provided object is a dataset
(is-instance? instance)
Checks if the provided object is an instance
(make-dataset ds-name attributes capacity-or-labels & opts)
Creates a new dataset, empty or with the provided instances and options
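A minimal sketch of creating a dataset with instances and inspecting it might look like the following. The attribute vector format (a keyword for a numeric attribute, a one-entry map for a nominal attribute with its labels) is an assumption that this page does not spell out, and the dataset name and values are made up.

```clojure
(require '[clj-ml.data :as data])

;; Assumption: :length and :width become numeric attributes, and
;; {:kind [:good :bad]} becomes a nominal attribute with two labels.
(def ds
  (data/make-dataset "measurements"
                     [:length :width {:kind [:good :bad]}]
                     [[12 34 :good]
                      [24 53 :bad]]))

(data/dataset-count ds)            ; number of instances in the dataset
(data/attribute-names ds)          ; attribute names as keywords
(doall (data/dataset-as-maps ds))  ; realize the lazy seq of instance maps
```

Passing a capacity instead of a collection of instances as the third argument should give an empty dataset, which can then be filled with dataset-add.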
(make-instance dataset vector)
(make-instance dataset weight vector)
Creates a new dataset instance from a vector
(nominal-attribute attr-name labels)
Creates a nominal weka.core.Attribute with the given name and labels
(nominal-attributes dataset-or-instance)
Returns the nominal attributes (weka.core.Attribute) of the dataset or instance
(numeric-attributes dataset-or-instance)
Returns the numeric attributes (weka.core.Attribute) of the dataset or instance
(randomize-dataset ds)
(randomize-dataset ds seed)
Copies the given dataset and returns a randomized version of the copy.
(randomize-dataset! ds)
(randomize-dataset! ds seed)
Randomizes the dataset in place and returns the dataset. When no seed is provided, a random seed is created.
(split-dataset ds & [& {:keys [percentage num]}])
Splits the dataset into two parts based on either the ':percentage' given or the ':num' of instances. The first dataset returned will have the 'percentage' amount of the original dataset and the second has the remaining portion. Both datasets are Delay objects that need to be dereffed. If you want to have the split immediately you can use do-split-dataset.
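The difference between the delayed and the immediate variant can be sketched as follows. The dataset is made up, the attribute vector format passed to make-dataset is an assumption, and :num is used here so that the sketch does not depend on how :percentage is scaled.

```clojure
(require '[clj-ml.data :as data])

;; Made-up dataset with four instances (attribute format is an assumption).
(def ds
  (data/make-dataset "measurements"
                     [:length :width {:kind [:good :bad]}]
                     [[12 34 :good]
                      [24 53 :bad]
                      [18 40 :good]
                      [30 61 :bad]]))

;; split-dataset returns two Delay objects; each must be dereffed before use.
(let [[train held-out] (data/split-dataset ds :num 3)]
  [(data/dataset-count @train) (data/dataset-count @held-out)])

;; do-split-dataset returns the two datasets directly, with no deref needed.
(let [[train held-out] (data/do-split-dataset ds :num 3)]
  [(data/dataset-count train) (data/dataset-count held-out)])
```

The delayed form is convenient when only one of the two parts may end up being used, since the other is never computed.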
(string-attributes dataset-or-instance)
Returns the string attributes (weka.core.Attribute) of the dataset or instance
(take-dataset ds num)
Returns a subset of the given dataset containing the first 'num' instances.