clj-ml.data

This namespace contains functions for building, creating, and manipulating data sets and instances. The formats of these data sets, as well as their classes, can be modified and assigned to the instances. Finally, data sets can be converted into Clojure sequences that can be transformed using the usual Clojure functions such as map and reduce.

attr-name

(attr-name attr)

attribute-at

(attribute-at dataset-or-instance index-or-name)

Returns the attribute at the provided position or with the provided name.

attribute-labels

(attribute-labels attr)

Returns the labels (possible values) for the given nominal attribute as keywords.

attribute-labels-as-strings

(attribute-labels-as-strings attr)

Returns the labels (possible values) for the given nominal attribute as strings.

attribute-labels-indexes

(attribute-labels-indexes attr)

Returns a map whose keys are the labels (possible values) of the given nominal attribute and whose values are each label's index within the attribute.

attribute-name-at

(attribute-name-at dataset-or-instance index-or-name)

Returns the name of the attribute at the provided position in the attribute definitions of a dataset or instance.

attribute-names

(attribute-names dataset-or-instance)

Returns the attribute names, as keywords, of the dataset or instance.

attribute-value-fn

(attribute-value-fn ds attr-name)

Takes a dataset and an attribute name and returns a function that selects that attribute's value from a given instance of the dataset.

attributes

(attributes dataset-or-instance)

Returns the attributes (weka.core.Attribute) of the dataset or instance.

copy-dataset

(copy-dataset ds)

Uses the Instances constructor to copy a given dataset. Each Instance (row) is shallow copied, so while not all of the data is copied, n new Instance objects are created, where n is the number of training examples.

dataset-add

(dataset-add dataset vector)
(dataset-add dataset weight vector)

Adds a new instance to a dataset. A clojure vector, map, or an Instance can be passed as arguments
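
The two arities can be sketched as follows. This is an illustrative example rather than text from the library: it assumes clj-ml and its Weka dependency are on the classpath, and the dataset name and values are made up.

```clojure
(require '[clj-ml.data :refer [make-dataset dataset-add dataset-count]])

;; Start with a single instance so the effect of each call is visible.
(def ds
  (make-dataset "measurements"
                [:length :width {:kind [:good :bad]}]
                [[12 34 :good]]))

;; Two-argument form: the instance is given as a plain vector.
(dataset-add ds [24 53 :bad])

;; Three-argument form: an explicit weight precedes the vector.
(dataset-add ds 2.0 [30 41 :good])

(dataset-count ds) ; => 3
```

The dataset is a mutable Weka Instances object, so both calls modify ds in place.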

dataset-append-name

(dataset-append-name dataset name-addition)

Appends name-addition to the dataset's name.

dataset-as-lists

(dataset-as-lists dataset)

Returns a lazy sequence of the dataset represented as lists. The values are the actual values (i.e. the string values) and not Weka's internal double representation or clj-ml's keyword representation.

dataset-as-maps

(dataset-as-maps dataset)

Returns a lazy sequence of the dataset represented as maps. This fn is preferable to mapping over a seq yourself with instance-to-map because it avoids redundant string interning of the attribute names.

dataset-as-vecs

(dataset-as-vecs dataset)

Returns a lazy sequence of the dataset represented as vectors. The values are the actual values (i.e. the string values) and not Weka's internal double representation or clj-ml's keyword representation.

dataset-at

(dataset-at dataset pos)

Returns the instance at a certain position from the dataset.

dataset-attribute-at

(dataset-attribute-at dataset index-or-name)

dataset-attributes

(dataset-attributes dataset)

Returns the attributes (weka.core.Attribute) of the dataset or instance.

dataset-class-index

(dataset-class-index dataset)

Returns the index of the class attribute for this dataset.

dataset-class-labels

(dataset-class-labels dataset)

Returns the possible labels for the class attribute.

dataset-class-name

(dataset-class-name dataset)

Returns the name of the class attribute in keyword form. Returns nil if not set.

dataset-class-values

(dataset-class-values dataset)

Returns a lazy-seq of the values for the dataset's class attribute. If the class is nominal then the string value (not keyword) is returned.

dataset-count

(dataset-count dataset)

Returns the number of elements in a dataset.

dataset-extract-at

(dataset-extract-at dataset pos)

Removes and returns the instance at a certain position from the dataset.

dataset-filename

(dataset-filename model-prefix model-dir tag)

dataset-format

(dataset-format dataset)

Returns the definition of the attributes of this dataset.

dataset-index-attr

(dataset-index-attr dataset attr)

Returns the index of an attribute in the attributes definition of a dataset.

dataset-labels-at

(dataset-labels-at dataset-or-instance index-or-name)

dataset-name

(dataset-name dataset)

Returns the name of this dataset.

dataset-nominal?

(dataset-nominal? dataset)

Returns a boolean indicating whether the class attribute is nominal.

dataset-pop

(dataset-pop dataset)

Removes and returns the first instance in the dataset.

dataset-remove-attribute-at

(dataset-remove-attribute-at dataset index)

Removes the attribute at the specified index.

dataset-remove-class

(dataset-remove-class dataset)

Removes the class attribute from the dataset.

dataset-replace-attribute!

(dataset-replace-attribute! dataset attr-name new-attr)

Replaces the specified attribute with the given one, which should be a weka.core.Attribute. This function only modifies the format of the dataset and does not deal with any instances. It is intended to be used on data formats and not on datasets that contain data.

dataset-seq

(dataset-seq dataset)

Builds a new clojure sequence from this dataset
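
A small sketch of how the sequence can be consumed with ordinary Clojure functions, alongside the dataset-as-maps shortcut documented on this page. It assumes clj-ml and its Weka dependency are on the classpath; the dataset itself is made up for illustration.

```clojure
(require '[clj-ml.data :refer [make-dataset dataset-seq
                               instance-to-map dataset-as-maps]])

(def ds
  (make-dataset "measurements"
                [:length :width {:kind [:good :bad]}]
                [[12 34 :good]
                 [24 53 :bad]]))

;; dataset-seq yields the instances, so each one is converted explicitly.
(def rows-by-hand (map instance-to-map (dataset-seq ds)))

;; dataset-as-maps performs the same conversion over the whole dataset.
(def rows (dataset-as-maps ds))

(count rows) ; => 2
```

Once the rows are plain maps, functions such as map, filter, and reduce apply as usual.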

dataset-set-class

(dataset-set-class dataset index-or-name)

Sets the index of the attribute of the dataset that is the class of the dataset
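
The following sketch marks the third attribute as the class and reads it back with the class accessors documented on this page. It mirrors the index-based call shown in the clj-ml README and assumes clj-ml and Weka are on the classpath; the dataset is illustrative.

```clojure
(require '[clj-ml.data :refer [make-dataset dataset-set-class
                               dataset-class-index dataset-class-name]])

(def ds
  (make-dataset "measurements"
                [:length :width {:kind [:good :bad]}]
                [[12 34 :good]
                 [24 53 :bad]]))

;; Attribute indexes start at 0, so index 2 refers to :kind.
(dataset-set-class ds 2)

(dataset-class-index ds) ; => 2
(dataset-class-name ds)  ; => :kind
```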

dataset-set-name

(dataset-set-name dataset new-name)

Sets the dataset's name.

dataset-weights

(dataset-weights dataset)

Returns a lazy-seq of the weights of the dataset instances.

do-split-dataset

(do-split-dataset ds & options)

The same as split-dataset but actual datasets are returned and not Delay objects that need dereffing.
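
The difference between the two functions can be sketched as below. This is an illustrative example, assuming clj-ml and Weka are on the classpath and that the :num option is passed as a keyword argument, as the split-dataset signature suggests; the dataset is made up.

```clojure
(require '[clj-ml.data :refer [make-dataset split-dataset
                               do-split-dataset dataset-count]])

(def ds
  (make-dataset "numbers"
                [:x :y]
                [[1 2] [3 4] [5 6] [7 8]]))

;; split-dataset returns two Delay objects, so each part is dereffed
;; with @ before use.
(def lazy-sizes
  (let [[head tail] (split-dataset ds :num 1)]
    [(dataset-count @head) (dataset-count @tail)]))

;; do-split-dataset returns the datasets themselves, so no deref is needed.
(def eager-sizes
  (let [[head tail] (do-split-dataset ds :num 1)]
    [(dataset-count head) (dataset-count tail)]))
```

The delayed form avoids building a part that is never used, while the eager form is simpler when both parts are needed right away.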

docs-to-dataset

(docs-to-dataset docs model-prefix model-dir & opts)

Docs are expected to be maps with this structure: {:id [any], :has-class? [true/false], :title [string], :fulltext [string]}. Of course, title or fulltext could be nil. model-prefix is a filename prefix for saving/loading the model (necessary to initialize the string-to-wordvec filters), and model-dir is a folder in which to save/load the model.

opts are optional parameters: :keep-n [int], :lowercase [true/false], :words-to-keep [int], :normalize [int], :transform-tf [true/false], :transform-idf [true/false], :stemmer [true/false], :resample [true/false], :training [true/false], :testing [true/false].

A map is returned with structure {:dataset [the dataset], :docids [seq of docids as ordered in dataset]}.

enumeration-or-nil-seq

(enumeration-or-nil-seq s)

headers-only

(headers-only ds)

Returns a new Weka dataset (Instances) with the same headers as the given one.

instance-attribute-at

(instance-attribute-at instance index-or-name)

instance-attributes

(instance-attributes instance)

Returns the attributes (weka.core.Attribute) of the dataset or instance.

instance-get-class

(instance-get-class instance)

Gets the class attribute for this instance; returns nil if the class is "missing".

instance-index-attr

(instance-index-attr instance attr)

Returns the index of an attribute in the attributes definition of an instance or dataset.

instance-set-class

(instance-set-class instance val)

Sets the value (label) of the class attribute for this instance.

instance-set-class-missing

(instance-set-class-missing instance)

Sets the class to "missing".

instance-to-list

(instance-to-list instance)

Builds a list with the values of the instance.

instance-to-map

(instance-to-map instance)

Builds a map with the values of the instance.

instance-to-vector

(instance-to-vector instance)

Builds a vector with the values of the instance.

instance-value-at

(instance-value-at instance pos)

Returns the value of an instance attribute. A string, not a keyword, is returned.

is-dataset?

(is-dataset? dataset)

Checks if the provided object is a dataset.

is-instance?

(is-instance? instance)

Checks if the provided object is an instance.

keyword-name

(keyword-name attr)

make-dataset

(make-dataset ds-name attributes capacity-or-labels & opts)

Creates a new dataset, empty or with the provided instances and options
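
As a minimal sketch of the attribute and instance formats, the call below follows the example in the clj-ml README. It assumes clj-ml and its Weka dependency are on the classpath; the dataset name and values are made up for illustration.

```clojure
(require '[clj-ml.data :refer [make-dataset dataset-count attribute-names]])

;; A keyword declares a numeric attribute; a one-entry map declares a
;; nominal attribute together with its possible labels.
(def ds
  (make-dataset "measurements"
                [:length :width {:kind [:good :bad]}]
                [[12 34 :good]
                 [24 53 :bad]]))

(dataset-count ds)   ; => 2
(attribute-names ds) ; the attribute names, as keywords
```

Each inner vector supplies one value per attribute, in the order the attributes were declared.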

make-instance

(make-instance dataset vector)
(make-instance dataset weight vector)

Creates a new dataset instance from a vector.

make-sparse-dataset

(make-sparse-dataset ds-name attributes capacity-or-labels & opts)

Creates a new dataset, empty or with the provided instances and options.

make-sparse-instance

(make-sparse-instance dataset valmap)
(make-sparse-instance dataset weight valmap)

Creates a new dataset instance from a map of index-value pairs (as a Clojure map), where index starts at 0. Use explicit Double/NaN for missing values; all other values are assumed to be zeros.

nominal-attribute

(nominal-attribute attr-name labels)

Creates a nominal weka.core.Attribute with the given name and labels.

nominal-attributes

(nominal-attributes dataset-or-instance)

Returns the nominal attributes (weka.core.Attribute) of the dataset or instance.

numeric-attributes

(numeric-attributes dataset-or-instance)

Returns the numeric attributes (weka.core.Attribute) of the dataset or instance.

randomize-dataset

(randomize-dataset ds)
(randomize-dataset ds seed)

Copies the given dataset and returns a randomized version.

randomize-dataset!

(randomize-dataset! ds)
(randomize-dataset! ds seed)

Randomizes the dataset in place and returns the dataset. When no seed is provided, a random seed is created.

split-dataset

(split-dataset ds & [& {:keys [percentage num]}])

Splits the dataset into two parts based on either the :percentage given or the :num of instances. The first dataset returned holds the given percentage of the original dataset and the second holds the remaining portion. Both datasets are Delay objects that need to be dereffed. If you want the split immediately, use do-split-dataset.

string-attributes

(string-attributes dataset-or-instance)

Returns the string attributes (weka.core.Attribute) of the dataset or instance.

take-dataset

(take-dataset ds num)

Returns a subset of the given dataset containing the first 'num' instances.
