- `tech.datatype` - all readers are marked as sequential.
- `unroll-column` - Given a column that may contain either iterable or scalar data,
  unroll it so it only contains scalar data, duplicating rows as needed.
- `tech.ml.dataset.column/unique`
  and especially `tech.ml.dataset.pipeline/string->number`.
- `tech.v2.datatype`
  namespace has a new function - `make-reader` - that reifies
  a reader of the appropriate type. This makes it much easier than before to create
  new columns that have nontrivial translations and datatypes.
- `tech.v2.datatype`
  namespace has a new function - `->typed-reader` - that typecasts the incoming object
  into a reader of the appropriate datatype. This means that `.read` calls will be
  strongly typed, which is useful for building up a set of typed variables before
  using `make-reader` above.
- `tech.datatype`
  added a method to transform a reader into a persistent-vector-like object that
  derives from `clojure.lang.APersistentVector` and thus benefits from the excellent
  equality and hash semantics of persistent vectors.
- `columnwise-concat`
  which is a far simpler version of tidyr's pivot_longer
  (https://tidyr.tidyverse.org/reference/pivot_longer.html). This is implemented
  efficiently in terms of indexed reader concatenation and as such should work
  on tables of any size.
- `->>`) then any options must be passed before the dataset. The same is true for
  the set of functions that are dataset first. We will be more strict about this
  from now on.
- `tech.v2.datatype.bitmap/bitmap-value->bitmap-map`. This is used for
  replace-missing type operations.
- `brief`
  now does not return missing values. Double or float NaN or INF values
  from a mapseq result in maps with fewer keys.
- `brief`
  overrides this to provide defaults to get more information.
- `unique-by`
  returns indexes in order.
- `->>`
  operators.
- `tech.datatype`
  with upgraded and fewer dependencies.
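
The row-level behaviour described in the `unroll-column` and `columnwise-concat` entries above can be illustrated with a minimal sketch in plain Clojure over sequences of maps. This is not the library's implementation (which works on columns and indexed readers). The helper names `unroll-rows` and `pivot-longer-rows`, the `:column`/`:value` output keys, and the choice to unroll only sequential values are assumptions made for illustration.

```clojure
;; Plain-Clojure sketch on rows-as-maps; these helpers are hypothetical and are
;; not part of tech.ml.dataset.

(defn unroll-rows
  "Expand `colname` so each row holds a single scalar, repeating the rest of the
  row once per element. Only sequential values are unrolled (an assumption)."
  [rows colname]
  (mapcat (fn [row]
            (let [v (get row colname)]
              (if (sequential? v)
                (map #(assoc row colname %) v)
                [row])))
          rows))

(defn pivot-longer-rows
  "Stack the given columns into :column/:value pairs, one output row per
  (column, input row) pair. The :column and :value keys are assumed names."
  [rows colnames]
  (for [colname colnames
        row rows]
    (-> (apply dissoc row colnames)
        (assoc :column colname :value (get row colname)))))

(unroll-rows [{:id 1 :tags ["a" "b"]} {:id 2 :tags "c"}] :tags)
;; => ({:id 1, :tags "a"} {:id 1, :tags "b"} {:id 2, :tags "c"})

(pivot-longer-rows [{:id 1 :x 10 :y 20} {:id 2 :x 30 :y 40}] [:x :y])
;; => ({:id 1, :column :x, :value 10} {:id 2, :column :x, :value 30}
;;     {:id 1, :column :y, :value 20} {:id 2, :column :y, :value 40})
```

Note that in this sketch a row whose value is an empty sequence disappears from the output; whether the library behaves the same way is not stated in the entries above.
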
- `:missing-nil?`
  false as an option.
- `brief`
  function to main namespace so you can get a nice brief description
  of your dataset when working from the REPL. This prints out better than
  `descriptive-stats`.
- `->`
  versions of sort added so you can sort in `->` pathways.
- `column->dataset` - map a transform function over a column and return a new
  dataset from the result. The transform function is expected to return a map.
- `drop-rows`, `select-rows`, `drop-columns` - more granular select calls.
- `append-columns` - append a list of columns to a dataset. Used with `column->dataset`.
- `column-labeled-mapseq` - create a sequence of maps with `:value` and `:label` members.
  This flattens the dataset by producing Y maps per row instead of 1 map per row,
  where the maps themselves are labeled with the value in their `:value` member. This
  is useful for building Vega charts.
- `->distinct-by-column` - take the first row where a given key is present. The arrow
  form of this indicates the dataset is the first argument.
- `->sort-by`, `->sort-by-column` - forms of these functions for use in (`->`)
  dataflows.
- `interpolate-loess` - produce a new column from a given pair of columns using loess
  interpolation to create the column. The interpolator is saved as metadata on the
  new column.
- `tech.ml.dataset.column/parse-column` - given a string column that failed to parse for
  some reason, you can force the system to attempt to parse it using, for instance,
  relaxed parsing semantics where failures simply record the failure in metadata.
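
The reason for the arrow-prefixed variants above is that `->` threads its value in as the first argument of each call. A minimal sketch of that convention in plain Clojure follows. The helpers `distinct-by-key` and `sort-by-key` are hypothetical stand-ins operating on a vector of row maps, not the library functions, and "keep the first row seen for each distinct key value" is an assumed reading of the `->distinct-by-column` entry.

```clojure
;; Hypothetical dataset-first helpers over a vector of row maps; not the
;; library's implementation.

(defn distinct-by-key
  "Keep the first row seen for each distinct value of `k` (assumed semantics)."
  [rows k]
  (second
   (reduce (fn [[seen out] row]
             (let [v (get row k)]
               (if (contains? seen v)
                 [seen out]
                 [(conj seen v) (conj out row)])))
           [#{} []]
           rows)))

(defn sort-by-key
  "Sort rows by the value stored under `k`."
  [rows k]
  (vec (sort-by k rows)))

;; Because the rows are the first argument, the calls chain with `->`.
(-> [{:name "b" :n 2} {:name "a" :n 1} {:name "b" :n 3}]
    (distinct-by-key :name)
    (sort-by-key :n))
;; => [{:name "a", :n 1} {:name "b", :n 2}]
```

A function that takes the collection last, as most of `clojure.core`'s sequence functions do, would need `->>` instead, which is why mixing the two argument orders in one pipeline is awkward.
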