datajure.util

Data cleaning and exploration utilities for datajure. These are standalone functions that operate on datasets directly and thread naturally. They are not part of dt; they complement it for common data preparation tasks.

clean-column-names (clj)

(clean-column-names dataset)

Cleans column names: lowercases them, replaces spaces and special characters with hyphens, collapses consecutive hyphens, and strips leading and trailing hyphens. Example: "Some Ugly Name!" → :some-ugly-name
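The renaming rule can be sketched for a single name in plain Clojure. This is only an illustration of the documented behaviour, not datajure's implementation: `clean-name` is a hypothetical helper, and treating every character outside `a-z` and `0-9` as "special" is an assumption.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper; not part of datajure.util.
(defn clean-name
  "Normalise one column name (string or keyword) into a keyword."
  [col-name]
  (-> (name col-name)
      str/lower-case
      ;; Assumption: anything other than a-z or 0-9 counts as "special".
      ;; Matching whole runs also collapses consecutive hyphens.
      (str/replace #"[^a-z0-9]+" "-")
      ;; Strip leading and trailing hyphens.
      (str/replace #"^-+|-+$" "")
      keyword))

(clean-name "Some Ugly Name!")
;; => :some-ugly-name
```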

coerce-columns (clj)

(coerce-columns dataset col-type-map)

Bulk type coercion. col-type-map is {col-kw datatype-kw ...}. Example: (coerce-columns ds {:year :int64 :mass :float64})
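To make the shape of col-type-map concrete, here is a minimal sketch that applies the same idea to a plain map of column keyword to vector rather than to a real dataset. The names `coercers` and `coerce-cols` are hypothetical, only the two datatype keywords from the example are covered, and missing values are not handled.

```clojure
;; Hypothetical coercion table; covers only the datatypes used in the example.
(def coercers
  {:int64   long
   :float64 double})

;; Hypothetical stand-in for coerce-columns over a plain map of vectors.
(defn coerce-cols
  "Coerce each column named in col-type-map; other columns are left untouched."
  [columns col-type-map]
  (reduce-kv (fn [cols col dtype]
               (update cols col #(mapv (coercers dtype) %)))
             columns
             col-type-map))

(coerce-cols {:year [1977.0 1980.0] :mass [77 136] :name ["Luke" "Vader"]}
             {:year :int64 :mass :float64})
;; => {:year [1977 1980], :mass [77.0 136.0], :name ["Luke" "Vader"]}
```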

describe (clj)

(describe dataset)
(describe dataset cols)

Descriptive statistics for dataset columns. Returns a dataset with one row per column: :column, :datatype, :n, :n-missing, :mean, :sd, :min, :p25, :median, :p75, :max. Non-numeric columns show nil for stats. Optional second arg selects columns (vector of keywords).
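As a rough picture of what one output row holds, the sketch below summarises a single column given as a plain vector. It is not the library's code: `summarize-column` is a hypothetical helper, it assumes :n counts non-missing values, and it leaves out :datatype, :sd and the percentiles because the exact conventions used for those are not stated here.

```clojure
;; Hypothetical helper illustrating one row of a describe-style summary.
(defn summarize-column
  [col-name values]
  (let [present  (remove nil? values)
        numeric? (and (seq present) (every? number? present))]
    {:column    col-name
     ;; Assumption: :n counts the non-missing values.
     :n         (count present)
     :n-missing (- (count values) (count present))
     ;; Non-numeric columns get nil for every statistic.
     :mean      (when numeric? (/ (reduce + 0.0 present) (count present)))
     :min       (when numeric? (apply min present))
     :max       (when numeric? (apply max present))}))

(summarize-column :mass [77 nil 136])
;; => {:column :mass, :n 2, :n-missing 1, :mean 106.5, :min 77, :max 136}

(summarize-column :name ["Luke" "Vader"])
;; => {:column :name, :n 2, :n-missing 0, :mean nil, :min nil, :max nil}
```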

drop-constant-columns (clj)

(drop-constant-columns dataset)

Removes columns where all values are identical (zero variance). Note: columns with 0 or 1 rows are always kept. A single observation trivially has no variance, but that says nothing about whether the column is constant across observations.
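The rule, including the 0-or-1-row exception, can be sketched over a plain map of column keyword to vector. `drop-constant` is a hypothetical stand-in, and comparing values with `=` (so an all-nil column also counts as constant) is an assumption.

```clojure
;; Hypothetical stand-in for drop-constant-columns over a plain map of vectors.
(defn drop-constant
  [columns]
  (into {}
        (remove (fn [[_ values]]
                  ;; Only columns with at least 2 rows can be judged constant.
                  (and (> (count values) 1)
                       (apply = values))))
        columns))

(drop-constant {:id [1 2 3] :source ["web" "web" "web"]})
;; => {:id [1 2 3]}

;; With a single row nothing is dropped.
(drop-constant {:id [1] :source ["web"]})
;; => {:id [1], :source ["web"]}
```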

duplicate-rows (clj)

(duplicate-rows dataset)
(duplicate-rows dataset cols)

Returns a dataset containing only the duplicate rows. The optional second argument specifies the subset of columns to check for duplicates.

mark-duplicates (clj)

(mark-duplicates dataset)
(mark-duplicates dataset cols)

Adds a boolean :duplicate? column. The optional second argument specifies the subset of columns to check for duplicates.
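One way to picture how the two duplicate helpers relate is a sketch over a vector of row maps. `mark-dups` and `dup-rows` are hypothetical stand-ins, not the library functions. The sketch flags every row whose key occurs more than once, including the first occurrence, which is an assumption since the docstrings do not say how the first occurrence is treated.

```clojure
;; Hypothetical stand-in for mark-duplicates over a vector of row maps.
(defn mark-dups
  ([rows] (mark-dups rows nil))
  ([rows cols]
   (let [;; Compare whole rows, or only the chosen columns when cols is given.
         key-fn (if (seq cols) #(select-keys % cols) identity)
         counts (frequencies (map key-fn rows))]
     (mapv #(assoc % :duplicate? (> (counts (key-fn %)) 1)) rows))))

;; Hypothetical stand-in for duplicate-rows: keep only the flagged rows.
(defn dup-rows
  ([rows] (dup-rows rows nil))
  ([rows cols]
   (->> (mark-dups rows cols)
        (filter :duplicate?)
        (mapv #(dissoc % :duplicate?)))))

(def rows [{:id 1 :city "Oslo"} {:id 2 :city "Oslo"} {:id 1 :city "Oslo"}])

(mapv :duplicate? (mark-dups rows))
;; => [true false true]

(dup-rows rows)
;; => [{:id 1, :city "Oslo"} {:id 1, :city "Oslo"}]

;; Checking only :city makes all three rows duplicates of each other.
(count (dup-rows rows [:city]))
;; => 3
```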
