For reference, a useful top-level enumeration of functionality is the sklearn top-level API doc.
cross validation

- provide helpers to separate a dataset into train, test, and validate sets
- k-fold cross-validation helpers (also stratified, maintaining the same class balance as in the full set)
- sklearn can be verbose and painful here; it would be nice to destructure by position or keys, as in a doseq or let, for a particular scope defined by the kfold (macro?)
- manual indexing in particular is painful; a destructuring-in-a-let style way to bind each fold's splits would help (see the sketch after this list)
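For reference, the sklearn side of this looks roughly like the sketch below; the explicit index bookkeeping inside the fold loop is the pain point called out above (the synthetic dataset and split sizes are only for illustration):

```python
# Hold-out split plus stratified k-fold iteration with sklearn.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# train/test separation (a validate split would be a second call)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# stratified k-fold: each fold keeps the class balance of the full set
skf = StratifiedKFold(n_splits=5)
for train_idx, val_idx in skf.split(X_train, y_train):
    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]
    # ... fit on (X_tr, y_tr), evaluate on (X_val, y_val)
```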
grid search
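As a reference point, a minimal grid-search sketch on the sklearn side (the estimator and parameter grid here are arbitrary picks for illustration):

```python
# Exhaustive search over a small parameter grid with cross validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```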
```python
# sklearn multi-output regression: wrap a single-target regressor so that
# one model is fit per target column.
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=10, n_targets=3, random_state=1)
MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y).predict(X)
```
A meta-ensembler that supports boosting/bagging/blending methods as described here
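For reference, the kinds of meta-ensemblers this points at already exist in sklearn as bagging, boosting, and stacking wrappers; a minimal sketch (the base estimators below are arbitrary picks for illustration):

```python
# Bagging, boosting, and stacking (blending) wrappers around base estimators.
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=25)
boosted = GradientBoostingRegressor(random_state=0)
stacked = StackingRegressor(
    estimators=[("ridge", Ridge()), ("tree", DecisionTreeRegressor())],
    final_estimator=Ridge())
# each of these is itself an estimator: .fit(X, y) / .predict(X)
```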
sklearn uses (fit ) and (estimate )-style operations: fit and transform on transformers, and fit, predict, and predict_proba (goofy name, class probability) on estimators as well.
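As a concrete reference, a minimal sketch of that method surface in sklearn (the scaler and classifier are chosen only for illustration):

```python
# fit/transform on a transformer, fit/predict/predict_proba on an estimator.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=50, random_state=0)

scaler = StandardScaler().fit(X)              # transformer: fit, then transform
X_scaled = scaler.transform(X)

clf = LogisticRegression().fit(X_scaled, y)   # estimator: fit, then predict
labels = clf.predict(X_scaled)
probs = clf.predict_proba(X_scaled)           # per-class probabilities
```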
You want parameter search methods to have access to:
These abstractions could be protocol/interface level, data descriptors a la spec, etc. Ideally you could pipe any of this data flow description into a DAG:
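The linear special case of such a data-flow description already exists in sklearn as Pipeline, which may be a useful reference point; a minimal sketch with illustrative steps:

```python
# A chain of named steps; fit/transform/predict flow through them in order.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)           # the whole chain behaves like a single estimator
preds = pipe.predict(X)
```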