(ada-delta
{:keys [rho rescale-gradient epsilon wd clip-gradient]
:as opts
:or {rho 0.05 rescale-gradient 1.0 epsilon 1.0E-8 wd 0.0 clip-gradient 0}})
AdaDelta optimizer as described in Matthew D. Zeiler, 2012. http://arxiv.org/abs/1212.5701
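A minimal construction sketch, assuming the conventional require alias for this namespace:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer])

;; defaults from the signature above (rho 0.05, epsilon 1e-8, wd 0.0)
(def ada (optimizer/ada-delta {}))

;; override selected keys; the remaining keys keep their :or defaults
(def ada-tuned (optimizer/ada-delta {:rho 0.9 :wd 1.0e-5}))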
(ada-grad)
(ada-grad {:keys [learning-rate rescale-gradient epsilon wd]
:or {learning-rate 0.05 rescale-gradient 1.0 epsilon 1.0E-7 wd 0.0}})
AdaGrad optimizer as described in Matthew D. Zeiler, 2012. http://arxiv.org/pdf/1212.5701v1.pdf
- learning-rate Step size.
- epsilon A small number that keeps the update numerically stable. Default value is 1e-7.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
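Constructed the same way; a sketch using the alias from above:

;; the zero-arity form takes the defaults (learning-rate 0.05, epsilon 1e-7)
(def adagrad (optimizer/ada-grad))

;; larger step size plus a little weight decay
(def adagrad-tuned (optimizer/ada-grad {:learning-rate 0.1 :wd 1.0e-4}))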
(adam)
(adam {:keys [learning-rate beta1 beta2 epsilon decay-factor wd clip-gradient
lr-scheduler]
:or {learning-rate 0.002
beta1 0.9
beta2 0.999
epsilon 1.0E-8
decay-factor (- 1 1.0E-8)
wd 0
clip-gradient 0}})
Adam optimizer as described in [King2014].
[King2014] Diederik Kingma, Jimmy Ba, Adam: A Method for Stochastic Optimization, http://arxiv.org/abs/1412.6980
- learning-rate Step size.
- beta1 Exponential decay rate for the first moment estimates.
- beta2 Exponential decay rate for the second moment estimates.
- epsilon A small constant for numerical stability.
- decay-factor
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler.
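The returned optimizer is normally handed to a training module rather than driven by hand. The commented lines below are a sketch only, assuming module/fit-params accepts an :optimizer key as in the library's examples; module, train-data, and num-epoch are placeholder names:

(require '[org.apache.clojure-mxnet.module :as m])

(def adam-opt (optimizer/adam {:learning-rate 0.001 :wd 1.0e-5}))

;; (m/fit module {:train-data train-data
;;                :num-epoch num-epoch
;;                :fit-params (m/fit-params {:optimizer adam-opt})})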
(create-state optimizer index weight)
Create additional optimizer state such as momentum.
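For optimizers that keep per-parameter state (e.g. the momentum buffer of sgd), create-state derives that state from the parameter itself. A sketch, assuming the usual ndarray alias:

(require '[org.apache.clojure-mxnet.ndarray :as ndarray])

(def sgd-opt (optimizer/sgd {:learning-rate 0.01 :momentum 0.9}))
(def weight  (ndarray/zeros [2 3]))

;; state for the parameter with index 0; for sgd with momentum this holds the momentum buffer
(def state (optimizer/create-state sgd-opt 0 weight))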
(dcasgd)
(dcasgd
{:keys [learning-rate momentum lambda wd clip-gradient lr-scheduler]
:as opts
:or {learning-rate 0.01 momentum 0.0 lambda 0.04 wd 0.0 clip-gradient 0}})
DCASGD optimizer with momentum and weight regularization. An implementation of the paper 'Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning'.
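A construction sketch; lambda weights the delay-compensation term described in the paper:

(def dcasgd-opt (optimizer/dcasgd {:learning-rate 0.01 :lambda 0.04}))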
(nag)
(nag {:keys [learning-rate momentum wd clip-gradient lr-scheduler]
:as opts
:or {learning-rate 0.01 momentum 0.0 wd 1.0E-4 clip-gradient 0}})
SGD with Nesterov momentum. Implemented according to https://github.com/torch/optim/blob/master/sgd.lua
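A construction sketch; apart from the Nesterov lookahead it takes the same options as sgd:

;; Nesterov momentum of 0.9 with the default weight decay (1e-4)
(def nag-opt (optimizer/nag {:learning-rate 0.1 :momentum 0.9}))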
(rms-prop)
(rms-prop {:keys [learning-rate rescale-gradient gamma1 gamma2 wd lr-scheduler
clip-gradient]
:or {learning-rate 0.002
rescale-gradient 1.0
gamma1 0.95
gamma2 0.9
wd 0.0
clip-gradient 0}})
RMSProp optimizer as described in Tieleman & Hinton, 2012, following Eq. (38)-(45) of Alex Graves, 2013. http://arxiv.org/pdf/1308.0850v5.pdf
- learning-rate Step size.
- gamma1 Decay factor of the moving averages of the gradient and the squared gradient.
- gamma2 Momentum factor of the moving average of the gradient.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler.
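A construction sketch using the gammas described above:

;; defaults: learning-rate 0.002, gamma1 0.95, gamma2 0.9
(def rmsprop-opt (optimizer/rms-prop))

;; slower decay of the running averages and no momentum term
(def rmsprop-tuned (optimizer/rms-prop {:gamma1 0.99 :gamma2 0.0}))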
(sgd)
(sgd {:keys [learning-rate momentum wd clip-gradient lr-scheduler]
:as opts
:or {learning-rate 0.01 momentum 0.0 wd 1.0E-4 clip-gradient 0}})
A very simple SGD optimizer with momentum and weight regularization.
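A sketch showing the common knobs (momentum, L2 weight decay, gradient clipping):

(def sgd-opt (optimizer/sgd {:learning-rate 0.01
                             :momentum 0.9
                             :wd 5.0e-4
                             :clip-gradient 10}))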
(sgld)
(sgld {:keys [learning-rate rescale-gradient wd clip-gradient lr-scheduler]
:or {learning-rate 0.01 rescale-gradient 1 wd 1.0E-4 clip-gradient 0}})
Stochastic Gradient Langevin Dynamics (SGLD) updater for sampling from a distribution.
- learning-rate Step size.
- rescale-gradient Rescaling factor applied to the gradient.
- wd L2 regularization coefficient added to all the weights.
- clip-gradient Clip the gradient to the range [-clip-gradient, clip-gradient].
- lr-scheduler The learning rate scheduler.
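A construction sketch; because SGLD injects noise into every step, repeated updates yield samples rather than a single point estimate:

(def sgld-opt (optimizer/sgld {:learning-rate 0.001 :wd 0.0}))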
(update optimizer index weight grad state)
Update the parameters.
- optimizer The optimizer.
- index A unique integer key used to index the parameters.
- weight The weight NDArray.
- grad The gradient NDArray.
- state The auxiliary optimization state (an NDArray or other object) returned by create-state.
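Putting the pieces together, one manual optimization step looks like the sketch below; weight and grad here are toy values, and in normal training the module or executor drives these calls for you:

(require '[org.apache.clojure-mxnet.optimizer :as optimizer]
         '[org.apache.clojure-mxnet.ndarray :as ndarray])

(def opt    (optimizer/sgd {:learning-rate 0.1 :momentum 0.9}))
(def weight (ndarray/ones [2 2]))   ;; parameter to be updated
(def grad   (ndarray/ones [2 2]))   ;; its gradient for this step
(def state  (optimizer/create-state opt 0 weight))

;; perform one optimization step on weight using grad and the optimizer's state
(optimizer/update opt 0 weight grad state)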