.. currentmodule:: mxnet.rnn
.. warning:: This package is experimental and may be deprecated in the near future.
The `rnn` module includes the recurrent neural network (RNN) cell APIs, a suite of
tools for building an RNN's symbolic graph.

.. note:: The `rnn` module offers a higher-level interface, while `symbol.RNN` is a
   lower-level interface. The cell APIs in the `rnn` module are easier to use in
   most cases.
The cell APIs in the `rnn` module include the following key methods:

.. autosummary::
    :nosignatures:

    BaseRNNCell.__call__
    BaseRNNCell.unroll
    BaseRNNCell.reset
    BaseRNNCell.begin_state
    BaseRNNCell.unpack_weights
    BaseRNNCell.pack_weights
When working with the cell API, the precise input and output symbols depend on the type of RNN you are using. Take Long Short-Term Memory (LSTM) for example:
.. code-block:: python

    import mxnet as mx

    # Placeholder sizes used throughout these examples.
    input_dim = 1000  # vocabulary size
    embed_dim = 50    # embedding dimension

    # Shape of 'step_data' is (batch_size,).
    step_input = mx.symbol.Variable('step_data')

    # First we embed our raw input data to be used as the LSTM's input.
    embedded_step = mx.symbol.Embedding(data=step_input,
                                        input_dim=input_dim,
                                        output_dim=embed_dim)

    # Then we create an LSTM cell.
    lstm_cell = mx.rnn.LSTMCell(num_hidden=50)
    # Initialize its hidden and memory states.
    # 'begin_state' takes an initialization function, and uses 'zeros' by default.
    begin_state = lstm_cell.begin_state()
The LSTM cell and other non-fused RNN cells are callable. Calling a cell advances
its state by one time step; the transformation depends on both the current input
and the previous states.
.. code-block:: python

    # Call the cell to get the output of one time step for a batch.
    output, states = lstm_cell(embedded_step, begin_state)

    # 'output' is the symbol 'lstm_t0_out_output' of shape (batch_size, hidden_dim).
    # 'states' holds the recurrent states that will be carried over to the next
    # step, which for an LSTM includes both the "hidden state" and the "cell state".
    # Both 'lstm_t0_out_output' and 'lstm_t0_state_output' have shape
    # (batch_size, hidden_dim).
Most of the time our goal is to process a sequence of many steps. For this, we need to unroll the LSTM according to the sequence length.
.. code-block:: python

    # Embed a sequence. 'seq_data' has shape (batch_size, sequence_length).
    seq_input = mx.symbol.Variable('seq_data')
    embedded_seq = mx.symbol.Embedding(data=seq_input,
                                       input_dim=input_dim,
                                       output_dim=embed_dim)
.. note:: Remember to reset the cell by calling `lstm_cell.reset()` before
   unrolling or stepping through a new sequence.
.. code-block:: python

    sequence_length = 5  # placeholder unroll length for these examples

    # Reset the cell, since it was already stepped once above.
    lstm_cell.reset()

    # When unrolling, if 'merge_outputs' is set to True, the outputs are merged
    # into a single symbol. In the layout, 'N' represents batch size, 'T'
    # represents sequence length, and 'C' represents the number of dimensions
    # in the hidden states.
    outputs, states = lstm_cell.unroll(length=sequence_length,
                                       inputs=embedded_seq,
                                       layout='NTC',
                                       merge_outputs=True)

    # 'outputs' is the symbol 'concat0_output' of shape
    # (batch_size, sequence_length, hidden_dim).
    # The hidden state and cell state from the final time step are returned:
    # both 'lstm_t4_out_output' and 'lstm_t4_state_output' have shape
    # (batch_size, hidden_dim).

    # If 'merge_outputs' is set to False, a list of symbols, one for each time
    # step, is returned instead.
    lstm_cell.reset()
    outputs, states = lstm_cell.unroll(length=sequence_length,
                                       inputs=embedded_seq,
                                       layout='NTC',
                                       merge_outputs=False)
    # In this case, 'outputs' is a list of symbols, each of shape
    # (batch_size, hidden_dim).
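Stepping through the sequence manually achieves the same effect as `unroll`. A
minimal sketch, reusing `embedded_seq` and `sequence_length` from above (splitting
with `mx.symbol.split` is one way to obtain the per-step inputs; it is not
necessarily what `unroll` does internally):

.. code-block:: python

    # Always reset before processing a new sequence.
    lstm_cell.reset()

    # Split the embedded sequence into one symbol per time step.
    step_inputs = mx.symbol.split(embedded_seq, num_outputs=sequence_length,
                                  axis=1, squeeze_axis=True)

    states = lstm_cell.begin_state()
    step_outputs = []
    for t in range(sequence_length):
        output, states = lstm_cell(step_inputs[t], states)
        step_outputs.append(output)
    # 'step_outputs' corresponds to unrolling with merge_outputs=False.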
.. note:: Loading and saving models that are built with the RNN cell API requires
   using `mx.rnn.load_rnn_checkpoint`, `mx.rnn.save_rnn_checkpoint`, and
   `mx.rnn.do_rnn_checkpoint`. The list of all the cells used should be provided
   as the first argument to those functions.
The `rnn` module supports the following RNN cell types:

.. autosummary::
    :nosignatures:

    LSTMCell
    GRUCell
    RNNCell
It also supports the following modifier cells:

.. autosummary::
    :nosignatures:

    BidirectionalCell
    DropoutCell
    ZoneoutCell
    ResidualCell
A modifier cell takes in one or more cells and transforms the output of those
cells. `BidirectionalCell` is one example. It takes two cells, for the forward
unroll and the backward unroll respectively, and concatenates the outputs of the
forward and backward passes after unrolling.
.. code-block:: python

    # BidirectionalCell takes two RNN cells, for the forward and backward pass
    # respectively. Using different cell types for the two directions is allowed.
    bi_cell = mx.rnn.BidirectionalCell(
        mx.rnn.LSTMCell(num_hidden=50),
        mx.rnn.GRUCell(num_hidden=75))
    outputs, states = bi_cell.unroll(length=sequence_length,
                                     inputs=embedded_seq,
                                     merge_outputs=True)

    # The output feature is the concatenation of the forward and backward passes,
    # so the number of output dimensions is the sum of the two cells' dimensions.
    # 'outputs' is the symbol 'bi_out_output' of shape
    # (batch_size, sequence_length, 125).
    # The states of the BidirectionalCell are a list of two lists, corresponding
    # to the states of the forward and backward cells respectively.
.. note:: `BidirectionalCell` cannot be called or stepped, because the backward
   unroll requires the output of future steps, and thus the whole sequence is
   required.
Dropout and zoneout are popular regularization techniques that can be applied to
RNNs. The `rnn` module provides `DropoutCell` and `ZoneoutCell` for regularizing
the outputs and recurrent states of an RNN. `ZoneoutCell` takes one RNN cell in
its constructor, and supports unrolling like other cells.
.. code-block:: python

    zoneout_cell = mx.rnn.ZoneoutCell(lstm_cell, zoneout_states=0.5)
    outputs, states = zoneout_cell.unroll(length=sequence_length,
                                          inputs=embedded_seq,
                                          merge_outputs=True)
`DropoutCell` performs dropout on the input sequence. It can also be used in a
stacked multi-layer RNN setting, which we will cover below.
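On its own, `DropoutCell` can be unrolled like any other cell. A minimal sketch,
reusing `embedded_seq` and `sequence_length` from the examples above (the 0.2
rate is an arbitrary choice):

.. code-block:: python

    # The first argument is the probability of dropping an element.
    dropout_cell = mx.rnn.DropoutCell(0.2)
    dropped_seq, _ = dropout_cell.unroll(length=sequence_length,
                                         inputs=embedded_seq,
                                         merge_outputs=True)
    # 'dropped_seq' has the same shape as 'embedded_seq'.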
Residual connections are a useful technique for training deep neural models,
because they help the propagation of gradients by shortening the paths.
`ResidualCell` provides this functionality for RNN models.
.. code-block:: python

    residual_cell = mx.rnn.ResidualCell(lstm_cell)
    outputs, states = residual_cell.unroll(length=sequence_length,
                                           inputs=embedded_seq,
                                           merge_outputs=True)
The outputs are the element-wise sum of the input and the output of the LSTM
cell, which requires the input dimension (here, `embed_dim`) to match the cell's
`num_hidden`.
.. autosummary::
    :nosignatures:

    SequentialRNNCell
    SequentialRNNCell.add
`SequentialRNNCell` allows stacking multiple layers of RNN cells to improve the
expressiveness and performance of the model. Cells are added to a
`SequentialRNNCell` in order, from bottom to top. When unrolling, the output of a
lower cell is automatically passed to the cell above it.
.. code-block:: python

    stacked_rnn_cells = mx.rnn.SequentialRNNCell()
    stacked_rnn_cells.add(mx.rnn.BidirectionalCell(
        mx.rnn.LSTMCell(num_hidden=50),
        mx.rnn.LSTMCell(num_hidden=50)))
    # Drop the output of the bottom BidirectionalCell with probability 0.5.
    stacked_rnn_cells.add(mx.rnn.DropoutCell(0.5))
    stacked_rnn_cells.add(mx.rnn.LSTMCell(num_hidden=50))

    outputs, states = stacked_rnn_cells.unroll(length=sequence_length,
                                               inputs=embedded_seq,
                                               merge_outputs=True)

    # The output of SequentialRNNCell is the same as that of the last layer; here
    # 'outputs' is the symbol 'concat6_output' of shape
    # (batch_size, sequence_length, hidden_dim).
    # The states of the SequentialRNNCell are a list of lists, each inner list
    # corresponding to the states of one of the added cells.
.. autosummary::
    :nosignatures:

    FusedRNNCell
    FusedRNNCell.unfuse
The computation of an RNN for an input sequence consists of many GEMM and
point-wise operations with temporal dependencies. This can make the computation
memory-bound, especially on GPU, resulting in longer wall-clock time. By combining
the computation of many small matrices into that of larger ones and streaming the
computation whenever possible, the ratio of computation to memory I/O can be
increased, which results in better performance on GPU. This optimization
technique is called "fusing".
The `rnn` module includes a `FusedRNNCell`, which provides the optimized fused
implementation. `FusedRNNCell` supports bidirectional RNNs and dropout.
.. code-block:: python

    fused_lstm_cell = mx.rnn.FusedRNNCell(num_hidden=50,
                                          num_layers=3,
                                          mode='lstm',
                                          bidirectional=True,
                                          dropout=0.5)
    outputs, _ = fused_lstm_cell.unroll(length=sequence_length,
                                        inputs=embedded_seq,
                                        merge_outputs=True)

    # 'outputs' is the symbol 'lstm_rnn_output' of shape
    # (batch_size, sequence_length, forward_backward_concat_dim).
.. note:: `FusedRNNCell` only runs on GPU. It cannot be called or stepped.
.. note:: When `dropout` is set to non-zero in `FusedRNNCell`, the dropout is
   applied to the output of all layers except the last layer. If there is only
   one layer in the `FusedRNNCell`, the dropout rate is ignored.
.. note:: Similar to `BidirectionalCell`, when the `bidirectional` flag is set to
   `True`, the output of `FusedRNNCell` is twice the size specified by
   `num_hidden`.
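One way to verify the doubled output size is symbolic shape inference. A minimal
sketch (`outputs` and `sequence_length` come from the fused example above; the
batch size of 32 is arbitrary):

.. code-block:: python

    # Infer shapes given only the shape of the raw input sequence.
    _, out_shapes, _ = outputs.infer_shape(seq_data=(32, sequence_length))
    # For the bidirectional cell above, out_shapes[0] is
    # (32, sequence_length, 2 * 50).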
When training a deep, complex model on multiple GPUs, it's recommended to stack
fused RNN cells with one layer per cell, instead of using a single cell with all
the layers. The reason is that fused RNN cells don't mark gradients as ready
until the computation for the entire layer is completed. Breaking a multi-layer
fused RNN cell into several one-layer cells allows gradients to be processed
earlier. This reduces communication overhead, especially with multiple GPUs.
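A sketch of this pattern, stacking three one-layer `FusedRNNCell`s inside a
`SequentialRNNCell` (the layer count and prefixes here are illustrative):

.. code-block:: python

    stacked_fused_cells = mx.rnn.SequentialRNNCell()
    for i in range(3):
        # One fused layer per cell, so each layer's gradients become
        # available as soon as that layer finishes.
        stacked_fused_cells.add(mx.rnn.FusedRNNCell(num_hidden=50,
                                                    num_layers=1,
                                                    mode='lstm',
                                                    prefix='lstm_l%d_' % i))
    outputs, _ = stacked_fused_cells.unroll(length=sequence_length,
                                            inputs=embedded_seq,
                                            merge_outputs=True)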
The `unfuse()` method can be used to convert a `FusedRNNCell` into an equivalent,
CPU-compatible `SequentialRNNCell` that mirrors the settings of the
`FusedRNNCell`.
.. code-block:: python

    unfused_lstm_cell = fused_lstm_cell.unfuse()
    unfused_outputs, _ = unfused_lstm_cell.unroll(length=sequence_length,
                                                  inputs=embedded_seq,
                                                  merge_outputs=True)

    # 'unfused_outputs' is the symbol 'lstm_bi_l2_out_output' of shape
    # (batch_size, sequence_length, forward_backward_concat_dim).
.. autosummary::
    :nosignatures:

    save_rnn_checkpoint
    load_rnn_checkpoint
    do_rnn_checkpoint

.. autosummary::
    :nosignatures:

    RNNParams
    RNNParams.get
The model parameters from training with a fused cell can be used for inference
with the equivalent unfused cell, and vice versa. Because the parameters of fused
and unfused cells are organized differently, they need to be converted first. The
parameters of a `FusedRNNCell` are merged and flattened: in the fused example
above, the model has a single `lstm_parameters` array of shape
`(total_num_params,)`, whereas the parameters of the equivalent
`SequentialRNNCell` are separate::

    'lstm_l0_i2h_weight': (out_dim, embed_dim)
    'lstm_l0_i2h_bias': (out_dim,)
    'lstm_l0_h2h_weight': (out_dim, hidden_dim)
    'lstm_l0_h2h_bias': (out_dim,)
    'lstm_r0_i2h_weight': (out_dim, embed_dim)
    ...
All cells in the `rnn` module support the method `unpack_weights()` for
converting `FusedRNNCell` parameters to the unfused format, and `pack_weights()`
for fusing the parameters back. The RNN-specific checkpointing methods
(`load_rnn_checkpoint`, `save_rnn_checkpoint`, `do_rnn_checkpoint`) handle the
conversion transparently based on the provided cells.
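A sketch of both conversion paths, assuming `arg_params` and `aux_params` hold
the trained parameters of the fused model above:

.. code-block:: python

    # Convert the flattened fused parameters to the per-layer unfused format,
    # e.g. for CPU inference with 'unfused_lstm_cell'.
    unfused_params = fused_lstm_cell.unpack_weights(arg_params)
    # Convert them back into the single flattened 'lstm_parameters' array.
    fused_params = fused_lstm_cell.pack_weights(unfused_params)

    # The RNN-specific checkpoint functions apply the conversion transparently;
    # pass the list of cells used to build the symbol as the first argument.
    mx.rnn.save_rnn_checkpoint([fused_lstm_cell], 'rnn_model', 0,
                               outputs, arg_params, aux_params)
    sym, arg_params, aux_params = mx.rnn.load_rnn_checkpoint([fused_lstm_cell],
                                                             'rnn_model', 0)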
.. autosummary::
    :nosignatures:

    BucketSentenceIter
    encode_sentences
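A minimal sketch of preparing bucketed training data with these utilities (the
toy corpus and bucket sizes are illustrative):

.. code-block:: python

    # Tokenized sentences of varying lengths.
    sentences = [['the', 'quick', 'brown', 'fox'],
                 ['hello', 'world'],
                 ['a', 'longer', 'example', 'sentence']]

    # Map tokens to integer ids, building the vocabulary on the fly.
    encoded, vocab = mx.rnn.encode_sentences(sentences, vocab=None,
                                             invalid_label=-1, start_label=0)

    # Group sentences of similar length into buckets, padding with the
    # invalid label.
    data_iter = mx.rnn.BucketSentenceIter(encoded, batch_size=2,
                                          buckets=[5, 10], invalid_label=-1)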
.. autoclass:: mxnet.rnn.BaseRNNCell
    :members:

    .. automethod:: __call__

.. autoclass:: mxnet.rnn.LSTMCell
    :members:

.. autoclass:: mxnet.rnn.GRUCell
    :members:

.. autoclass:: mxnet.rnn.RNNCell
    :members:

.. autoclass:: mxnet.rnn.FusedRNNCell
    :members:

.. autoclass:: mxnet.rnn.SequentialRNNCell
    :members:

.. autoclass:: mxnet.rnn.BidirectionalCell
    :members:

.. autoclass:: mxnet.rnn.DropoutCell
    :members:

.. autoclass:: mxnet.rnn.ZoneoutCell
    :members:

.. autoclass:: mxnet.rnn.ResidualCell
    :members:

.. autoclass:: mxnet.rnn.RNNParams
    :members:

.. autoclass:: mxnet.rnn.BucketSentenceIter
    :members:

.. autofunction:: mxnet.rnn.encode_sentences

.. autofunction:: mxnet.rnn.save_rnn_checkpoint

.. autofunction:: mxnet.rnn.load_rnn_checkpoint

.. autofunction:: mxnet.rnn.do_rnn_checkpoint