Recurrent Layers

class npdl.layers.Recurrent(n_out, n_in=None, nb_batch=None, nb_seq=None, init='glorot_uniform', inner_init='orthogonal', activation='tanh', return_sequence=False)[source]

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition [R12] or speech recognition [R13].

Parameters:

n_out : int

Number of hidden units.

n_in : int or None

Input dimension.

nb_batch : int or None

Batch size.

nb_seq : int or None

Sequence length.

init : npdl.initializations.Initializer

Initialization function for the input-to-hidden weights.

inner_init : npdl.initializations.Initializer

Initialization function for the hidden-to-hidden (recurrent) weights.

activation : npdl.activations.Activation

Activation function.

return_sequence : bool

If True, return the output of every time step; otherwise return only the last output.

References

[R12] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009.
[R13] H. Sak, A. W. Senior, and F. Beaufays. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. Proc. Interspeech, pp. 338–342, Singapore, Sept. 2014.
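
The keyword arguments above are shared by all of the recurrent layers below, which accept them through **kwargs. The following is a minimal construction sketch, assuming the constructor signature shown above is accepted unchanged by the concrete subclasses; the sizes are purely illustrative:

    import npdl

    # Hedged sketch: construct a concrete recurrent layer with the shared
    # keyword arguments documented above (all sizes are illustrative).
    rnn = npdl.layers.SimpleRNN(
        n_out=64,                 # number of hidden units
        n_in=128,                 # input dimension
        nb_batch=32,              # batch size
        nb_seq=10,                # sequence length
        init='glorot_uniform',    # input-to-hidden initializer (str or npdl function)
        inner_init='orthogonal',  # hidden-to-hidden initializer
        activation='tanh',
        return_sequence=True,     # keep the output of every time step
    )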
class npdl.layers.SimpleRNN(**kwargs)[source]

Fully-connected RNN where the output is fed back to the input.

\[o_t = \tanh(U x_t + W o_{t-1} + b)\]
Parameters:

output_dim: dimension of the internal projections and the final output.

init: weight initialization function.

Can be the name of an existing function (str), or an npdl function.

inner_init: initialization function of the inner (hidden-to-hidden) cells.

activation: activation function.

Can be the name of an existing function (str), or an npdl function.

return_sequence: if True, a 3D numpy.array with shape (batch_size, timesteps, units) is returned; otherwise, a 2D numpy.array with shape (batch_size, units) is returned.

References

[R14] Y. Gal, Z. Ghahramani. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. http://arxiv.org/abs/1512.05287
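
The recurrence above is easy to reproduce directly in NumPy. The following is an illustrative sketch, not the library's implementation; the weight shapes and the return_sequence handling mirror the parameter descriptions above:

    import numpy as np

    def simple_rnn_forward(x, U, W, b, return_sequence=False):
        """Sketch of o_t = tanh(U x_t + W o_{t-1} + b) over a sequence.

        x: (batch_size, timesteps, n_in); U: (n_in, units);
        W: (units, units); b: (units,).
        """
        batch_size, timesteps, _ = x.shape
        units = U.shape[1]
        o = np.zeros((batch_size, units))            # initial state o_0
        outputs = []
        for t in range(timesteps):
            o = np.tanh(x[:, t, :] @ U + o @ W + b)  # o_t
            outputs.append(o)
        if return_sequence:
            return np.stack(outputs, axis=1)         # (batch_size, timesteps, units)
        return o                                     # (batch_size, units)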
class npdl.layers.GRU(gate_activation='sigmoid', need_grad=True, **kwargs)[source]

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory [R15]. They have fewer parameters than LSTM, as they lack an output gate [R16].

\[z_t = \sigma(U_z x_t + W_z h_{t-1} + b_z)\]
\[r_t = \sigma(U_r x_t + W_r h_{t-1} + b_r)\]
\[h_t = \tanh(U_h x_t + W_h (s_{t-1} \odot r_t) + b_h)\]
\[s_t = (1 - z_t) \odot h_t + z_t \odot s_{t-1}\]
Parameters:

gate_activation : npdl.activations.Activation

Gate activation.

need_grad : bool

If True, will calculate gradients.

References

[R15] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”. arXiv:1412.3555 [cs.NE].
[R16] “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML”. Wildml.com. Retrieved May 18, 2016.
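
A single GRU step written out in plain NumPy, following the four equations above. This is an illustrative sketch, not the npdl implementation; the parameter names and shapes are assumptions:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, s_prev, U_z, W_z, b_z, U_r, W_r, b_r, U_h, W_h, b_h):
        """One GRU step: x_t is (batch, n_in), s_prev is (batch, n_out)."""
        z_t = sigmoid(x_t @ U_z + s_prev @ W_z + b_z)          # update gate
        r_t = sigmoid(x_t @ U_r + s_prev @ W_r + b_r)          # reset gate
        h_t = np.tanh(x_t @ U_h + (s_prev * r_t) @ W_h + b_h)  # candidate state
        s_t = (1.0 - z_t) * h_t + z_t * s_prev                 # new hidden state
        return s_t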
class npdl.layers.LSTM(gate_activation='sigmoid', need_grad=True, forget_bias_num=1, **kwargs)[source]

Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber [R17] and further improved in 2000 by Felix Gers et al. [R18]. Like most RNNs, an LSTM network is universal in the sense that, given enough network units, it can compute anything a conventional computer can compute, provided it has the proper weight matrix, which may be viewed as its program.

\[f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)\]
\[i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)\]
\[o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)\]
\[g_t = \tanh(U_g x_t + W_g h_{t-1} + b_g)\]
\[c_t = f_t \odot c_{t-1} + i_t \odot g_t\]
\[h_t = o_t \odot \tanh(c_t)\]
Parameters:

gate_activation : npdl.activations.Activation

Gate activation.

need_grad : bool

If True, will calculate gradients.

forget_bias_num : int

Value used to initialize the forget-gate bias (commonly 1).

References

[R17] Sepp Hochreiter; Jürgen Schmidhuber (1997). “Long short-term memory”. Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
[R18] Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). “Learning to Forget: Continual Prediction with LSTM”. Neural Computation. 12 (10): 2451–2471. doi:10.1162/089976600300015015.
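
One LSTM step in plain NumPy, following the gate equations above. This is an illustrative sketch; the parameter packing is an assumption, not the layer's internal layout:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x_t, h_prev, c_prev, params):
        """One LSTM step; params holds the (U, W, b) triple for each gate."""
        (U_f, W_f, b_f), (U_i, W_i, b_i), (U_o, W_o, b_o), (U_g, W_g, b_g) = params
        f_t = sigmoid(x_t @ U_f + h_prev @ W_f + b_f)   # forget gate
        i_t = sigmoid(x_t @ U_i + h_prev @ W_i + b_i)   # input gate
        o_t = sigmoid(x_t @ U_o + h_prev @ W_o + b_o)   # output gate
        g_t = np.tanh(x_t @ U_g + h_prev @ W_g + b_g)   # candidate cell value
        c_t = f_t * c_prev + i_t * g_t                  # new cell state
        h_t = o_t * np.tanh(c_t)                        # new hidden state
        return h_t, c_t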
class npdl.layers.BatchLSTM(gate_activation='sigmoid', need_grad=True, forget_bias_num=1, **kwargs)[source]

Long short-term memory (LSTM) is a special kind of RNN: a recurrent architecture proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber [R19] and further improved in 2000 by Felix Gers et al. [R20]. Like most RNNs, an LSTM network is universal in the sense that, given enough network units, it can compute anything a conventional computer can compute, provided it has the proper weight matrix, which may be viewed as its program.

\[f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f)\]
\[i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)\]
\[o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o)\]
\[g_t = \tanh(U_g x_t + W_g h_{t-1} + b_g)\]
\[c_t = f_t \odot c_{t-1} + i_t \odot g_t\]
\[h_t = o_t \odot \tanh(c_t)\]
Parameters:

gate_activation : npdl.activations.Activation

Gate activation.

need_grad : bool

If True, will calculate gradients.

forget_bias_num : int

Value used to initialize the forget-gate bias (commonly 1).

References

[R19] Sepp Hochreiter; Jürgen Schmidhuber (1997). “Long short-term memory”. Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
[R20] Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). “Learning to Forget: Continual Prediction with LSTM”. Neural Computation. 12 (10): 2451–2471. doi:10.1162/089976600300015015.
backward(pre_grad, dcn=None, dhn=None)[source]

Backward propagation.

Parameters:

pre_grad : numpy.array

Gradients propagated to this layer.

dcn : numpy.array or None

Gradients of the cell state at time step n.

dhn : numpy.array or None

Gradients of the hidden state at time step n.

Returns:

numpy.array

The gradients propagated to the previous layer.

connect_to(prev_layer=None)[source]

Connection to the previous layer.

Parameters:

prev_layer : npdl.layers.Layer or None

Previous layer.

AllW : numpy.array

Concatenated weight matrix. Its rows are grouped into bias, x2h (input-to-hidden), and h2h (hidden-to-hidden) blocks, and its columns into the four gate blocks i, f, o, g.
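
A sketch of how such a concatenated matrix can be sliced back into its blocks. The exact shapes and the single bias row are assumptions for illustration; only the row/column grouping comes from the layout above:

    import numpy as np

    n_in, n_out = 128, 64
    # Rows: [bias; x2h; h2h], columns: the four gate blocks i, f, o, g.
    AllW = np.zeros((1 + n_in + n_out, 4 * n_out))

    bias = AllW[0, :]            # bias row
    x2h  = AllW[1:1 + n_in, :]   # input-to-hidden rows
    h2h  = AllW[1 + n_in:, :]    # hidden-to-hidden rows

    # Split a row block into its per-gate columns, in the order i, f, o, g.
    W_xi, W_xf, W_xo, W_xg = np.split(x2h, 4, axis=1)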
forward(input, c0=None, h0=None)[source]

Forward propagation.

Parameters:

input : numpy.array

Input of shape (nb_batch, nb_seq, n_in).

c0 : numpy.array or None

Initial cell state.

h0 : numpy.array or None

Initial hidden state.

Returns:

numpy.array

Forward results.
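
A hedged usage sketch of the forward pass, using the input shape documented above. The keyword names, the need to call connect_to() before forward(), and the zero defaults for c0/h0 are assumptions based on the signatures in this section, not a verified recipe:

    import numpy as np
    import npdl

    nb_batch, nb_seq, n_in, n_out = 32, 10, 128, 64

    # Assumed setup: give the layer explicit sizes so connect_to() can
    # allocate its weights without a previous layer.
    layer = npdl.layers.BatchLSTM(n_out=n_out, n_in=n_in,
                                  nb_batch=nb_batch, nb_seq=nb_seq,
                                  return_sequence=True)
    layer.connect_to()

    x = np.random.randn(nb_batch, nb_seq, n_in)   # (nb_batch, nb_seq, n_in)
    out = layer.forward(x)                        # c0/h0 left as None
    print(out.shape)                              # expected (nb_batch, nb_seq, n_out)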