npdl.initialization

Functions to create initializers for parameter variables.

Examples
>>> from npdl.layers import Dense
>>> from npdl.initialization import GlorotUniform
>>> l1 = Dense(n_out=300, n_in=100, init=GlorotUniform())
Initializers
Zero | Initialize weights with zero value.
One | Initialize weights with one value.
Uniform([scale]) | Sample initial weights from the uniform distribution.
Normal([std, mean]) | Sample initial weights from the Gaussian distribution.
Orthogonal([gain]) | Initialize weights as an orthogonal matrix.
Detailed Description
class npdl.initialization.Initializer [source]
Base class for parameter weight initializers.
The Initializer class represents a weight initializer used to initialize weight parameters in a neural network layer. It should be subclassed when implementing new types of weight initializers.
class npdl.initialization.Normal(std=0.01, mean=0.0) [source]
Sample initial weights from the Gaussian distribution.
Initial weight parameters are sampled from N(mean, std).
Parameters:
std : float. Standard deviation of the initial parameters.
mean : float. Mean of the initial parameters.
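As an illustration of the sampling described above, here is a minimal NumPy sketch (normal_init is a hypothetical helper, not part of npdl's API):

```python
import numpy as np

def normal_init(shape, std=0.01, mean=0.0, rng=None):
    # Draw every weight independently from N(mean, std),
    # matching the documented defaults std=0.01, mean=0.0.
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(loc=mean, scale=std, size=shape)

W = normal_init((100, 300))  # weights for a 100 -> 300 dense layer
```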
class npdl.initialization.Uniform(scale=0.05) [source]
Sample initial weights from the uniform distribution.
Parameters are sampled from U(a, b).
Parameters:
scale : float or tuple. Determines the interval bounds a and b. If scale is a float, the weights are sampled from U(-scale, scale). If scale is a tuple, the weights are sampled from U(scale[0], scale[1]).
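The float-versus-tuple handling of scale can be sketched as follows (uniform_init is a hypothetical helper, not part of npdl's API):

```python
import numpy as np

def uniform_init(shape, scale=0.05, rng=None):
    # A float scale draws from U(-scale, scale);
    # a tuple (a, b) draws from U(a, b).
    rng = np.random.default_rng() if rng is None else rng
    if isinstance(scale, tuple):
        low, high = scale
    else:
        low, high = -scale, scale
    return rng.uniform(low, high, size=shape)
```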
class npdl.initialization.Orthogonal(gain=1.0) [source]
Initialize weights as an orthogonal matrix.
Orthogonal matrix initialization [R2]. For n-dimensional shapes where n > 2, the n-1 trailing axes are flattened. For convolutional layers, this corresponds to the fan-in, so this makes the initialization usable for both dense and convolutional layers.
Parameters:
gain : float or 'relu'. Scaling factor for the weights. Set this to 1.0 for linear and sigmoid units, to 'relu' or sqrt(2) for rectified linear units, and to sqrt(2/(1+alpha**2)) for leaky rectified linear units with leakiness alpha. Other transfer functions may need different factors.
References
[R2] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks." arXiv preprint arXiv:1312.6120 (2013).
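A common way to realize this initializer is to orthogonalize a Gaussian matrix via SVD, flattening the trailing axes as described above. The sketch below assumes this SVD approach; it is one standard construction, not necessarily npdl's exact implementation:

```python
import numpy as np

def orthogonal_init(shape, gain=1.0, rng=None):
    # 'relu' maps to sqrt(2), as the gain parameter documents.
    if gain == 'relu':
        gain = np.sqrt(2.0)
    rng = np.random.default_rng() if rng is None else rng
    # Flatten the n-1 trailing axes into one, orthogonalize, reshape back.
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    a = rng.normal(0.0, 1.0, size=flat_shape)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat_shape else v  # pick the factor with the right shape
    return gain * q.reshape(shape)
```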
class npdl.initialization.LecunUniform [source]
LeCun uniform initializer.
It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(3 / fan_in) [R3] and fan_in is the number of input units in the weight matrix.
References
[R3] LeCun 98, Efficient Backprop, http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
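The limit formula can be sketched as below; taking fan_in from the first axis of the weight matrix is an assumption of this sketch, and lecun_uniform is a hypothetical helper:

```python
import numpy as np

def lecun_uniform(shape, rng=None):
    # limit = sqrt(3 / fan_in), with fan_in assumed to be shape[0].
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(3.0 / shape[0])
    return rng.uniform(-limit, limit, size=shape)
```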
class npdl.initialization.GlorotUniform [source]
Glorot uniform initializer, also called Xavier uniform initializer.
It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / (fan_in + fan_out)) [R4], fan_in is the number of input units in the weight matrix, and fan_out is the number of output units in the weight matrix.
References
[R4] Glorot & Bengio, AISTATS 2010. http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
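A minimal sketch of the limit computation, assuming a 2-D weight matrix whose two axes give fan_in and fan_out (glorot_uniform is a hypothetical helper):

```python
import numpy as np

def glorot_uniform(shape, rng=None):
    # limit = sqrt(6 / (fan_in + fan_out)) for a 2-D weight matrix.
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = shape[0], shape[1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)
```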
class npdl.initialization.GlorotNormal [source]
Glorot normal initializer, also called Xavier normal initializer.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / (fan_in + fan_out)) [R5], where fan_in is the number of input units in the weight matrix and fan_out is the number of output units in the weight matrix.
References
[R5] Glorot & Bengio, AISTATS 2010. http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
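The stddev formula can be sketched as follows. Note two simplifications: the text specifies a truncated normal, while this sketch uses a plain normal draw for brevity, and glorot_normal is a hypothetical helper assuming a 2-D weight matrix:

```python
import numpy as np

def glorot_normal(shape, rng=None):
    # stddev = sqrt(2 / (fan_in + fan_out)); truncation is omitted here.
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (shape[0] + shape[1]))
    return rng.normal(0.0, std, size=shape)
```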
class npdl.initialization.HeNormal [source]
He normal initializer.
It draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in) [R6], where fan_in is the number of input units in the weight matrix.
References
[R6] He et al., http://arxiv.org/abs/1502.01852
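As a sketch of the stddev formula (he_normal is a hypothetical helper; fan_in is assumed to be the first axis, and the truncation is omitted for brevity):

```python
import numpy as np

def he_normal(shape, rng=None):
    # stddev = sqrt(2 / fan_in), with fan_in assumed to be shape[0].
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / shape[0])
    return rng.normal(0.0, std, size=shape)
```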
class npdl.initialization.HeUniform [source]
He uniform variance scaling initializer.
It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / fan_in) [R7] and fan_in is the number of input units in the weight matrix.
References
[R7] He et al., http://arxiv.org/abs/1502.01852
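The limit formula can be sketched as below (he_uniform is a hypothetical helper; taking fan_in from the first axis is an assumption of this sketch):

```python
import numpy as np

def he_uniform(shape, rng=None):
    # limit = sqrt(6 / fan_in), with fan_in assumed to be shape[0].
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / shape[0])
    return rng.uniform(-limit, limit, size=shape)
```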