deepmatcher.optim

SoftNLLLoss

class deepmatcher.optim.SoftNLLLoss(label_smoothing=0, weight=None, num_classes=2, **kwargs)

A soft version of negative log likelihood loss with support for label smoothing.

Effectively equivalent to PyTorch's torch.nn.NLLLoss if label_smoothing is set to zero. While the numerical loss values differ from those of torch.nn.NLLLoss, this loss produces the same gradients, because the implementation uses torch.nn.KLDivLoss to support multi-class label smoothing.

Parameters:
- label_smoothing (float) – The smoothing parameter \(\epsilon\) for label smoothing. For details on label smoothing, refer to this paper.
- weight (torch.Tensor) – A 1D tensor of size equal to the number of classes. Specifies the manual weight rescaling applied to each class. Useful when there is severe class imbalance in the training set.
- num_classes (int) – The number of classes.
- size_average (bool) – By default, the losses are averaged for each minibatch over observations as well as over dimensions. If False, the losses are instead summed. This is a keyword-only parameter.
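To make the smoothing behavior concrete, here is a minimal plain-Python sketch of a label-smoothed NLL for a single example. The function name `soft_nll` and the choice of spreading \(\epsilon / K\) mass uniformly over all classes are illustrative assumptions, not deepmatcher's exact implementation (which operates on batched tensors via torch.nn.KLDivLoss):

```python
import math

def soft_nll(log_probs, target, label_smoothing=0.0):
    """Cross-entropy against a label-smoothed target distribution.

    Hypothetical single-example sketch: with label_smoothing == 0 this
    reduces to ordinary NLL, i.e. -log_probs[target].
    """
    num_classes = len(log_probs)
    eps = label_smoothing
    # Smoothed one-hot target: eps/K mass on every class,
    # plus the remaining (1 - eps) on the true class.
    q = [eps / num_classes] * num_classes
    q[target] += 1.0 - eps
    # Cross-entropy between the smoothed target q and the predicted
    # log-probabilities.
    return -sum(qi * lp for qi, lp in zip(q, log_probs))
```

Note that KL divergence differs from this cross-entropy only by the entropy of q, which is constant with respect to the model's predictions; this is why the KLDivLoss-based implementation reports different loss values than torch.nn.NLLLoss yet yields identical gradients.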
Optimizer

class deepmatcher.optim.Optimizer(method='adam', lr=0.001, max_grad_norm=5, start_decay_at=1, beta1=0.9, beta2=0.999, adagrad_accum=0.0, lr_decay=0.8)

Controller class for optimization. Mostly a thin wrapper around torch.optim, but also useful for implementing learning rate scheduling beyond what is currently available. Also implements methods needed for training RNNs, such as gradient manipulations.

Parameters:
- method (string) – One of [sgd, adagrad, adadelta, adam].
- lr (float) – Learning rate.
- lr_decay (float) – Learning rate decay multiplier.
- start_decay_at (int) – Epoch at which to start learning rate decay. If None, decay starts when the validation accuracy stops improving. Defaults to 1.
- beta1, beta2 (float) – Hyperparameters for adam.
- adagrad_accum (float, optional) – Initialization hyperparameter for adagrad.
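The interaction of lr_decay and start_decay_at can be sketched as a small helper. The function `updated_lr` and its exact arguments are hypothetical, written only to illustrate the documented schedule: decay by a fixed multiplier once a given epoch is reached, or, when start_decay_at is None, once validation accuracy stops improving:

```python
def updated_lr(lr, epoch, last_acc, best_acc,
               lr_decay=0.8, start_decay_at=1):
    """Return the learning rate to use for the next epoch.

    Hypothetical helper mirroring the documented schedule, not
    deepmatcher's actual code.
    """
    if start_decay_at is not None:
        # Fixed schedule: decay every epoch from start_decay_at onward.
        decay = epoch >= start_decay_at
    else:
        # Adaptive schedule: decay when validation accuracy plateaus.
        decay = last_acc <= best_acc
    return lr * lr_decay if decay else lr
```

For example, with the defaults the learning rate shrinks by a factor of 0.8 every epoch starting at epoch 1.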
set_parameters(params)

Sets the model parameters and initializes the base optimizer.

Parameters:
- params – Dictionary of named model parameters. Parameters that do not require gradients are filtered out for optimization.
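The filtering described above can be sketched as follows. The `Param` stand-in class and the helper name `trainable_params` are hypothetical; in practice the dictionary values would be torch tensors carrying a `requires_grad` flag:

```python
class Param:
    """Hypothetical stand-in for a tensor parameter."""
    def __init__(self, requires_grad=True):
        self.requires_grad = requires_grad

def trainable_params(named_params):
    """Keep only parameters that require gradients, mirroring the
    filtering set_parameters performs before building the optimizer."""
    return {name: p for name, p in named_params.items()
            if p.requires_grad}
```

Frozen parameters (e.g. fixed pretrained embeddings) are thus excluded from the optimizer's update step.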