deepmatcher.optim

SoftNLLLoss

class deepmatcher.optim.SoftNLLLoss(label_smoothing=0, weight=None, num_classes=2, **kwargs)[source]

A soft version of negative log likelihood loss with support for label smoothing.

Effectively equivalent to PyTorch’s torch.nn.NLLLoss if label_smoothing is set to zero. While the numerical loss values will differ from those of torch.nn.NLLLoss, this loss results in the same gradients. This is because the implementation uses torch.nn.KLDivLoss to support multi-class label smoothing.

Parameters:
  • label_smoothing (float) – The smoothing parameter \(\epsilon\) for label smoothing. For details on label smoothing, refer to this paper.
  • weight (torch.Tensor) – A 1D tensor of size equal to the number of classes. Specifies the manual weight rescaling applied to each class. Useful in cases when there is severe class imbalance in the training set.
  • num_classes (int) – The number of classes.
  • size_average (bool) – By default, the losses are averaged over observations for each minibatch as well as over dimensions. However, if False, the losses are instead summed. This is a keyword-only parameter.
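A minimal usage sketch, assuming the loss is called like torch.nn.NLLLoss with log-probabilities and integer class labels; the class weights and smoothing value below are illustrative, not defaults:

  import torch
  from deepmatcher.optim import SoftNLLLoss

  # Illustrative class weights for an imbalanced 2-class matching problem.
  class_weights = torch.tensor([0.3, 0.7])
  criterion = SoftNLLLoss(label_smoothing=0.05, weight=class_weights, num_classes=2)

  log_probs = torch.log_softmax(torch.randn(4, 2), dim=1)  # model outputs as log-probabilities
  targets = torch.tensor([0, 1, 1, 0])                     # gold class indices

  loss = criterion(log_probs, targets)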

Optimizer

class deepmatcher.optim.Optimizer(method='adam', lr=0.001, max_grad_norm=5, start_decay_at=1, beta1=0.9, beta2=0.999, adagrad_accum=0.0, lr_decay=0.8)[source]

Controller class for optimization.

Mostly a thin wrapper around torch.optim, but also useful for implementing learning rate scheduling beyond what is currently available. Also implements methods needed for training RNNs, such as gradient manipulation.

Parameters:
  • method (string) – One of [sgd, adagrad, adadelta, adam].
  • lr (float) – Learning rate.
  • lr_decay (float) – Learning rate decay multiplier.
  • start_decay_at (int) – Epoch to start learning rate decay. If None, starts decay when the validation accuracy stops improving. Defaults to 1.
  • beta1, beta2 (float) – Hyperparameters for adam.
  • adagrad_accum (float, optional) – Initialization hyperparameter for adagrad.
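A construction sketch; the hyperparameter values below are chosen for illustration only, not as recommendations:

  from deepmatcher.optim import Optimizer

  # Adam with a smaller learning rate, gradient clipping at norm 5, and
  # learning rate decay starting at epoch 3.
  optimizer = Optimizer(method='adam', lr=5e-4, max_grad_norm=5,
                        start_decay_at=3, lr_decay=0.9)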
set_parameters(params)[source]

Sets the model parameters and initializes the base optimizer.

Parameters:
  • params – Dictionary of named model parameters. Parameters that do not require gradients will be filtered out for optimization.
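For illustration, a sketch assuming the iterable returned by named_parameters() of a PyTorch module is accepted as the dictionary of named parameters; the linear module is just a stand-in for a real matching model:

  import torch.nn as nn
  from deepmatcher.optim import Optimizer

  model = nn.Linear(300, 2)   # stand-in for an actual matching model
  optimizer = Optimizer(method='adam', lr=1e-3)

  # Parameters that do not require gradients are filtered out for optimization.
  optimizer.set_parameters(model.named_parameters())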
step()[source]

Update the model parameters based on current gradients.

Optionally employs gradient clipping, as controlled by max_grad_norm.

update_learning_rate(acc, epoch)[source]

Decay learning rate.

Decays the learning rate if validation performance does not improve or the start_decay_at epoch limit is reached.

Parameters:
  • acc – The accuracy score on the validation set.
  • epoch – The current epoch number.
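A sketch of how step() and update_learning_rate() fit into a training loop; the model, dummy data, and validation accuracy below are placeholders for illustration, not part of deepmatcher's API:

  import torch
  import torch.nn as nn
  from deepmatcher.optim import Optimizer, SoftNLLLoss

  model = nn.Linear(300, 2)                         # placeholder model
  criterion = SoftNLLLoss(label_smoothing=0.05, num_classes=2)
  optimizer = Optimizer(method='adam', lr=1e-3, max_grad_norm=5, start_decay_at=2)
  optimizer.set_parameters(model.named_parameters())

  for epoch in range(1, 6):
      features = torch.randn(32, 300)               # dummy training batch
      targets = torch.randint(0, 2, (32,))
      log_probs = torch.log_softmax(model(features), dim=1)

      model.zero_grad()
      loss = criterion(log_probs, targets)
      loss.backward()
      optimizer.step()                              # clips gradients, then updates parameters

      val_acc = 0.5                                 # placeholder validation accuracy
      optimizer.update_learning_rate(val_acc, epoch)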