deepmatcher.modules

Standard Operations

Many components in DeepMatcher, e.g. the AttentionWithRNN word aggregator, allow users to customize the operations they perform, such as alignment, vector transformation, and pooling, by setting a parameter that specifies the operation. Here we describe operations commonly used across the package and show how to specify them.

Transform

The transform operation takes a single vector and transforms it to produce another vector as output. The transformation may be non-linear. A transform operation can be specified using one of the following:

  • A string: One of the styles supported by the Transform module.
  • An instance of Transform.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Transform module.

This operation is implemented by the Transform module:

class deepmatcher.modules.Transform(style, layers=1, bypass_network=None, non_linearity='leaky_relu', hidden_size=None, output_size=None, input_size=None)[source]

A multi layered transformation module.

Supports various non-linearities and bypass operations.

Parameters:
  • style (string) –

    A string containing one or more of the following 3 parts, separated by dashes (-):

    • ‘<N>-layer’: Specifies the number of layers. <N> sets the layers parameter. E.g.: ‘2-layer-highway’.
    • ‘<nonlinearity>’: Specifies the non-linearity used after each layer. Sets the non_linearity parameter; refer to that parameter for details.
    • ‘<bypass>’: Specifies the Bypass operation to use. Sets the bypass_network parameter. <bypass> is one of:
      • ‘residual’: Use Bypass with ‘residual’ style.
      • ‘highway’: Use Bypass with ‘highway’ style.

    If any of the 3 parts are missing, the default value for the corresponding parameter is used.

    Example styles:
    ‘3-layer-relu-highway’, ‘tanh-residual-2-layer’, ‘tanh’, ‘highway’, ‘4-layer’.
  • layers (int) – Number of linear transformation layers to use.
  • bypass_network (string or Bypass or callable) – The bypass network (e.g. residual or highway network) to apply at each layer. The input to each linear layer is considered the raw input to the bypass network, and the output of the non-linearity operation is considered the transformed input. Argument must specify a Bypass operation. If None, does not use a bypass network.
  • non_linearity (string) –

    The non-linearity to use after each linear layer. One of:

    • ‘leaky_relu’: Use PyTorch LeakyReLU.
    • ‘relu’: Use PyTorch ReLU.
    • ‘elu’: Use PyTorch ELU.
    • ‘selu’: Use PyTorch SELU.
    • ‘glu’: Use PyTorch glu().
    • ‘tanh’: Use PyTorch Tanh.
    • ‘sigmoid’: Use PyTorch Sigmoid.
  • hidden_size (int) – The hidden size of the linear transformation layers. If None, will be set to be equal to input_size.
  • output_size (int) – The hidden size of the last linear transformation layer. Will determine the number of features in the output of the module. If None, will be set to be equal to the hidden_size.
  • input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
Input: An N-d tensor of shape (D1, D2, …, input_size).
N is 2 or more.
Output: An N-d tensor of shape (D1, D2, …, output_size).
output_size need not be the same as input_size, but all other dimensions will remain unchanged.
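
For example, a Transform can be constructed directly from a style string and applied to a plain tensor. The snippet below is a sketch (the style string and sizes are illustrative; the import path follows the documented class name deepmatcher.modules.Transform):

import torch
import deepmatcher as dm

# A 2-layer highway transform with tanh non-linearities, specified via a style string.
transform = dm.modules.Transform('2-layer-tanh-highway', hidden_size=300)

x = torch.randn(8, 300)   # (D1, ..., input_size)
y = transform(x)          # input_size is inferred lazily on the first forward pass
print(y.shape)            # expected: torch.Size([8, 300]), since output_size defaults to hidden_size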

Pool

The Pool operation takes a sequence of vectors and aggregates this sequence to produce a single vector as output. A Pool operation can be specified using one of the following:

  • A string: One of the styles supported by the Pool module.
  • An instance of Pool.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Pool module.

This operation is implemented by the Pool module:

class deepmatcher.modules.Pool(style, alpha=0.001)[source]

Module that aggregates a given sequence of vectors to produce a single vector.

Parameters:
  • style (string) –

    One of the following strings:

    • ‘avg’: Take the average of the input vectors. Given a sequence of vectors \(x_{1:N}\):
      \[Pool(x_{1:N}) = \frac{1}{N} \sum_1^N x_i\]
    • ‘divsqrt’: Take the sum of the input vectors \(x_{1:N}\) and divide by \(\sqrt{N}\):
      \[Pool(x_{1:N}) = \frac{1}{\sqrt{N}} \sum_1^N x_i\]
    • ‘inv-freq-avg’: Take the smooth inverse frequency weighted sum of the \(N\) input vectors and divide by \(\sqrt{N}\). This is similar to the ‘sif’ style but does not perform principal component removal. Given a sequence of vectors \(x_{1:N}\) corresponding to words \(w_{1:N}\):
      \[Pool(x_{1:N}) = \frac{1}{\sqrt{N}} \sum_1^N \frac{\alpha}{\alpha + P(w_i)} x_i\]

      where \(P(w)\) is the unigram probability of word \(w\) (computed over all values of this attribute over the entire training dataset) and \(\alpha\) is a scalar (specified by the alpha parameter). \(P(w)\) is computed in MatchingDataset, in the compute_metadata() method.

    • ‘sif’: Compute the SIF encoding of the input vectors. Takes the smooth inverse frequency weighted sum of the \(N\) input vectors and divides it by \(\sqrt{N}\). Also removes the projection of the resulting vector along the first principal component of all word embeddings (corresponding to words in this attribute in the training set). Given a sequence of vectors \(x_{1:N}\) corresponding to words \(w_{1:N}\):
      \[v_x = \frac{1}{\sqrt{N}} \sum_1^N \frac{\alpha}{\alpha + P(w_i)} x_i\]
      \[Pool(x_{1:N}) = v_x - u u^T v_x\]

      where \(u\) is the first principal component as described earlier, \(P(w)\) is the unigram probability of word \(w\) (computed over all values of this attribute over the entire training dataset) and \(\alpha\) is a scalar (specified by the alpha parameter). \(u\) and \(P(w)\) are computed in MatchingDataset, in the compute_metadata() method.

    • ‘max’: Take the max of the input vector sequence along each input feature. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when computing the max.
    • ‘last’: Take the last vector in the input vector sequence. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when taking the last vector.
    • ‘last-simple’: Take the last vector in the input vector sequence. Does NOT take length metadata into account - simply takes the last vector for each input sequence in the batch.
    • ‘birnn-last’: Treats the input sequence as the output from a bidirectional RNN and takes the last outputs from the forward and backward RNNs. The first half of each vector is assumed to be from the forward RNN and the second half is assumed to be from the backward RNN. The output thus is the concatenation of the first half of the last vector in the input sequence and the second half of the first vector in the sequence. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when taking the last vector for the forward RNN.
    • ‘birnn-last-simple’: Treats the input sequence as the output from a bidirectional RNN and takes the last outputs from the forward and backward RNNs. Same as the ‘birnn-last’ style but does not consider length metadata even if available.
  • alpha (float) – The value used to smooth the inverse word frequencies. Used for ‘inv-freq-avg’ and ‘sif’ styles.
Input: A 3d tensor of shape (batch, seq_len, input_size).
The tensor should be wrapped within an AttrTensor which contains metadata about the batch.
Output: A 2d tensor of shape (batch, output_size).
This will be wrapped within an AttrTensor (with metadata information unchanged). output_size need not be the same as input_size.
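
To make the first two styles concrete, the following plain-PyTorch sketch computes the ‘avg’ and ‘divsqrt’ aggregations by hand (the Pool module itself additionally expects its input to be wrapped in an AttrTensor):

import torch

x = torch.randn(2, 5, 300)            # (batch, seq_len, input_size)
N = x.size(1)

avg_pool = x.sum(dim=1) / N           # 'avg':     mean of the input vectors
divsqrt_pool = x.sum(dim=1) / N**0.5  # 'divsqrt': sum divided by sqrt(N)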

Merge

The Merge operation takes two or more vectors and aggregates the information in them to produce a single vector as output. Unlike the Pool operation, the input vectors here are not considered to be sequential in nature. A Merge operation can be specified using one of the following:

  • A string: One of the styles supported by the Merge module. Note that some styles only support two input vectors to be merged, while others allow multiple inputs.
  • An instance of Merge.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Merge module.

This operation is implemented by the Merge module:

class deepmatcher.modules.Merge(style)[source]

Module that takes two or more vectors and merges them to produce a single vector.

Parameters: style (string) –

One of the following strings:

  • ‘concat’: Concatenate all the input vectors along the last dimension (-1).
  • ‘diff’: Take the difference between two input vectors.
  • ‘abs-diff’: Take the absolute difference between two input vectors.
  • ‘concat-diff’: Concatenate the two input vectors, take the difference between the two vectors, and concatenate these two resulting vectors.
  • ‘concat-abs-diff’: Concatenate the two input vectors, take the absolute difference between the two vectors, and concatenate these two resulting vectors.
  • ‘mul’: Take the element-wise multiplication of the two input vectors.
Input: N K-d tensors of shape (D1, D2, …, input_size).
N and K are both 2 or more.
Output: One K-d tensor of shape (D1, D2, …, output_size).
output_size need not be the same as input_size, but all other dimensions will remain unchanged.
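
For two inputs, the arithmetic behind these styles can be illustrated in plain PyTorch (a sketch of what each style computes, not the Merge module itself):

import torch

a = torch.randn(4, 300)
b = torch.randn(4, 300)

concat = torch.cat((a, b), dim=-1)                        # 'concat'          -> (4, 600)
diff = a - b                                              # 'diff'            -> (4, 300)
abs_diff = (a - b).abs()                                  # 'abs-diff'        -> (4, 300)
concat_abs_diff = torch.cat((concat, abs_diff), dim=-1)   # 'concat-abs-diff' -> (4, 900)
mul = a * b                                               # 'mul'             -> (4, 300)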

Align

The Align operation takes two sequences of vectors, aligns the words in them, and returns the corresponding alignment score matrix. For each word in the first sequence, the alignment matrix contains unnormalized scores indicating the degree to which each word in the second sequence aligns with it. For an example of one way to do this, take a look at this paper. An Align operation can be specified using one of the following:

  • A string: One of the styles supported by the AlignmentNetwork module.
  • An instance of AlignmentNetwork.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the AlignmentNetwork module.

This operation is implemented by the AlignmentNetwork module:

class deepmatcher.modules.AlignmentNetwork(style='decomposable', hidden_size=None, transform_network='2-layer-highway', input_size=None)[source]

Neural network to compute alignment between two vector sequences.

Takes two sequences of vectors, aligns the words in them, and returns the corresponding alignment matrix.

Parameters:
  • style (string) –

    One of the following strings:

    • ‘decomposable’: Use decomposable attention. The alignment score between the \(i^{th}\) vector in the first sequence, \(a_i\), and the \(j^{th}\) vector in the second sequence, \(b_j\), is computed as follows:
      \[score(a_i, b_j) = F(a_i)^T F(b_j)\]

      where \(F\) is a Transform operation. Refer to the decomposable attention paper for more details.

    • ‘general’: Use general attention. The alignment score between the \(i^{th}\) vector in the first sequence, \(a_i\), and the \(j^{th}\) vector in the second sequence, \(b_j\), is computed as follows:
      \[score(a_i, b_j) = a_i^T F(b_j)\]

      where \(F\) is a Transform operation. Refer to the Luong attention paper for more details.

    • ‘dot’: Use dot product attention. The alignment score between the \(i^{th}\) vector in the first sequence, \(a_i\), and the \(j^{th}\) vector in the second sequence, \(b_j\), is computed as follows:
      \[score(a_i, b_j) = a_i^T b_j\]
  • hidden_size (int) – The hidden size to use for the Transform operation, if applicable for the specified style.
  • transform_network (string or Transform or callable) – The neural network to transform the input vectors, if applicable for the specified style. Argument must specify a Transform operation.
  • input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
Input: Two 3d tensors.
Two 3d tensors of shape (batch, seq1_len, input_size) and (batch, seq2_len, input_size).
Output: One 3d tensor of shape (batch, seq1_len, seq2_len).
The output represents the alignment matrix and contains unnormalized scores. The batch dimension remains unchanged.
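
For the ‘dot’ style in particular, the alignment matrix is simply a batched matrix product. A plain-PyTorch sketch:

import torch

a = torch.randn(2, 7, 300)                # (batch, seq1_len, input_size)
b = torch.randn(2, 9, 300)                # (batch, seq2_len, input_size)

# score(a_i, b_j) = a_i^T b_j for every pair of positions (i, j).
scores = torch.bmm(a, b.transpose(1, 2))  # (batch, seq1_len, seq2_len)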

Bypass

The Bypass operation takes two tensors, one corresponding to an input tensor and the other corresponding to a transformed version of the first, applies a bypass network, and returns one tensor of the same size as the transformed tensor. Examples of bypass networks include residual networks and highway networks. A Bypass operation can be specified using one of the following:

  • A string: One of the styles supported by the Bypass module.
  • An instance of Bypass.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Bypass module.

This operation is implemented by the Bypass module:

class deepmatcher.modules.Bypass(style)[source]

Module that helps bypass a given transformation of an input.

Supports residual and highway styles of bypass.

Parameters: style (string) –

One of the following strings:

  • ‘residual’: Use a residual bypass, i.e., add the raw input to the transformed input.
  • ‘highway’: Use a highway bypass, i.e., combine the raw and transformed inputs using a learned gate.
Input: Two N-d tensors.
Two N-d tensors of shape (D1, D2, …, transformed_size) and (D1, D2, …, input_size). The first tensor should correspond to the transformed version of the second.
Output: One N-d tensor of shape (D1, D2, …, transformed_size).
Note that the shape of the output will match the shape of the first input tensor.
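
As a rough sketch of what the two styles compute (generic residual / highway formulations; the module's exact handling of size mismatches between the two inputs may differ):

import torch
import torch.nn as nn

raw = torch.randn(4, 300)           # original input
transformed = torch.randn(4, 300)   # transformed version of the same input

# 'residual' (generic form): add the raw input back to the transformed input.
residual_out = transformed + raw

# 'highway' (generic form): a learned gate mixes the raw and transformed inputs.
gate = torch.sigmoid(nn.Linear(300, 300)(raw))
highway_out = gate * transformed + (1 - gate) * raw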

RNN

This operation takes a sequence of vectors and produces a context-aware transformation of the input sequence as output. For an intro to RNNs, take a look at this article. An RNN operation can be specified using one of the following:

  • A string: One of the unit_types supported by the RNN module.
  • An instance of RNN.
  • A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the RNN module.

This operation is implemented by the RNN module:

class deepmatcher.modules.RNN(unit_type='gru', hidden_size=None, layers=1, bidirectional=True, dropout=0, input_dropout=0, last_layer_dropout=0, bypass_network=None, connect_num_layers=1, input_size=None, **kwargs)[source]

A multi layered RNN that supports dropout and residual / highway connections.

Parameters:
  • unit_type (string) –

    One of the supported RNN unit types:

    • ‘gru’: Apply a gated recurrent unit (GRU) RNN. Uses PyTorch GRU under the hood.
    • ‘lstm’: Apply a long short-term memory (LSTM) RNN. Uses PyTorch LSTM under the hood.
    • ‘rnn’: Apply an Elman RNN. Uses PyTorch RNN under the hood.
  • hidden_size (int) – The hidden size of all RNN layers.
  • layers (int) – Number of RNN layers.
  • bidirectional (bool) – Whether to use bidirectional RNNs.
  • dropout (float) – If non-zero, applies dropout to the outputs of each RNN layer except the last layer. Dropout probability must be between 0 and 1.
  • input_dropout (float) – If non-zero, applies dropout to the input to this module. Dropout probability must be between 0 and 1.
  • last_layer_dropout (float) – If non-zero, applies dropout to the output of the last RNN layer. Dropout probability must be between 0 and 1.
  • bypass_network (string or Bypass or callable) – The bypass network (e.g. residual or highway network) to apply every connect_num_layers layers. Argument must specify a Bypass operation. If None, does not use a bypass network.
  • connect_num_layers (int) – The number of layers between each bypass operation. Note that the layers in which dropout is applied are also controlled by this. If layers is 6 and connect_num_layers is 2, then a bypass network is applied after the 2nd, 4th and 6th layers. Further, if dropout is non-zero, it will only be applied after the 2nd and 4th layers.
  • input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
  • **kwargs (dict) – Additional keyword arguments are passed to the underlying PyTorch RNN module.
Input: One 3d tensor of shape (batch, seq_len, input_size).
The tensor should be wrapped within an AttrTensor which contains metadata about the batch.
Output: One 3d tensor of shape (batch, seq_len, output_size).
This will be wrapped within an AttrTensor (with metadata information unchanged). output_size need not be the same as input_size.
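
For example, a 2-layer bidirectional GRU with highway connections between layers could be specified as follows (a sketch; in practice the forward pass expects an AttrTensor-wrapped batch, which is normally supplied by the surrounding DeepMatcher component):

import deepmatcher as dm

rnn = dm.modules.RNN(
    unit_type='gru',
    hidden_size=150,
    layers=2,
    bidirectional=True,
    dropout=0.1,
    bypass_network='highway')
# input_size is inferred lazily when the first batch passes through the module.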

Utility Modules

Apart from standard operations, DeepMatcher also contains several utility modules to help glue together various components. These are listed below.

Lambda

class deepmatcher.modules.Lambda(lambd)[source]

Wrapper to convert a function to a module.

Parameters: lambd (callable) – The function to convert into a module. It must take one or more PyTorch Tensors as input and return one or more Tensors.
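
For example (a minimal sketch), a plain function can be wrapped so that it behaves like a module:

import torch
import deepmatcher as dm

# Wrap an element-wise scaling function so it can be used wherever a Module is expected.
halve = dm.modules.Lambda(lambda x: x * 0.5)
y = halve(torch.randn(4, 10))  # expected to apply the wrapped function to the input tensor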

MultiSequential

class deepmatcher.modules.MultiSequential(*args)[source]

A sequential container that supports multiple module inputs and outputs.

This is an extension of PyTorch’s Sequential module that allows each module to have multiple inputs and/or outputs.

NoMeta

class deepmatcher.modules.NoMeta(module)[source]

A wrapper module to allow regular modules to take AttrTensors as input.

A forward pass through this module will perform the following:

  • If the module input is an AttrTensor, get the data from it and use that as the input.
  • Perform a forward pass through the wrapped module with the modified input.
  • Using the metadata from the module input (if provided), wrap the result into an AttrTensor and return it.
Parameters: module (Module) – The module to wrap.
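
For example (a sketch), a plain PyTorch module can be made AttrTensor-aware by wrapping it:

import torch.nn as nn
import deepmatcher as dm

# NoMeta unwraps the data from an AttrTensor input, runs the wrapped module on it,
# and re-attaches the metadata to the result.
dropout = dm.modules.NoMeta(nn.Dropout(0.2))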

ModuleMap

class deepmatcher.modules.ModuleMap[source]

Holds submodules in a map.

Similar to torch.nn.ModuleList, but for maps.

Example:

import torch.nn as nn
import deepmatcher as dm

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        # Register one Linear layer per attribute, keyed by attribute name.
        linears = dm.modules.ModuleMap()
        linears['type'] = nn.Linear(10, 10)
        linears['color'] = nn.Linear(10, 10)
        self.linears = linears

    def forward(self, x1, x2):
        # Apply the attribute-specific layers to the corresponding inputs.
        y1 = self.linears['type'](x1)
        y2 = self.linears['color'](x2)
        return y1, y2

LazyModule

class deepmatcher.modules.LazyModule(*args, **kwargs)[source]

A lazily initialized module. Base class for most DeepMatcher modules.

This module is an extension of PyTorch Module with the following property: constructing an instance of this module does not immediately initialize it. This means that if the module has parameters, they will not be instantiated immediately after construction. The module is initialized the first time forward is called. This has the following benefits:

  • Can be safely deep copied to create structural clones that do not share parameters. E.g., deep copying a LazyModule consisting of a 2 layer Linear NN will produce another LazyModule with a 2 layer Linear NN that 1) does not share parameters with the original and 2) has a different weight initialization.
  • Allows automatic input size inference. Refer to description of _init for details.

This module also implements some additional goodies:

  • Output shape verification: As part of initialization, this module verifies that all output tensors have correct output shapes, if the expected output shape is specified using expect_signature(). This verification is done only once during initialization to avoid slowing down training.
  • NaN checks: All module outputs are checked for the presence of NaN values that may be difficult to trace down otherwise.

Subclasses of this module are expected to override the following two methods:

  • _init(): This is where the constructor of the module should be defined. During the first forward pass, this method will be called to initialize the module. Whatever you typically define in the __init__ function of a PyTorch module, you may define it here. This function may optionally take in an input_size parameter. If it does, LazyModule will set it to the size of the last dimension of the input. E.g., if the input is of size 32 * 300, the input_size will be set to 300. Subclasses may choose not to override this method.
  • _forward(): This is where the computation for the forward pass of the module must be defined. Whatever you typically define in the forward function of a PyTorch module, you may define it here. All subclasses must override this method.
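
A minimal sketch of a custom subclass following this contract (the names and sizes are illustrative):

import torch
import torch.nn as nn
import deepmatcher as dm

class MyProjection(dm.modules.LazyModule):
    def _init(self, output_size, input_size=None):
        # Called during the first forward pass; input_size is filled in automatically
        # from the last dimension of the input tensor.
        self.linear = nn.Linear(input_size, output_size)

    def _forward(self, x):
        return self.linear(x)

proj = MyProjection(64)         # construction only records the arguments
y = proj(torch.randn(8, 300))   # _init(64, input_size=300) runs here, then _forward
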
__init__(*args, **kwargs)[source]

Construct a LazyModule. DO NOT OVERRIDE this method.

This does NOT initialize the module - construction simply saves the positional and keyword arguments for future initialization.

Parameters:
  • *args – Positional arguments to the constructor of the module defined in _init().
  • **kwargs – Keyword arguments to the constructor of the module defined in _init().
expect_signature(signature)[source]

Set the expected module input / output signature.

Note that this feature is currently not fully functional. More details will be added after implementation.

forward(input, *args, **kwargs)[source]

Perform a forward pass through the module. DO NOT OVERRIDE this method.

If the module is not initialized yet, this method also performs initialization. Initialization involves the following:

  1. Calling the _init() method. Tries calling it with the input_size keyword parameter set (along with the positional and keyword args specified during construction). If this fails with a TypeError (i.e., the _init() method does not have an input_size parameter), initialization is retried without setting input_size.
  2. Verifying the output shape, if expect_signature() was called prior to the forward pass.
  3. Setting PyTorch Module forward and backward hooks to check for NaNs in module outputs and gradients.
Parameters:
  • *args – Positional arguments to the forward function of the module defined in _forward().
  • **kwargs – Keyword arguments to the forward function of the module defined in _forward().

LazyModuleFn

class deepmatcher.modules.LazyModuleFn(*args, **kwargs)[source]

A Lazy Module which simply wraps the Module returned by a specified function.

This provides a way to convert a PyTorch Module into a LazyModule.

Parameters:
  • fn (callable) – Function that returns a Module.
  • *args – Positional arguments to the function fn.
  • **kwargs – Keyword arguments to the function fn.
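
For example (a sketch), an ordinary PyTorch module can be deferred and turned into a LazyModule:

import torch.nn as nn
import deepmatcher as dm

# Construction of the Dropout module is deferred until the wrapped module is first needed.
lazy_dropout = dm.modules.LazyModuleFn(lambda: nn.Dropout(0.2))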