deepmatcher.modules
Standard Operations
Many components in DeepMatcher, e.g. AttentionWithRNN word aggregator, allow users to customize the behavior of operations performed by the component, such as alignment, vector transformation, pooling, etc., by setting a parameter that specifies the operation. Here we describe operations commonly used across the package, and show how to specify them.
Transform
The Transform operation takes a single vector and transforms it to produce another vector as output. The transformation may be non-linear. A Transform operation can be specified using one of the following:
- A string: One of the styles supported by the Transform module.
- An instance of Transform.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Transform module.
This operation is implemented by the Transform module:
class deepmatcher.modules.Transform(style, layers=1, bypass_network=None, non_linearity='leaky_relu', hidden_size=None, output_size=None, input_size=None)
A multi-layered transformation module.
Supports various non-linearities and bypass operations.
Parameters:
- style (string) – A string containing one or more of the following 3 parts, separated by dashes (-):
- ‘<N>-layer’: Specifies the number of layers. <N> sets the layers parameter. E.g.: ‘2-layer-highway’.
- ‘<nonlinearity>’: Specifies the non-linearity used after each layer. Sets the non_linearity parameter; refer to that parameter for details.
- ‘<bypass>’: Specifies the Bypass operation to use. Sets the bypass_network parameter. <bypass> is one of the styles supported by the Bypass module (‘residual’ or ‘highway’).
If any of the 3 parts is missing, the default value for the corresponding parameter is used.
Sample styles: ‘3-layer-relu-highway’, ‘tanh-residual-2-layer’, ‘tanh’, ‘highway’, ‘4-layer’.
- layers (int) – Number of linear transformation layers to use.
- bypass_network (string or Bypass or callable) – The bypass network (e.g. residual or highway network) to apply at every layer. The input to each linear layer is treated as the raw input to the bypass network, and the output of the non-linearity is treated as the transformed input. The argument must specify a Bypass operation. If None, no bypass network is used.
- non_linearity (string) – The non-linearity to use after each linear layer, given by name, e.g. ‘leaky_relu’ (the default), ‘relu’ or ‘tanh’.
- hidden_size (int) – The hidden size of the linear transformation layers. If None, will be set to be equal to input_size.
- output_size (int) – The hidden size of the last linear transformation layer. Will determine the number of features in the output of the module. If None, will be set to be equal to the hidden_size.
- input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
- Input: An N-d tensor of shape (D1, D2, …, input_size).
- N is 2 or more.
- Output: An N-d tensor of shape (D1, D2, …, output_size).
- output_size need not be the same as input_size, but all other dimensions will remain unchanged.
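The style string format described above can be illustrated with a small parser. This is a minimal sketch, not DeepMatcher's own parsing code: the function name `parse_transform_style` is hypothetical, the defaults are copied from the Transform signature, and any token that is neither an `<N>-layer` pair nor a Bypass style is assumed to name a non-linearity without further validation.

```python
def parse_transform_style(style, layers=1, non_linearity="leaky_relu",
                          bypass_network=None):
    """Split a dash-separated style string into its three optional parts.

    Hypothetical helper for illustration only. Parts may appear in any
    order; missing parts keep the defaults passed in.
    """
    bypass_styles = {"residual", "highway"}  # styles listed for the Bypass module
    tokens = style.split("-")
    i = 0
    while i < len(tokens):
        token = tokens[i]
        # '<N>-layer' spans two tokens after splitting on dashes.
        if token.isdigit() and i + 1 < len(tokens) and tokens[i + 1] == "layer":
            layers = int(token)
            i += 2
            continue
        if token in bypass_styles:
            bypass_network = token  # '<bypass>' part
        else:
            non_linearity = token   # assumed to be the '<nonlinearity>' part
        i += 1
    return layers, non_linearity, bypass_network


print(parse_transform_style("3-layer-relu-highway"))
print(parse_transform_style("tanh-residual-2-layer"))
```

Because non-linearity names such as leaky_relu use underscores rather than dashes, splitting on dashes does not break them apart.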
Pool
The Pool operation takes a sequence of vectors and aggregates this sequence to produce a single vector as output. A Pool operation can be specified using one of the following:
- A string: One of the styles supported by the Pool module.
- An instance of Pool.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Pool module.
This operation is implemented by the Pool module:
class deepmatcher.modules.Pool(style, alpha=0.001)
Module that aggregates a given sequence of vectors to produce a single vector.
Parameters:
- style (string) – One of the following strings:
- ‘avg’: Take the average of the input vectors. Given a sequence of vectors \(x_{1:N}\):
\[Pool(x_{1:N}) = \frac{1}{N} \sum_{i=1}^N x_i\]
- ‘divsqrt’: Take the sum of the input vectors \(x_{1:N}\) and divide by \(\sqrt{N}\):
\[Pool(x_{1:N}) = \frac{1}{\sqrt{N}} \sum_{i=1}^N x_i\]
- ‘inv-freq-avg’: Take the smooth inverse frequency weighted sum of the \(N\) input vectors and divide by \(\sqrt{N}\). This is similar to the ‘sif’ style but does not perform principal component removal. Given a sequence of vectors \(x_{1:N}\) corresponding to words \(w_{1:N}\):
\[Pool(x_{1:N}) = \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\alpha}{\alpha + P(w_i)} x_i\]
where \(P(w_i)\) is the unigram probability of word \(w_i\) (computed over all values of this attribute in the entire training dataset) and \(\alpha\) is a scalar (specified by the alpha parameter). \(P(w)\) is computed in the compute_metadata() method of MatchingDataset.
- ‘sif’: Compute the SIF encoding of the input vectors. Takes the smooth inverse frequency weighted sum of the \(N\) input vectors and divides it by \(\sqrt{N}\). Also removes the projection of the resulting vector along the first principal component of all word embeddings (corresponding to words in this attribute in the training set). Given a sequence of vectors \(x_{1:N}\) corresponding to words \(w_{1:N}\):
\[\begin{aligned} v_x &= \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\alpha}{\alpha + P(w_i)} x_i \\ Pool(x_{1:N}) &= v_x - u u^T v_x \end{aligned}\]
where \(u\) is the first principal component described above (taken as a unit column vector), \(P(w_i)\) is the unigram probability of word \(w_i\) (computed over all values of this attribute in the entire training dataset) and \(\alpha\) is a scalar (specified by the alpha parameter). \(u\) and \(P(w)\) are computed in the compute_metadata() method of MatchingDataset.
- ‘max’: Take the max of the input vector sequence along each input feature. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when computing the max.
- ’last’: Take the last vector in the input vector sequence. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when taking the last vector.
- ’last-simple’: Take the last vector in the input vector sequence. Does NOT take length metadata into account - simply takes the last vector for each input sequence in the batch.
- ‘birnn-last’: Treats the input sequence as the output from a bidirectional RNN and takes the last outputs from the forward and backward RNNs. The first half of each vector is assumed to be from the forward RNN and the second half is assumed to be from the backward RNN. The output is thus the concatenation of the first half of the last vector in the input sequence and the last half of the first vector in the sequence. If length metadata for each item in the input batch is available, ignores the padding vectors beyond the sequence length of each item when taking the last vectors for the forward RNN.
- ’birnn-last-simple’: Treats the input sequence as the output from a bidirectional RNN and takes the last outputs from the forward and backward RNNs. Same as the ‘birnn-last’ style but does not consider length metadata even if available.
- alpha (float) – The value used to smooth the inverse word frequencies. Used for ‘inv-freq-avg’ and ‘sif’ styles.
- Input: A 3d tensor of shape (batch, seq_len, input_size).
- The tensor should be wrapped within an AttrTensor which contains metadata about the batch.
- Output: A 2d tensor of shape (batch, output_size).
- This will be wrapped within an AttrTensor (with metadata information unchanged). output_size need not be the same as input_size.
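The pooling formulas above can be checked by hand on a single unpadded sequence. The following is a minimal pure-Python sketch rather than the Pool module itself: the function names are hypothetical, vectors are plain lists, word probabilities are passed in directly instead of coming from dataset metadata, and batching and padding are ignored.

```python
import math


def pool_avg(vectors):
    """'avg': mean of the N input vectors, feature by feature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pool_divsqrt(vectors):
    """'divsqrt': sum of the N input vectors divided by sqrt(N)."""
    n = len(vectors)
    return [sum(column) / math.sqrt(n) for column in zip(*vectors)]


def pool_inv_freq_avg(vectors, word_probs, alpha=0.001):
    """'inv-freq-avg': weight x_i by alpha / (alpha + P(w_i)), sum, divide by sqrt(N)."""
    n = len(vectors)
    weights = [alpha / (alpha + p) for p in word_probs]
    dim = len(vectors[0])
    return [
        sum(w * v[d] for w, v in zip(weights, vectors)) / math.sqrt(n)
        for d in range(dim)
    ]


def remove_projection(v, u):
    """Extra step of 'sif': subtract the projection of v onto the unit vector u."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return [vi - dot * ui for ui, vi in zip(u, v)]


vectors = [[1.0, 2.0], [3.0, 4.0]]
print(pool_avg(vectors))
print(pool_inv_freq_avg(vectors, word_probs=[0.5, 0.01]))
```

Frequent words (large P(w)) receive weights close to zero, while rare words receive weights close to one, which is the intended effect of the alpha smoothing.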
Merge
The Merge operation takes two or more vectors and aggregates the information in them to produce a single vector as output. Unlike the Pool operation, the input vectors here are not considered to be sequential in nature. A Merge operation can be specified using one of the following:
- A string: One of the styles supported by the Merge module. Note that some styles only support two input vectors to be merged, while others allow multiple inputs.
- An instance of Merge.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Merge module.
This operation is implemented by the Merge module:
class deepmatcher.modules.Merge(style)
Module that takes two or more vectors and merges them to produce a single vector.
Parameters: style (string) – One of the following strings:
- ’concat’: Concatenate all the input vectors along the last dimension (-1).
- ’diff’: Take the difference between two input vectors.
- ’abs-diff’: Take the absolute difference between two input vectors.
- ’concat-diff’: Concatenate the two input vectors, take the difference between the two vectors, and concatenate these two resulting vectors.
- ’concat-abs-diff’: Concatenate the two input vectors, take the absolute difference between the two vectors, and concatenate these two resulting vectors.
- ’mul’: Take the element-wise multiplication between the two input vectors.
- Input: N K-d tensors of shape (D1, D2, …, input_size).
- N and K are both 2 or more.
- Output: One K-d tensor of shape (D1, D2, …, output_size).
- output_size need not be the same as input_size, but all other dimensions will remain unchanged.
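To make the output sizes of the merge styles concrete, here is a minimal sketch on two plain Python lists. It is not the Merge module: the function `merge_two` is hypothetical, it handles exactly two inputs, and the direction of the difference (first minus second) is an assumption, since the description above does not state it.

```python
def merge_two(style, a, b):
    """Merge two equal-length vectors according to a style name."""
    diff = [x - y for x, y in zip(a, b)]  # assumed direction: a - b
    abs_diff = [abs(d) for d in diff]
    if style == "concat":
        return a + b
    if style == "diff":
        return diff
    if style == "abs-diff":
        return abs_diff
    if style == "concat-diff":
        return a + b + diff
    if style == "concat-abs-diff":
        return a + b + abs_diff
    if style == "mul":
        return [x * y for x, y in zip(a, b)]
    raise ValueError("unknown merge style: " + style)


a, b = [1.0, 5.0], [3.0, 2.0]
for style in ("concat", "diff", "abs-diff", "concat-diff", "concat-abs-diff", "mul"):
    print(style, merge_two(style, a, b))
```

With inputs of size d, the 'concat' output has size 2d, the two 'concat-…-diff' styles produce size 3d, and the remaining styles keep size d, which is why output_size need not equal input_size.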
Align
The Align operation takes two sequences of vectors, aligns the words in them, and returns the corresponding alignment score matrix. For each word in the first sequence, the alignment matrix contains unnormalized scores indicating the degree to which each word in the second sequence aligns with it. For an example of one way to do this, take a look at this paper. An Align operation can be specified using one of the following:
- A string: One of the styles supported by the AlignmentNetwork module.
- An instance of AlignmentNetwork.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the AlignmentNetwork module.
This operation is implemented by the AlignmentNetwork module:
class deepmatcher.modules.AlignmentNetwork(style='decomposable', hidden_size=None, transform_network='2-layer-highway', input_size=None)
Neural network to compute alignment between two vector sequences.
Takes two sequences of vectors, aligns the words in them, and returns the corresponding alignment matrix.
Parameters:
- style (string) – One of the following strings:
- ‘decomposable’: Use decomposable attention. The alignment score between the \(i^{th}\) vector in the first sequence \(a_i\), and the \(j^{th}\) vector in the second sequence \(b_j\) is computed as follows:
\[score(a_i, b_j) = F(a_i)^T F(b_j)\]
where \(F\) is a Transform operation. Refer to the decomposable attention paper for more details.
- ‘general’: Use general attention. The alignment score between the \(i^{th}\) vector in the first sequence \(a_i\), and the \(j^{th}\) vector in the second sequence \(b_j\) is computed as follows:
\[score(a_i, b_j) = a_i^T F(b_j)\]
where \(F\) is a Transform operation. Refer to the Luong attention paper for more details.
- ‘dot’: Use dot product attention. The alignment score between the \(i^{th}\) vector in the first sequence \(a_i\), and the \(j^{th}\) vector in the second sequence \(b_j\) is computed as follows:
\[score(a_i, b_j) = a_i^T b_j\]
- hidden_size (int) – The hidden size to use for the Transform operation, if applicable for the specified style.
- transform_network (string or Transform or callable) – The neural network to transform the input vectors, if applicable for the specified style. Argument must specify a Transform operation.
- input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
- Input: Two 3d tensors.
- Two 3d tensors of shape (batch, seq1_len, input_size) and (batch, seq2_len, input_size).
- Output: One 3d tensor of shape (batch, seq1_len, seq2_len).
- The output represents the alignment matrix and contains unnormalized scores.
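The three score formulas differ only in where the Transform \(F\) is applied. The sketch below computes the score matrix for one pair of sequences in pure Python. It is illustrative only: `alignment_scores` is a hypothetical function, `transform` stands in for \(F\) as any plain callable (here a simple doubling function rather than a learned network), and batching is omitted.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def alignment_scores(seq_a, seq_b, style="dot", transform=None):
    """Return a len(seq_a) x len(seq_b) matrix of unnormalized scores."""
    f = transform if transform is not None else (lambda v: v)
    if style == "dot":             # score = a_i^T b_j
        rows, cols = seq_a, seq_b
    elif style == "general":       # score = a_i^T F(b_j)
        rows, cols = seq_a, [f(b) for b in seq_b]
    elif style == "decomposable":  # score = F(a_i)^T F(b_j)
        rows, cols = [f(a) for a in seq_a], [f(b) for b in seq_b]
    else:
        raise ValueError("unknown alignment style: " + style)
    return [[dot(r, c) for c in cols] for r in rows]


def double(v):
    """Stand-in for a Transform operation."""
    return [2.0 * x for x in v]


seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # seq1_len = 3
seq_b = [[2.0, 3.0], [4.0, 5.0]]              # seq2_len = 2
print(alignment_scores(seq_a, seq_b, "dot"))
print(alignment_scores(seq_a, seq_b, "decomposable", transform=double))
```

The result has one row per vector in the first sequence and one column per vector in the second, matching the (seq1_len, seq2_len) part of the documented output shape.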
Bypass
The Bypass operation takes two tensors, one corresponding to an input tensor and the other corresponding to a transformed version of the first tensor, applies a bypass network and returns one tensor of the same size as the transformed tensor. Examples of bypass networks include residual networks and highway networks. A Bypass operation can be specified using one of the following:
- A string: One of the styles supported by the Bypass module.
- An instance of Bypass.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the Bypass module.
This operation is implemented by the Bypass module:
class deepmatcher.modules.Bypass(style)
Module that helps bypass a given transformation of an input.
Supports residual and highway styles of bypass.
Parameters: style (string) – One of the following strings:
- ’residual’: Uses a residual network.
- ’highway’: Uses a highway network.
- Input: Two N-d tensors.
- Two N-d tensors of shape (D1, D2, …, transformed_size) and (D1, D2, …, input_size). The first tensor should correspond to the transformed version of the second input.
- Output: One N-d tensor of shape (D1, D2, …, transformed_size).
- Note that the shape of the output will match the shape of the first input tensor.
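For readers unfamiliar with the two bypass styles, the sketch below shows the textbook residual and highway combinations on single vectors. It is a simplified illustration and not the Bypass module: both function names are hypothetical, the two inputs are assumed to have equal size (the module itself allows transformed_size to differ from input_size), and the highway gate uses one weight and bias per feature instead of a learned linear layer over the whole input.

```python
import math


def residual_bypass(transformed, raw):
    """Residual connection: add the raw input back onto the transformed input."""
    return [t + r for t, r in zip(transformed, raw)]


def highway_bypass(transformed, raw, gate_weights, gate_biases):
    """Highway connection: a sigmoid gate mixes transformed and raw inputs.

    gate_weights and gate_biases are hypothetical per-feature parameters
    used here in place of a learned gate layer.
    """
    output = []
    for t, r, w, b in zip(transformed, raw, gate_weights, gate_biases):
        gate = 1.0 / (1.0 + math.exp(-(w * r + b)))  # value in (0, 1)
        output.append(gate * t + (1.0 - gate) * r)
    return output


print(residual_bypass([1.0, 2.0], [0.5, 0.5]))
print(highway_bypass([2.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

The argument order mirrors the documented input order: the transformed tensor first, the raw input second. A gate near zero lets the raw input pass through almost unchanged, while a gate near one keeps the transformed value.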
RNN
This operation takes a sequence of vectors and produces a context-aware transformation of the input sequence as output. For an intro to RNNs, take a look at this article. An RNN operation can be specified using one of the following:
- A string: One of the unit_types supported by the RNN module.
- An instance of RNN.
- A callable: A function that returns a PyTorch Module. This module must have the same input and output shape signature as the RNN module.
This operation is implemented by the RNN module:
class deepmatcher.modules.RNN(unit_type='gru', hidden_size=None, layers=1, bidirectional=True, dropout=0, input_dropout=0, last_layer_dropout=0, bypass_network=None, connect_num_layers=1, input_size=None, **kwargs)
A multi-layered RNN that supports dropout and residual / highway connections.
Parameters:
- unit_type (string) – One of the supported RNN unit types, given by name, e.g. ‘gru’ (the default).
- hidden_size (int) – The hidden size of all RNN layers.
- layers (int) – Number of RNN layers.
- bidirectional (bool) – Whether to use bidirectional RNNs.
- dropout (float) – If non-zero, applies dropout to the outputs of each RNN layer except the last layer. Dropout probability must be between 0 and 1.
- input_dropout (float) – If non-zero, applies dropout to the input to this module. Dropout probability must be between 0 and 1.
- last_layer_dropout (float) – If non-zero, applies dropout to the output of the last RNN layer. Dropout probability must be between 0 and 1.
- bypass_network (string or Bypass or callable) – The bypass network (e.g. residual or highway network) to apply every connect_num_layers layers. Argument must specify a Bypass operation. If None, does not use a bypass network.
- connect_num_layers (int) – The number of layers between each bypass operation. Note that the layers in which dropout is applied are also controlled by this. If layers is 6 and connect_num_layers is 2, then a bypass network is applied after the 2nd, 4th and 6th layers. Further, if dropout is non-zero, it will only be applied after the 2nd and 4th layers.
- input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
- **kwargs (dict) – Additional keyword arguments are passed to the underlying PyTorch RNN module.
- Input: One 3d tensor of shape (batch, seq_len, input_size).
- The tensor should be wrapped within an AttrTensor which contains metadata about the batch.
- Output: One 3d tensor of shape (batch, seq_len, output_size).
- This will be wrapped within an AttrTensor (with metadata information unchanged). output_size need not be the same as input_size.
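The interaction between layers, connect_num_layers and dropout described in the parameter list can be worked out with a few lines of arithmetic. The helper below is a hypothetical sketch that reproduces the documented example; it assumes layers is a multiple of connect_num_layers and rejects other values, which is a restriction of this sketch and not a statement about the RNN module.

```python
def bypass_schedule(layers, connect_num_layers, dropout=0.0):
    """Return (layers followed by a bypass, layers followed by dropout).

    Layer numbers are 1-based. Dropout follows every bypass point except
    the last layer, which is governed by last_layer_dropout instead.
    """
    if layers % connect_num_layers != 0:
        raise ValueError("this sketch assumes layers is a multiple of connect_num_layers")
    bypass_after = list(range(connect_num_layers, layers + 1, connect_num_layers))
    dropout_after = [n for n in bypass_after if n != layers] if dropout else []
    return bypass_after, dropout_after


print(bypass_schedule(layers=6, connect_num_layers=2, dropout=0.2))
```

For layers=6 and connect_num_layers=2 this yields bypass points after layers 2, 4 and 6 and dropout after layers 2 and 4, as in the description of connect_num_layers.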
Utility Modules
Apart from standard operations, DeepMatcher also contains several utility modules to help glue together various components. These are listed below.
Lambda
MultiSequential
class deepmatcher.modules.MultiSequential(*args)
A sequential container that supports multiple module inputs and outputs.
This is an extension of PyTorch’s Sequential module that allows each module to have multiple inputs and / or outputs.
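The idea of chaining modules with multiple inputs and outputs can be sketched with ordinary Python callables. This is an illustration of the concept under stated assumptions, not the MultiSequential implementation: `run_multi_sequential` is a hypothetical function, and it assumes that a tuple returned by one step is unpacked into the positional arguments of the next.

```python
def run_multi_sequential(modules, *inputs):
    """Call each module in order, unpacking tuple outputs as the next inputs."""
    outputs = inputs
    for module in modules:
        if isinstance(outputs, tuple):
            outputs = module(*outputs)  # several values: pass as separate arguments
        else:
            outputs = module(outputs)   # single value: pass as is
    return outputs


def split(x):
    """One input, two outputs."""
    return x, 2 * x


def add(a, b):
    """Two inputs, one output."""
    return a + b


def negate(x):
    """One input, one output."""
    return -x


print(run_multi_sequential([split, add, negate], 3))
```

A plain sequential container would fail at `add`, because it would pass the tuple from `split` as a single argument.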
NoMeta
class deepmatcher.modules.NoMeta(module)
A wrapper module to allow regular modules to take AttrTensors as input.
A forward pass through this module will perform the following:
- If the module input is an AttrTensor, gets the data from it, and uses it as input.
- Performs a forward pass through the wrapped module with the modified input.
- Using metadata information from the module input (if provided), wraps the result into an AttrTensor and returns it.
Parameters: module (Module) – The module to wrap.
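The three steps of the forward pass can be mimicked without PyTorch. In the sketch below, `FakeAttrTensor` is a hypothetical stand-in for AttrTensor with just a `data` field and a `meta` field, and `NoMetaSketch` is a hypothetical wrapper around any callable; neither reflects the actual class definitions.

```python
from collections import namedtuple

# Hypothetical stand-in for AttrTensor: a payload plus batch metadata.
FakeAttrTensor = namedtuple("FakeAttrTensor", ["data", "meta"])


class NoMetaSketch:
    """Wrap a plain callable so it can accept metadata-carrying inputs."""

    def __init__(self, module):
        self.module = module

    def __call__(self, module_input):
        if isinstance(module_input, FakeAttrTensor):
            # Steps 1 and 2: unwrap the data and run the wrapped module on it.
            result = self.module(module_input.data)
            # Step 3: re-attach the unchanged metadata to the result.
            return FakeAttrTensor(result, module_input.meta)
        # No metadata was provided, so return the plain result.
        return self.module(module_input)


def double(values):
    return [2 * v for v in values]


wrapped = NoMetaSketch(double)
print(wrapped(FakeAttrTensor([1, 2], {"lengths": [2]})))
```

The wrapped callable never sees the metadata, so existing modules can be reused unchanged inside a pipeline that passes metadata along.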
ModuleMap
class deepmatcher.modules.ModuleMap
Holds submodules in a map.
Similar to torch.nn.ModuleList, but for maps.
Example:

    import torch.nn as nn
    import deepmatcher as dm

    class MyModule(nn.Module):
        def __init__(self):
            super(MyModule, self).__init__()
            linears = dm.ModuleMap()
            linears['type'] = nn.Linear(10, 10)
            linears['color'] = nn.Linear(10, 10)
            self.linears = linears

        def forward(self, x1, x2):
            # Apply each stored submodule to its own input.
            y1 = self.linears['type'](x1)
            y2 = self.linears['color'](x2)
            return y1, y2
LazyModule
class deepmatcher.modules.LazyModule(*args, **kwargs)
A lazily initialized module. Base class for most DeepMatcher modules.
This module is an extension of PyTorch Module with the following property: constructing an instance of this module does not immediately initialize it. This means that if the module has parameters, they will not be instantiated immediately after construction. The module is initialized the first time forward is called. This has the following benefits:
- Can be safely deep copied to create structural clones that do not share parameters. E.g. deep copying a LazyModule consisting of a 2 layer Linear NN will produce another LazyModule with a 2 layer Linear NN that 1) does not share parameters with the original and 2) has a different weight initialization.
- Allows automatic input size inference. Refer to the description of _init for details.
This module also implements some additional goodies:
- Output shape verification: As part of initialization, this module verifies that all output tensors have correct output shapes, if the expected output shape is specified using expect_signature(). This verification is done only once during initialization to avoid slowing down training.
- NaN checks: All module outputs are checked for the presence of NaN values that may be difficult to trace down otherwise.
Subclasses of this module are expected to override the following two methods:
- _init(): This is where the constructor of the module should be defined. During the first forward pass, this method will be called to initialize the module. Whatever you typically define in the __init__ function of a PyTorch module, you may define it here. This function may optionally take in an input_size parameter. If it does, LazyModule will set it to the size of the last dimension of the input. E.g., if the input is of size 32 * 300, the input_size will be set to 300. Subclasses may choose not to override this method.
- _forward(): This is where the computation for the forward pass of the module must be defined. Whatever you typically define in the forward function of a PyTorch module, you may define it here. All subclasses must override this method.
__init__(*args, **kwargs)
Construct a LazyModule. DO NOT OVERRIDE this method.
This does NOT initialize the module. Construction simply saves the positional and keyword arguments for future initialization.
Parameters:
- *args – Positional arguments to the constructor of the module defined in _init().
- **kwargs – Keyword arguments to the constructor of the module defined in _init().
expect_signature(signature)
Set the expected module input / output signature.
Note that this feature is currently not fully functional. More details will be added after implementation.
forward(input, *args, **kwargs)
Perform a forward pass through the module. DO NOT OVERRIDE this method.
If the module is not initialized yet, this method also performs initialization. Initialization involves the following:
- Calling the _init() method. First tries calling it with the input_size keyword parameter set, along with the positional and keyword args specified during construction. If this fails with a TypeError (i.e., the _init() method does not have an input_size parameter), then retries initialization without setting input_size.
- Verifying the output shape, if expect_signature() was called prior to the forward pass.
- Setting PyTorch Module forward and backward hooks to check for NaNs in module outputs and gradients.
Parameters:
- *args – Positional arguments to the forward function of the module defined in _forward().
- **kwargs – Keyword arguments to the forward function of the module defined in _forward().
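The lazy initialization flow described for this class (store the constructor arguments, call _init() on the first forward pass with an inferred input_size, and retry without it on a TypeError) can be sketched in plain Python. `LazySketch`, `Scale` and `AddOffset` are hypothetical classes operating on nested lists; the sketch leaves out PyTorch, output shape verification and NaN hooks.

```python
def last_dim_size(nested):
    """Size of the innermost dimension of a nested list."""
    while isinstance(nested[0], list):
        nested = nested[0]
    return len(nested)


class LazySketch:
    """Simplified stand-in for a lazily initialized module."""

    def __init__(self, *args, **kwargs):
        # Construction only records the arguments; nothing is built yet.
        self._init_args = args
        self._init_kwargs = kwargs
        self._initialized = False

    def _init(self):
        pass  # subclasses may override

    def _forward(self, input, *args, **kwargs):
        raise NotImplementedError  # subclasses must override

    def forward(self, input, *args, **kwargs):
        if not self._initialized:
            try:
                # First attempt: pass the inferred input_size.
                self._init(*self._init_args, input_size=last_dim_size(input),
                           **self._init_kwargs)
            except TypeError:
                # _init() takes no input_size: retry without it.
                self._init(*self._init_args, **self._init_kwargs)
            self._initialized = True
        return self._forward(input, *args, **kwargs)


class Scale(LazySketch):
    def _init(self, factor, input_size=None):
        self.factor = factor
        self.input_size = input_size

    def _forward(self, input):
        return [[self.factor * x for x in row] for row in input]


class AddOffset(LazySketch):
    def _init(self, offset):  # no input_size parameter
        self.offset = offset

    def _forward(self, input):
        return [[x + self.offset for x in row] for row in input]


scale = Scale(2)
print(scale.forward([[1, 2, 3], [4, 5, 6]]), scale.input_size)
```

One consequence of the retry-on-TypeError pattern is that a TypeError raised for an unrelated reason inside _init() also triggers the second attempt, so errors in _init() may surface from the retry instead of the first call.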