deepmatcher.word_comparators

class deepmatcher.word_comparators.Attention(heads=1, hidden_size=None, raw_alignment=False, input_dropout=0, alignment_network='decomposable', scale=False, score_dropout=0, value_transform_network=None, input_transform_network=None, value_merge='concat', transform_dropout=0, comparison_merge='concat', comparison_network='2-layer-highway', input_size=None)

Attention-based Word Comparator with multi-head support. This module does the following:

  1. Computes an alignment matrix between the primary input sequence and the context input sequence.
  2. For each vector in the primary input sequence, takes a weighted average over all vectors in the context input sequence, where weights are given by the alignment matrix. Intuitively, for each word / phrase vector in the primary input sequence, this represents the aligning word / phrase vector in the context input sequence.
  3. Compares the vectors in the primary input sequence with their aligning vectors.
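The three steps above can be sketched for a single head in plain Python. This is a simplified illustration rather than the module's implementation: it assumes plain dot-product alignment (the default alignment_network is 'decomposable'), assumes the alignment scores are normalized with a softmax, uses the vector length as a stand-in for hidden_size when scaling, and stops at the 'concat' merge without applying a comparison network. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_and_merge(primary, context, scale=False):
    """Single-head sketch of Steps 1-3 (assumed dot-product alignment)."""
    size = len(primary[0])
    merged = []
    for p in primary:
        # Step 1: one row of the alignment matrix, scoring this primary
        # vector against every context vector.
        scores = [dot(p, c) for c in context]
        if scale:
            # Assumption: the vector length stands in for hidden_size.
            scores = [s / math.sqrt(size) for s in scores]
        weights = softmax(scores)
        # Step 2: weighted average of the context vectors, i.e. the
        # "aligning" vector for this primary vector.
        aligned = [
            sum(w * c[i] for w, c in zip(weights, context))
            for i in range(size)
        ]
        # Step 3 (merge only): concatenate the primary vector with its
        # aligning vector; a comparison network would consume this.
        merged.append(p + aligned)
    return merged


primary = [[1.0, 0.0], [0.0, 1.0]]
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
merged = attend_and_merge(primary, context)
```

Each row of merged has twice the input width: the first half is the original primary vector and the second half is its aligning vector from the context sequence.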
Parameters:
  • heads (int) – Number of attention heads to use. Defaults to 1.
  • hidden_size (int) – The default hidden size of the alignment_network, transform networks (if applicable), and comparison network.
  • raw_alignment (bool) – If True, uses the raw (non-contextualized) word embedding sequences for computing alignment in Step 1 described above. If False, uses the contextualized version (transformed by the Word Contextualizer module) of the input and context sequences for computing alignment in Step 1. For Step 2, the Word Contextualizer’s version of the context sequence is used for computing the weighted averages in both cases. Raw alignment has been shown to perform better and speed up convergence, especially in cases of limited training data.
  • input_dropout (float) – If non-zero, applies dropout to the input to this module. Dropout probability must be between 0 and 1.
  • alignment_network (string or deepmatcher.modules.AlignmentNetwork or callable) – The neural network that takes the primary input sequence, aligns the word / phrase vectors in this sequence with the word / phrase vectors in the context sequence, and returns the corresponding alignment score matrix. Argument must specify an Align operation.
  • scale (bool) – Whether to scale the alignment scores by the square root of the hidden_size parameter. This is based on scaled dot-product attention.
  • score_dropout (float) – If non-zero, applies dropout to the alignment score matrix. Dropout probability must be between 0 and 1.
  • value_transform_network (string or Transform or callable) – The neural network to transform the context input sequence before taking the weighted averages in Step 2 described above. Argument must be None or specify a Transform operation. If the argument is a string, the hidden size of the transform operation is computed as hidden_size // heads. If the argument is None and heads is 1, then the values are not transformed. If the argument is None and heads is > 1, then a 1-layer highway network without any non-linearity is used. The hidden size for this network is also computed as hidden_size // heads.
  • input_transform_network (string or Transform or callable) – The neural network to transform the primary input sequence before it is compared with the aligning vectors in the context sequence in Step 3 described above. Argument must be None or specify a Transform operation. If None, uses the same neural network as the value transform network (sharing not just the structure but also weight parameters).
  • value_merge (string or Merge or callable) – Specifies how to merge the outputs of all attention heads for each vector in the primary input sequence. Concatenates the outputs of all heads by default. Argument must specify a Merge operation.
  • transform_dropout (float) – If non-zero, applies dropout to the outputs of the value_transform_network and input_transform_network, if applicable. Dropout probability must be between 0 and 1.
  • comparison_merge (string or Merge or callable) – For each vector in the primary input sequence, specifies how to merge it with its aligning vector in the context input sequence, to obtain a single vector. The resulting sequence of vectors forms the input to the comparison_network. Concatenates each primary input vector with its aligning vector by default. Argument must specify a Merge operation.
  • comparison_network (string or Transform or callable) – The neural network to compare the vectors in the primary input sequence and their aligning vectors in the context input. Input to this module is produced by the comparison_merge operation. Argument must specify a Transform operation.
  • input_size (int) – The number of features in the input to the module. This parameter will be automatically specified by LazyModule.
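The defaulting rules for value_transform_network described above can be summarized in a short sketch. The helper below is hypothetical and not part of deepmatcher; it only restates the documented rules as code and returns a plain description instead of building a network.

```python
def resolve_value_transform(value_transform_network, heads, hidden_size):
    """Return (description, per_head_hidden_size) per the documented rules.

    Hypothetical helper for illustration only; not a deepmatcher function.
    """
    per_head = hidden_size // heads
    if value_transform_network is None:
        if heads == 1:
            # Single head and no network given: values are not transformed.
            return ("no transform", None)
        # Several heads and no network given: documented fallback network.
        return ("1-layer highway network without non-linearity", per_head)
    if isinstance(value_transform_network, str):
        # String spec: hidden size is derived as hidden_size // heads.
        return (value_transform_network, per_head)
    # A Transform instance or callable carries its own configuration.
    return ("user-supplied module", None)


single_head = resolve_value_transform(None, heads=1, hidden_size=300)
multi_head = resolve_value_transform(None, heads=4, hidden_size=300)
```

With hidden_size=300 and heads=4, each head works with a hidden size of 75, which is why hidden_size is usually chosen as a multiple of heads.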