deepmatcher.batch

AttrTensor

class deepmatcher.batch.AttrTensor

A wrapper around the batch tensor for a specific attribute.

The purpose of the wrapper is to keep attribute-specific metadata together with the tensor. The metadata includes the following:

  • lengths: Lengths of each sequence (attribute value) in the batch.
  • word_probs: For each sequence in the batch, a list of word probabilities corresponding to words in the sequence.
  • pc: The first principal component of the sequence embeddings for all values of this attribute. For details on how this is computed, refer to the documentation for compute_metadata(). It is used to implement the SIF (smooth inverse frequency) model proposed by Arora et al. in "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" (ICLR 2017).
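
To make the roles of word_probs and pc concrete, here is a minimal pure-Python sketch of the SIF recipe as described in the paper. It is not deepmatcher's code (the library computes pc in compute_metadata()), and the function names and the smoothing constant a=1e-3 are assumptions chosen for illustration. Each sequence embedding is the average of its word vectors weighted by a / (a + p(w)); the first principal component, taken here as the top singular vector of the uncentered embedding matrix and found by power iteration, is then projected out of each embedding.

```python
import math
from typing import List

Vector = List[float]


def sif_embedding(word_vectors: List[Vector], word_probs: List[float], a: float = 1e-3) -> Vector:
    """Average of word vectors, each weighted by a / (a + p(w)). `a` is an assumed default."""
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for vec, p in zip(word_vectors, word_probs):
        weight = a / (a + p)  # frequent words (large p) get small weights
        for i in range(dim):
            total[i] += weight * vec[i]
    # Divide by the number of words in the sequence.
    return [x / len(word_vectors) for x in total]


def first_principal_component(rows: List[Vector], iters: int = 100) -> Vector:
    """Top singular vector of the uncentered matrix whose rows are `rows` (power iteration)."""
    dim = len(rows[0])
    u = [1.0] * dim
    for _ in range(iters):
        # One step of u <- X^T X u, computed as the sum over rows r of (r . u) * r.
        nxt = [0.0] * dim
        for r in rows:
            dot = sum(ri * ui for ri, ui in zip(r, u))
            for i in range(dim):
                nxt[i] += dot * r[i]
        norm = math.sqrt(sum(x * x for x in nxt))
        u = [x / norm for x in nxt]  # keep u at unit length
    return u


def remove_pc(vec: Vector, pc: Vector) -> Vector:
    """Subtract the projection of `vec` onto the unit vector `pc`."""
    dot = sum(v * p for v, p in zip(vec, pc))
    return [v - dot * p for v, p in zip(vec, pc)]


# Toy usage with 2-d vectors.
print(sif_embedding([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.001]))  # [0.5, 0.25]
embeddings = [[3.0, 0.0], [0.0, 1.0]]  # sequence embeddings for one attribute
pc = first_principal_component(embeddings)
print([round(x, 6) for x in pc])  # [1.0, 0.0]
print([round(x, 6) for x in remove_pc([1.0, 1.0], pc)])  # [0.0, 1.0]
```

Since pc depends on all values of the attribute rather than on a single batch, it makes sense to compute it once and carry it along as metadata, which is what AttrTensor does.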

This class is essentially a namedtuple. The tensor containing the data and the associated metadata described above can be accessed as follows:

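from deepmatcher.batch import AttrTensor

# data, lengths, word_probs and pc are assumed to be defined beforehand.
name_attr = AttrTensor(data, lengths, word_probs, pc)
# Fields hold the objects passed in; use `is`, since `==` on a tensor is elementwise.
assert name_attr.data is data
assert name_attr.lengths is lengths
assert name_attr.word_probs is word_probs
assert name_attr.pc is pc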
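
Because AttrTensor behaves like a namedtuple, the usual tuple idioms should apply as well. The standalone sketch below uses a hypothetical AttrTensorSketch built with collections.namedtuple and the same four field names, with plain lists standing in for tensors so that it runs without PyTorch or deepmatcher installed.

```python
from collections import namedtuple

# Hypothetical stand-in: a plain namedtuple with AttrTensor's four field
# names. It is not deepmatcher's class; lists stand in for tensors.
AttrTensorSketch = namedtuple("AttrTensorSketch", ["data", "lengths", "word_probs", "pc"])

# Toy batch of two token-id sequences, padded with 0 to length 3.
data = [[4, 7, 0], [5, 2, 9]]
lengths = [2, 3]
word_probs = [[0.01, 0.2], [0.05, 0.1, 0.3]]
pc = [0.6, 0.8]

attr = AttrTensorSketch(data, lengths, word_probs, pc)

print(attr.lengths)        # [2, 3]  (access by field name)
print(attr[0] is data)     # True    (access by position)
d, lens, wp, p = attr      # tuple unpacking works too
print(p is pc)             # True
print(attr._fields)        # ('data', 'lengths', 'word_probs', 'pc')
```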

static from_old_metadata(data, old_attrtensor)

Wrap a PyTorch torch.Tensor into an AttrTensor.

The metadata information is (shallow) copied from a pre-existing AttrTensor. This is useful when the data for an attribute is transformed by a neural network and we wish to wrap the result in an AttrTensor for further processing by another module that requires access to the metadata.

Parameters:
  • data (torch.Tensor) – The tensor to wrap.
  • old_attrtensor (AttrTensor) – The pre-existing AttrTensor to copy metadata from.
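
The shallow-copy behaviour can be illustrated with a hypothetical namedtuple stand-in. In this sketch, from_old_metadata_sketch is not a deepmatcher function; it relies on the standard namedtuple._replace method, so only the data field is swapped while the new tuple keeps references to the old lengths, word_probs and pc objects.

```python
from collections import namedtuple

# Hypothetical stand-in with AttrTensor's field names (not deepmatcher's class).
AttrTensorSketch = namedtuple("AttrTensorSketch", ["data", "lengths", "word_probs", "pc"])


def from_old_metadata_sketch(data, old_attrtensor):
    """Wrap new data, reusing (not copying) the old metadata objects."""
    # _replace builds a new tuple; the untouched fields keep the same
    # references, which is exactly a shallow copy of the metadata.
    return old_attrtensor._replace(data=data)


old = AttrTensorSketch(data=[[1.0, 2.0]], lengths=[2], word_probs=[[0.1, 0.2]], pc=[1.0, 0.0])

# Pretend a neural network transformed the data (here: doubled every value).
transformed = [[2 * x for x in row] for row in old.data]
new = from_old_metadata_sketch(transformed, old)

print(new.data)                     # [[2.0, 4.0]]
print(new.lengths is old.lengths)   # True: metadata shared, not copied
print(old.data)                     # [[1.0, 2.0]]: the original is untouched
```

Because the metadata is shared rather than copied, mutating, say, the lengths list through one wrapper would be visible through the other; that is the trade-off a shallow copy makes in exchange for avoiding extra allocations.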

MatchingBatch

class deepmatcher.batch.MatchingBatch(input, train_info)

A batch of data and associated metadata for a text matching task.

Consists of one AttrTensor (containing the data and metadata) for each attribute. For example, the AttrTensors of a MatchingBatch object mbatch for a matching task with two attributes, name and category, can be accessed as follows:

name_attr = mbatch.name
category_attr = mbatch.category
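
One simple way to get this kind of per-attribute access is to set one Python attribute per data attribute with setattr. The sketch below is a hypothetical stand-in: MatchingBatchSketch and its dict-based constructor are illustrative only and do not mirror MatchingBatch's real (input, train_info) signature.

```python
from collections import namedtuple

# Hypothetical stand-in with AttrTensor's field names (not deepmatcher's class).
AttrTensorSketch = namedtuple("AttrTensorSketch", ["data", "lengths", "word_probs", "pc"])


class MatchingBatchSketch:
    """Hypothetical batch holding one AttrTensor-like value per attribute."""

    def __init__(self, attr_tensors):
        # Expose each attribute's tensor under its own name, e.g. batch.name.
        for attr_name, attr_tensor in attr_tensors.items():
            setattr(self, attr_name, attr_tensor)


mbatch = MatchingBatchSketch({
    "name": AttrTensorSketch([[3, 8]], [2], None, None),
    "category": AttrTensorSketch([[6]], [1], None, None),
})

name_attr = mbatch.name
category_attr = mbatch.category
print(name_attr.lengths)  # [2]

# When attribute names are only known at run time, getattr does the same lookup.
for attr_name in ["name", "category"]:
    print(attr_name, getattr(mbatch, attr_name).data)
```

getattr is handy when the attribute names are held in a list, for example when the same module is applied to every attribute of the batch.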