datasets.processors¶

The processors exist in MMF to make data processing pipelines in various datasets as similar as possible while allowing code reuse.

Processors also help maintain clean abstractions so that only what matters stays inside the dataset’s code. This keeps the dataset’s __getitem__ logic clean and free of assumptions about data types. Because of their generic structure, processors can work on both images and text.

To create a new processor, follow these steps:

1. Inherit the BaseProcessor class.
2. Implement the __call__ function, which takes in a dict and returns a dict with the same keys preprocessed, as well as any extra keys that need to be returned.
3. Register the processor with the registry using @registry.register_processor('name'), where 'name' will be used to refer to your processor later.

In a processor’s config, you can use the preprocessor option to specify the kind of preprocessor you want to apply in your dataset.

Let’s break down a processor’s config inside a dataset (VQA2.0) to understand the different moving parts.

Config:

dataset_config:
  vqa2:
    data_dir: ${env.data_dir}
    processors:
      text_processor:
        type: vocab
        params:
          max_length: 14
          vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: vqa2/defaults/extras/vocabs/vocabulary_100k.txt
          preprocessor:
            type: simple_sentence
            params: {}

BaseDataset will initialize the processors, and they will be available inside your dataset under the same attribute name as the key name; e.g., text_processor will be available as self.text_processor inside your dataset. As with every module in MMF, a processor accepts a DictConfig with type and params attributes. params defines the custom parameters for each processor. By default, the processor initialization process will also initialize the preprocessor attribute, which can be a processor config in itself. The preprocessor can then be accessed inside the processor’s functions.
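The attribute naming and nested preprocessor described above can be pictured with a small stand-in. This is only a sketch under stated assumptions: PROCESSOR_REGISTRY, register_processor, build_processor, ToySentenceProcessor, ToyVocabProcessor, and ToyDataset are hypothetical names written for illustration, plain dicts stand in for DictConfig, and none of this is MMF's actual implementation.

```python
# Hypothetical stand-in for a processor registry; not MMF's real registry.
PROCESSOR_REGISTRY = {}


def register_processor(name):
    def wrap(cls):
        PROCESSOR_REGISTRY[name] = cls
        return cls
    return wrap


@register_processor("simple_sentence")
class ToySentenceProcessor:
    def __init__(self, params):
        self.params = params

    def __call__(self, item):
        # Lowercase and split on whitespace.
        return {"text": item["text"].lower().split()}


@register_processor("vocab")
class ToyVocabProcessor:
    def __init__(self, params):
        self.max_length = params["max_length"]
        # A nested "preprocessor" entry is itself a processor config.
        self.preprocessor = build_processor(params["preprocessor"])

    def __call__(self, item):
        tokens = self.preprocessor(item)["text"]
        return {"text": tokens[: self.max_length]}


def build_processor(config):
    # Look up the class by "type" and hand it "params".
    cls = PROCESSOR_REGISTRY[config["type"]]
    return cls(config.get("params", {}))


class ToyDataset:
    def __init__(self, processors_config):
        # Each config key becomes an attribute with the same name.
        for name, config in processors_config.items():
            setattr(self, name, build_processor(config))


config = {
    "text_processor": {
        "type": "vocab",
        "params": {
            "max_length": 3,
            "preprocessor": {"type": "simple_sentence", "params": {}},
        },
    }
}
dataset = ToyDataset(config)
result = dataset.text_processor({"text": "What color is the cat"})
```

Here the text_processor key in the config ends up as dataset.text_processor, and the processor reaches its preprocessor through self.preprocessor.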

Example:

from mmf.common.registry import registry
from mmf.datasets.processors import BaseProcessor

@registry.register_processor('my_processor')
class MyProcessor(BaseProcessor):
    def __init__(self, config, *args, **kwargs):
        # This simple processor has nothing to set up.
        return

    def __call__(self, item, *args, **kwargs):
        # Split the raw text on single spaces and strip each token.
        text = item['text']
        text = [t.strip() for t in text.split(" ")]
        return {"text": text}

class mmf.datasets.processors.processors.BBoxProcessor(config, *args, **kwargs)[source]¶

Generates bboxes in the proper format. Takes in a dict containing an “info” key, which is a list of dicts, each containing the following for one bounding box:

Example bbox input:

{
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300
            }
        },
        ...
    ]
}

This returns a Sample in a dict under the key “bbox”, with a last dimension of 4 corresponding to “xyxy”. The sample looks like the following:

Example Sample:

Sample({
    "coordinates": torch.Size(n, 4),
    "width": List[number], # size n
    "height": List[number], # size n
    "bbox_types": List[str] # size n, either xyxy or xywh.
    # currently only supports xyxy.
})
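The conversion from the input format to “xyxy” coordinates can be sketched in plain Python. This is a minimal illustration, assuming x2 = top_left_x + width and y2 = top_left_y + height; info_to_xyxy is a hypothetical helper, and it returns nested lists rather than the tensor-backed Sample that BBoxProcessor produces.

```python
def info_to_xyxy(item):
    """Convert each "bounding_box" entry to [x1, y1, x2, y2]."""
    coordinates = []
    for entry in item["info"]:
        box = entry["bounding_box"]
        x1 = box["top_left_x"]
        y1 = box["top_left_y"]
        # Assumption: the bottom-right corner is the top-left plus the size.
        coordinates.append([x1, y1, x1 + box["width"], y1 + box["height"]])
    return coordinates


example = {
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300,
            }
        }
    ]
}
coords = info_to_xyxy(example)  # [[100, 100, 300, 400]]
```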

class mmf.datasets.processors.processors.BaseProcessor(*args, config: Optional[omegaconf.dictconfig.DictConfig] = None, **kwargs)[source]¶

Every processor in MMF needs to inherit from this class for compatibility with MMF. End users mainly need to implement the __call__ function.

Parameters: config (DictConfig) – Config for this processor, containing type and params attributes if available.

class mmf.datasets.processors.processors.BatchProcessor(config: mmf.datasets.processors.processors.BatchProcessorConfigType, *args, **kwargs)[source]¶

BatchProcessor is an extension of the normal processor, typically used where a dataset works on a full batch instead of individual samples, as can be the case with iterable datasets. If a processors key is provided in the config, BatchProcessor will initialize a member variable processors_dict containing initialized instances of all the processors you specified, which you will need to process your complete batch.

Otherwise, it behaves the same way: it expects an item and returns an item, which can be of any type.

class mmf.datasets.processors.processors.BatchProcessorConfigType(processors: mmf.common.typings.ProcessorConfigType)[source]¶

class mmf.datasets.processors.processors.CaptionProcessor(config, *args, **kwargs)[source]¶

Processes a caption with start, end and pad tokens and returns raw string.

Parameters: config (DictConfig) – Configuration for caption processor.

class mmf.datasets.processors.processors.CopyProcessor(config, *args, **kwargs)[source]¶: Copy boxes from numpy array

class mmf.datasets.processors.processors.DETRImageAndTargetProcessor(config, *args, **kwargs)[source]¶: Processes a detection image and target in a manner consistent with DETR. At training time, a random crop is applied. At test time, the image is deterministically resized so that its short side equals image_size, while ensuring its long side is no larger than max_size.
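The test-time resizing rule can be worked through with a little arithmetic. The sketch below is an assumption about how such a rule is typically computed, not the processor's actual code; resized_shape is a hypothetical helper that only computes the target width and height.

```python
def resized_shape(width, height, image_size, max_size):
    """Return (new_width, new_height) for a short-side resize capped by max_size."""
    short, long = min(width, height), max(width, height)
    # If scaling the short side to image_size would push the long side
    # past max_size, shrink the target short side instead.
    if long / short * image_size > max_size:
        image_size = int(round(max_size * short / long))
    new_width = int(round(width * image_size / short))
    new_height = int(round(height * image_size / short))
    return new_width, new_height


# A 640x480 image: the short side (480) is scaled to 800.
normal = resized_shape(640, 480, image_size=800, max_size=1333)
# A very wide 2000x500 image: the long side is what limits the resize.
wide = resized_shape(2000, 500, image_size=800, max_size=1333)
```

In the second call the short side ends up smaller than image_size because the cap on the long side takes priority.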

class mmf.datasets.processors.processors.EvalAIAnswerProcessor(*args, **kwargs)[source]¶: Processes an answer similar to Eval AI

class mmf.datasets.processors.processors.FastTextProcessor(config, *args, **kwargs)[source]¶

FastText processor, similar to GloVe processor but returns FastText vectors.

Parameters: config (DictConfig) – Configuration values for the processor.

class mmf.datasets.processors.processors.GloVeProcessor(config, *args, **kwargs)[source]¶

Inherits VocabProcessor and returns GloVe vectors for each word. Words are first mapped to indices using the vocab processor, and the GloVe vectors corresponding to those indices are then retrieved.

Parameters: config (DictConfig) – Configuration parameters for GloVe same as VocabProcessor().

class mmf.datasets.processors.processors.GraphVQAAnswerProcessor(config, *args, **kwargs)[source]¶

Processor for generating answer scores for the answers passed, using the VQA accuracy formula. It uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of answers possible. Takes in a dict containing “answers” or “answers_tokens”. If “answers” is passed, it is preprocessed to generate “answers_tokens”.

This version also takes a graph vocab and predicts a main and a graph stream simultaneously.

Parameters: config (DictConfig) – Configuration for the processor

answer_vocab¶

Class representing answer vocabulary

Type: VocabDict

compute_answers_scores(answers_indices)[source]¶

Generate VQA based answer scores for answers_indices.

Parameters: answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns: tensor containing scores.
Return type: torch.FloatTensor

get_true_vocab_size()[source]¶

True vocab size can be different from normal vocab size in some cases such as soft copy where dynamic answer space is added.

Returns: True vocab size.
Return type: int

get_vocab_size()[source]¶

Get vocab size of the answer vocabulary. Can also include soft copy dynamic answer space size.

Returns: size of the answer vocabulary
Return type: int

idx2word(idx)[source]¶

Index to word according to the vocabulary.

Parameters: idx (int) – Index to be converted to the word.
Returns: Word corresponding to the index.
Return type: str

word2idx(word)[source]¶

Convert a word to its index according to vocabulary

Parameters: word (str) – Word to be converted to index.
Returns: Index of the word.
Return type: int

class mmf.datasets.processors.processors.M4CAnswerProcessor(config, *args, **kwargs)[source]¶

Process a TextVQA answer for iterative decoding in M4C

match_answer_to_vocab_ocr_seq(answer, vocab2idx_dict, ocr2inds_dict, max_match_num=20)[source]¶: Match an answer to a list of sequences of indices. Each index corresponds to either a fixed vocabulary entry or an OCR token (in the index address space, the OCR tokens come after the fixed vocab).
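The matching described above can be illustrated with a standalone sketch. It assumes that each answer word may match one fixed-vocab index plus any number of OCR positions, that OCR positions are offset by the vocabulary size, and that the per-word matches are expanded into full sequences capped at max_match_num. match_answer is a hypothetical function written for illustration and may differ from the method's real behavior.

```python
def match_answer(answer, vocab2idx, ocr2inds, max_match_num=20):
    num_vocab = len(vocab2idx)
    per_word_matches = []
    for word in answer.split():
        matches = []
        if word in vocab2idx:
            matches.append(vocab2idx[word])
        # OCR indices live after the fixed vocabulary in the index space.
        matches.extend(num_vocab + i for i in ocr2inds.get(word, []))
        if not matches:
            # One unmatched word means the answer cannot be produced.
            return []
        per_word_matches.append(matches)
    if not per_word_matches:
        return []
    # Expand per-word matches into every possible index sequence.
    sequences = [()]
    for matches in per_word_matches:
        sequences = [seq + (idx,) for seq in sequences for idx in matches]
        sequences = sequences[:max_match_num]
    return sequences


vocab2idx = {"stop": 0, "sign": 1, "red": 2}
ocr2inds = {"stop": [0, 3]}  # "stop" appears at OCR positions 0 and 3
seqs = match_answer("stop sign", vocab2idx, ocr2inds)
```

With a vocabulary of size 3, the OCR positions 0 and 3 become indices 3 and 6, so "stop" can be produced in three ways and "sign" in one.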

class mmf.datasets.processors.processors.M4CCaptionProcessor(config, *args, **kwargs)[source]¶

class mmf.datasets.processors.processors.MaskedRegionProcessor(config, *args, **kwargs)[source]¶: Masks a region with probability mask_probability

class mmf.datasets.processors.processors.MultiClassFromFile(config: mmf.datasets.processors.processors.MultiClassFromFileConfig, *args, **kwargs)[source]¶: Label processor for multi class cases where the labels are saved in a file.

class mmf.datasets.processors.processors.MultiClassFromFileConfig(vocab_file: str)[source]¶

class mmf.datasets.processors.processors.MultiHotAnswerFromVocabProcessor(config, *args, **kwargs)[source]¶

compute_answers_scores(answers_indices)[source]¶

Generate VQA based answer scores for answers_indices.

Parameters: answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns: tensor containing scores.
Return type: torch.FloatTensor

class mmf.datasets.processors.processors.PhocProcessor(config, *args, **kwargs)[source]¶: Compute PHOC features from text tokens

class mmf.datasets.processors.processors.Processor(config: mmf.common.typings.ProcessorConfigType, *args, **kwargs)[source]¶

Wrapper class used by MMF to initialize a processor based on its type as passed in the configuration. It retrieves the processor class registered in the registry under the type key and initializes it with the params passed in the configuration. All functions and attributes of the initialized processor are directly available via this class.

Parameters: config (DictConfig) – DictConfig containing type of the processor to be initialized and params of that processor.
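One common way to make the wrapped processor's functions and attributes "directly available" is attribute delegation through __getattr__. The sketch below shows that pattern only; ToyProcessorWrapper, Upper, and the dict-based registry are hypothetical stand-ins, and MMF's Processor class may be implemented differently.

```python
class ToyProcessorWrapper:
    def __init__(self, config, registry):
        # Look up the class by "type" and initialize it with "params".
        processor_cls = registry[config["type"]]
        self.processor = processor_cls(config.get("params", {}))

    def __call__(self, item, *args, **kwargs):
        return self.processor(item, *args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name == "processor":
            # Guard against infinite recursion before __init__ has run.
            raise AttributeError(name)
        return getattr(self.processor, name)


class Upper:
    def __init__(self, params):
        self.suffix = params.get("suffix", "")

    def __call__(self, item):
        return {"text": item["text"].upper() + self.suffix}


wrapper = ToyProcessorWrapper(
    {"type": "upper", "params": {"suffix": "!"}}, {"upper": Upper}
)
out = wrapper({"text": "hi"})
```

Accessing wrapper.suffix works even though the wrapper never defines suffix itself, because the lookup is forwarded to the wrapped Upper instance.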

class mmf.datasets.processors.processors.SimpleSentenceProcessor(*args, **kwargs)[source]¶

Tokenizes a sentence and processes it.

tokenizer¶

Type of tokenizer to be used.

Type: function

class mmf.datasets.processors.processors.SimpleWordProcessor(*args, **kwargs)[source]¶

Tokenizes a word and processes it.

tokenizer¶

Type of tokenizer to be used.

Type: function

class mmf.datasets.processors.processors.SoftCopyAnswerProcessor(config, *args, **kwargs)[source]¶

Similar to Answer Processor but adds soft copy dynamic answer space to it. Read https://arxiv.org/abs/1904.08920 for extra information on soft copy and LoRRA.

Parameters: config (DictConfig) – Configuration for soft copy processor.

get_true_vocab_size()[source]¶

Actual vocab size, which only includes the size of the vocabulary file.

Returns: Actual size of vocabs.
Return type: int

get_vocab_size()[source]¶

Size of Vocab + Size of Dynamic soft-copy based answer space

Returns: Size of vocab + size of dynamic soft-copy answer space.
Return type: int
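The relationship between the two sizes can be shown with a toy class. ToySoftCopyVocab and its max_context_tokens parameter are hypothetical; the sketch assumes the dynamic soft-copy answer space has a fixed number of slots, one per context token that can be copied.

```python
class ToySoftCopyVocab:
    def __init__(self, vocab_words, max_context_tokens):
        self.vocab_words = vocab_words
        # Assumption: one dynamic answer slot per copyable context token.
        self.max_context_tokens = max_context_tokens

    def get_true_vocab_size(self):
        # Only the entries that come from the vocabulary file.
        return len(self.vocab_words)

    def get_vocab_size(self):
        # Fixed vocabulary plus the dynamic soft-copy answer space.
        return self.get_true_vocab_size() + self.max_context_tokens


vocab = ToySoftCopyVocab(["<unk>", "yes", "no"], max_context_tokens=50)
true_size = vocab.get_true_vocab_size()
full_size = vocab.get_vocab_size()
```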

class mmf.datasets.processors.processors.TransformerBboxProcessor(config, *args, **kwargs)[source]¶: Processes a bounding box and returns an array of normalized bbox positions and area.
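As a rough illustration of "normalized bbox positions and area", the sketch below divides xyxy coordinates by the image size and appends the box area as a fraction of the image area. The five-value layout and the normalize_boxes helper are assumptions for illustration; the processor itself works on arrays and its exact output layout is not documented here.

```python
def normalize_boxes(boxes, image_width, image_height):
    """Map [x1, y1, x2, y2] boxes to [x1, y1, x2, y2, area], all in [0, 1]."""
    rows = []
    for x1, y1, x2, y2 in boxes:
        # Area expressed as a fraction of the full image area.
        area = (x2 - x1) * (y2 - y1) / (image_width * image_height)
        rows.append(
            [
                x1 / image_width,
                y1 / image_height,
                x2 / image_width,
                y2 / image_height,
                area,
            ]
        )
    return rows


rows = normalize_boxes([[100, 50, 300, 250]], image_width=400, image_height=500)
```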

class mmf.datasets.processors.processors.VQAAnswerProcessor(config, *args, **kwargs)[source]¶

Processor for generating answer scores for the answers passed, using the VQA accuracy formula. It uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of answers possible. Takes in a dict containing “answers” or “answers_tokens”. If “answers” is passed, it is preprocessed to generate “answers_tokens”.

Parameters: config (DictConfig) – Configuration for the processor

answer_vocab¶

Class representing answer vocabulary

Type: VocabDict

compute_answers_scores(answers_indices)[source]¶

Generate VQA based answer scores for answers_indices.

Parameters: answers_indices (torch.LongTensor) – tensor containing indices of the answers
Returns: tensor containing scores.
Return type: torch.FloatTensor
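For intuition, the widely used simplified form of the VQA accuracy formula gives an answer the score min(1, n / 3), where n is the number of human annotators who gave that answer. The sketch below applies that simplified form to answer strings; vqa_answer_scores is a hypothetical helper, and compute_answers_scores itself operates on index tensors and may differ in detail.

```python
from collections import Counter


def vqa_answer_scores(human_answers):
    """Score each distinct answer as min(1, count / 3)."""
    counts = Counter(human_answers)
    return {answer: min(1.0, n / 3) for answer, n in counts.items()}


# Ten annotators: 7 said "maroon", 2 said "red", 1 said "dark red".
scores = vqa_answer_scores(["maroon"] * 7 + ["red"] * 2 + ["dark red"])
```

An answer given by three or more annotators gets full credit, while rarer answers get partial credit.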

get_true_vocab_size()[source]¶

True vocab size can be different from normal vocab size in some cases such as soft copy where dynamic answer space is added.

Returns: True vocab size.
Return type: int

get_vocab_size()[source]¶

Get vocab size of the answer vocabulary. Can also include soft copy dynamic answer space size.

Returns: size of the answer vocabulary
Return type: int

idx2word(idx)[source]¶

Index to word according to the vocabulary.

Parameters: idx (int) – Index to be converted to the word.
Returns: Word corresponding to the index.
Return type: str

word2idx(word)[source]¶

Convert a word to its index according to vocabulary

Parameters: word (str) – Word to be converted to index.
Returns: Index of the word.
Return type: int

class mmf.datasets.processors.processors.VocabProcessor(config, *args, **kwargs)[source]¶

Use VocabProcessor when you have a vocab file and want to convert words to indices. Expects the UNK token to be “<unk>” and pads sentences using the “<pad>” token. Config parameters can have a preprocessor property, which is used to preprocess the item passed, and a max_length property, which sets the maximum length of the sentence/tokens that can be converted to indices. If the length is smaller, the sentence will be padded. Parameters for “vocab” must be passed.

Key: vocab

Example Config:

dataset_config:
  vqa2:
    data_dir: ${env.data_dir}
    processors:
      text_processor:
        type: vocab
        params:
          max_length: 14
          vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: vqa2/defaults/extras/vocabs/vocabulary_100k.txt

Parameters: config (DictConfig) – node containing configuration parameters of the processor

vocab¶

Vocab class object which is abstraction over the vocab file passed.

Type: Vocab

get_pad_index()[source]¶

Get index of padding <pad> token in vocabulary.

Returns: index of the padding token.
Return type: int

get_vocab_size()[source]¶

Get size of the vocabulary.

Returns: size of the vocabulary.
Return type: int
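The word-to-index behavior described for VocabProcessor (unknown words map to “<unk>”, short sentences are padded with “<pad>” up to max_length) can be sketched without MMF. tokens_to_indices and the toy word2idx mapping are hypothetical, and truncating sentences longer than max_length is an assumption of this sketch.

```python
def tokens_to_indices(tokens, word2idx, max_length):
    unk, pad = word2idx["<unk>"], word2idx["<pad>"]
    # Assumption: tokens beyond max_length are dropped.
    indices = [word2idx.get(token, unk) for token in tokens[:max_length]]
    length = len(indices)
    # Pad on the right so every output has exactly max_length entries.
    indices += [pad] * (max_length - length)
    return indices, length


word2idx = {"<pad>": 0, "<unk>": 1, "what": 2, "color": 3}
indices, length = tokens_to_indices(
    ["what", "color", "zebra"], word2idx, max_length=5
)
```

The out-of-vocabulary word "zebra" becomes the “<unk>” index, and the last two positions are filled with the “<pad>” index.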

class mmf.datasets.processors.image_processors.GrayScaleTo3Channels(*args, **kwargs)[source]¶

class mmf.datasets.processors.image_processors.NormalizeBGR255(*args, **kwargs)[source]¶

class mmf.datasets.processors.image_processors.ResizeShortest(*args, **kwargs)[source]¶

class mmf.datasets.processors.image_processors.TorchvisionTransforms(config, *args, **kwargs)[source]¶

class mmf.datasets.processors.image_processors.VILTImageProcessor(config, *args, **kwargs)[source]¶

class mmf.datasets.processors.bert_processors.BertTokenizer(config, *args, **kwargs)[source]¶

class mmf.datasets.processors.bert_processors.MaskedRobertaTokenizer(config, *args, **kwargs)[source]¶

_convert_to_indices(tokens_a: List[str], tokens_b: Optional[List[str]] = None, probability: float = 0.15) → Dict[str, torch.Tensor][source]¶: RoBERTa encodes a single sequence as <s> X </s> and a pair of sequences as <s> A </s> </s> B </s>.

_truncate_seq_pair(tokens_a: List[str], tokens_b: List[str], max_length: int)[source]¶: Truncates a sequence pair in place to the maximum length.

class mmf.datasets.processors.bert_processors.MaskedTokenProcessor(config, *args, **kwargs)[source]¶

_convert_to_indices(tokens_a: List[str], tokens_b: Optional[List[str]] = None, probability: float = 0.15) → Dict[str, torch.Tensor][source]¶: BERT encodes a single sequence as [CLS] X [SEP] and a pair of sequences as [CLS] A [SEP] B [SEP].
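The BERT input layout can be made concrete with a small sketch that assembles the token list and matching segment ids. build_bert_input is a hypothetical helper; it leaves out masking (the probability argument), vocabulary lookup, and padding, all of which the real method also has to handle.

```python
def build_bert_input(tokens_a, tokens_b=None):
    # Single sequence: [CLS] X [SEP], all in segment 0.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        # Pair: append B [SEP], marked as segment 1.
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


single_tokens, single_segments = build_bert_input(["a", "b"])
pair_tokens, pair_segments = build_bert_input(["a", "b"], ["c"])
```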

_truncate_seq_pair(tokens_a: List[str], tokens_b: List[str], max_length: int)[source]¶: Truncates a sequence pair in place to the maximum length.
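In-place truncation of a sequence pair is usually done with the heuristic from the original BERT code: repeatedly drop the last token of whichever sequence is currently longer. The sketch below shows that heuristic as an assumption about what this method does; it does not reserve room for special tokens, which the real method may also account for.

```python
def truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Shorten the two lists in place until their combined length fits."""
    while len(tokens_a) + len(tokens_b) > max_length:
        # Trim the longer list so both keep as much content as possible.
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


a = ["a1", "a2", "a3", "a4", "a5"]
b = ["b1", "b2"]
truncate_seq_pair(a, b, max_length=4)
```

Because the lists are modified in place, the function returns nothing; here only the longer list a loses tokens.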

class mmf.datasets.processors.bert_processors.MultiSentenceBertTokenizer(config, *args, **kwargs)[source]¶: Extension of BertTokenizer which supports multiple sentences. Unlike the normal use case, each sentence is passed through the BERT tokenizer separately, and the indices are reshaped into a single tensor. Segment ids will also increase in number.

class mmf.datasets.processors.bert_processors.MultiSentenceRobertaTokenizer(config, *args, **kwargs)[source]¶: Extension of SPMTokenizer which supports multiple sentences. Similar to MultiSentenceBertTokenizer.

class mmf.datasets.processors.bert_processors.RobertaTokenizer(config, *args, **kwargs)[source]¶

class mmf.datasets.processors.bert_processors.VILTTextTokenizer(config, *args, **kwargs)[source]¶