datasets.processors
The processors exist in MMF to make the data processing pipelines of different datasets as similar as possible while allowing code reuse.
The processors also help maintain proper abstractions, keeping only what matters
inside the dataset’s code. This keeps the dataset’s __getitem__
logic clean and free of assumptions about data types.
Processors can work on both images and text due to their generic structure.
To create a new processor, follow these steps:
1. Inherit the BaseProcessor class.
2. Implement the __call__ function, which takes in a dict and returns a dict with the same keys preprocessed, as well as any extra keys that need to be returned.
3. Register the processor to the registry using @registry.register_processor('name'), where ‘name’ will be used to refer to your processor later.
In a processor’s config, you can set the preprocessor
option to specify the kind of preprocessor you want to use in your dataset.
Let’s break down the processor config of a dataset (VQA2.0) to understand the different moving parts.
Config:
dataset_config:
  vqa2:
    data_dir: ${env.data_dir}
    processors:
      text_processor:
        type: vocab
        params:
          max_length: 14
          vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: vqa2/defaults/extras/vocabs/vocabulary_100k.txt
          preprocessor:
            type: simple_sentence
            params: {}
BaseDataset will initialize the processors, and they will be available inside your
dataset under the same attribute name as the key name; e.g., text_processor will
be available as self.text_processor inside your dataset. As with every module
in MMF, processors also accept a DictConfig
with type and params
attributes. params defines the custom parameters for each processor.
By default, the processor initialization process will also initialize the preprocessor attribute,
which can itself be a processor config. The preprocessor can then be accessed
inside the processor’s functions.
Example:
from mmf.common.registry import registry
from mmf.datasets.processors import BaseProcessor


# 'my_processor' is the name used to refer to this processor in configs.
@registry.register_processor('my_processor')
class MyProcessor(BaseProcessor):
    def __init__(self, config, *args, **kwargs):
        # No state to set up for this simple example.
        return

    def __call__(self, item, *args, **kwargs):
        # Split the input text on spaces and strip each token.
        text = item['text']
        text = [t.strip() for t in text.split(" ")]
        return {"text": text}
- class mmf.datasets.processors.processors.BBoxProcessor(config, *args, **kwargs)
Generates bboxes in the proper format. Takes in a dict containing an “info” key, which is a list of dicts, one per bounding box, each containing the following:
Example bbox input:
{
    "info": [
        {
            "bounding_box": {
                "top_left_x": 100,
                "top_left_y": 100,
                "width": 200,
                "height": 300
            }
        },
        ...
    ]
}
It returns a dict with the key “bbox”, whose value is a Sample with coordinates whose last dimension is 4, corresponding to “xyxy”. The sample looks like the following:
Example Sample:
Sample({
    "coordinates": torch.Size(n, 4),
    "width": List[number], # size n
    "height": List[number], # size n
    "bbox_types": List[str] # size n, either xyxy or xywh.
    # currently only supports xyxy.
})
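As an illustration of the xywh-to-xyxy conversion described above, here is a minimal pure-Python sketch. The function build_xyxy_boxes is hypothetical, not part of MMF: it uses plain lists instead of torch tensors and a plain dict instead of Sample, and it omits any other behavior of the real BBoxProcessor.

```python
from typing import Any, Dict, List


def build_xyxy_boxes(item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for BBoxProcessor: top-left + width/height -> xyxy."""
    coordinates: List[List[float]] = []
    widths: List[float] = []
    heights: List[float] = []
    for info in item["info"]:
        box = info["bounding_box"]
        x, y = box["top_left_x"], box["top_left_y"]
        w, h = box["width"], box["height"]
        # xyxy = (x_min, y_min, x_max, y_max)
        coordinates.append([x, y, x + w, y + h])
        widths.append(w)
        heights.append(h)
    return {
        "bbox": {
            "coordinates": coordinates,  # n x 4
            "width": widths,  # size n
            "height": heights,  # size n
            "bbox_types": ["xyxy"] * len(coordinates),  # size n
        }
    }


example = {
    "info": [
        {"bounding_box": {"top_left_x": 100, "top_left_y": 100, "width": 200, "height": 300}}
    ]
}
print(build_xyxy_boxes(example)["bbox"]["coordinates"])  # [[100, 100, 300, 400]]
```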
- class mmf.datasets.processors.processors.BaseProcessor(*args, config: Optional[omegaconf.dictconfig.DictConfig] = None, **kwargs)
Every processor in MMF needs to inherit this class for compatibility with MMF. End users mainly need to implement the __call__ function.
- Parameters
config (DictConfig) – Config for this processor, containing type and params attributes if available.
- class mmf.datasets.processors.processors.BatchProcessor(config: mmf.datasets.processors.processors.BatchProcessorConfigType, *args, **kwargs)
BatchProcessor is an extension of the normal processor, typically used when a dataset works on full batches instead of individual samples, as is the case with iterable datasets. If the config provides a processors key, BatchProcessor initializes a member variable processors_dict for you, containing initialized instances of all the processors you specified, which you will need to process your complete batch.
Otherwise, it behaves the same way as a normal processor: it expects an item and returns an item, which can be of any type.
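The following sketch illustrates the BatchProcessor idea: a processors_dict of named processors applied to an entire batch rather than to one sample. SketchBatchProcessor is hypothetical, and matching processor names to batch keys is an assumption of this sketch; in MMF, how processors_dict is used is left to your subclass.

```python
from typing import Any, Callable, Dict


class SketchBatchProcessor:
    """Hypothetical illustration of the BatchProcessor idea (not MMF's code)."""

    def __init__(self, processors: Dict[str, Callable[[Any], Any]]):
        # Analogous to the processors_dict built from the "processors" key.
        self.processors_dict = dict(processors)

    def __call__(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        output = dict(batch)  # shallow copy; the input batch is not modified
        for key, processor in self.processors_dict.items():
            if key in batch:
                # Each processor receives the whole batch column, not one sample.
                output[key] = processor(batch[key])
        return output


batch = {"text": ["a cat", "a dog"], "label": [0, 1]}
upper = SketchBatchProcessor({"text": lambda texts: [t.upper() for t in texts]})
print(upper(batch))  # {'text': ['A CAT', 'A DOG'], 'label': [0, 1]}
```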
- class mmf.datasets.processors.processors.BatchProcessorConfigType(processors: mmf.common.typings.ProcessorConfigType)
- class mmf.datasets.processors.processors.CaptionProcessor(config, *args, **kwargs)
Processes a caption containing start, end, and pad tokens and returns the raw string.
- Parameters
config (DictConfig) – Configuration for caption processor.
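A minimal sketch of what stripping start, end, and pad tokens from a caption can look like. The vocabulary ITOS, its special-token indices, the function decode_caption, and its output keys are all hypothetical; MMF’s CaptionProcessor uses its own vocab object.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical vocabulary (index -> token); the special tokens and their
# indices are assumptions for this sketch, not MMF's actual vocab.
ITOS = ["<pad>", "<s>", "<e>", "a", "cat", "sits"]
PAD, SOS, EOS = 0, 1, 2


def decode_caption(indices: Sequence[int]) -> Dict[str, Union[str, List[str]]]:
    ids = list(indices)
    # Everything after the first end token is ignored.
    if EOS in ids:
        ids = ids[: ids.index(EOS)]
    # Drop start/pad tokens and map the remaining indices to words.
    tokens = [ITOS[i] for i in ids if i not in (PAD, SOS, EOS)]
    return {"tokens": tokens, "caption": " ".join(tokens)}


print(decode_caption([1, 3, 4, 5, 2, 0, 0]))
# {'tokens': ['a', 'cat', 'sits'], 'caption': 'a cat sits'}
```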
- class mmf.datasets.processors.processors.CopyProcessor(config, *args, **kwargs)
Copies boxes from a numpy array.
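A sketch of copying boxes from a numpy array into a fixed-size buffer. The function copy_boxes and its max_length truncate-or-zero-pad behavior are assumptions made for illustration; the description above only states that boxes are copied from a numpy array.

```python
import numpy as np


def copy_boxes(blob: np.ndarray, max_length: int) -> np.ndarray:
    """Hypothetical sketch: copy up to max_length rows into a zeroed array.

    Rows beyond max_length are dropped and missing rows stay zero, so the
    output always has exactly max_length rows.
    """
    final = np.zeros((max_length,) + blob.shape[1:], dtype=blob.dtype)
    n = min(len(blob), max_length)
    final[:n] = blob[:n]
    return final


boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20]], dtype=np.float32)
print(copy_boxes(boxes, max_length=3).shape)  # (3, 4)
```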
- class mmf.datasets.processors.processors.DETRImageAndTargetProcessor(config, *args, **kwargs)
Processes a detection image and target consistently with DETR. At training time, a random crop is applied. At test time, the image is deterministically resized so that its short side equals image_size (while ensuring its long side is no larger than max_size).
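The test-time resize rule above (short side equal to image_size, long side capped at max_size) can be computed as below. This follows the aspect-ratio-preserving rule used in DETR’s reference transforms, but resized_hw is a hypothetical helper, and its rounding choices are assumptions rather than MMF’s exact code.

```python
from typing import Optional, Tuple


def resized_hw(height: int, width: int, image_size: int,
               max_size: Optional[int] = None) -> Tuple[int, int]:
    """Target (height, width): short side = image_size, long side <= max_size."""
    short, long = min(height, width), max(height, width)
    target_short = image_size
    if max_size is not None and long * target_short / short > max_size:
        # Shrink the short-side target so the long side lands on max_size.
        target_short = int(round(max_size * short / long))
    if height <= width:
        return target_short, int(target_short * width / height)
    return int(target_short * height / width), target_short


print(resized_hw(480, 640, image_size=800, max_size=1333))  # (800, 1066)
```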
- class mmf.datasets.processors.processors.EvalAIAnswerProcessor(*args, **kwargs)
Processes an answer in the same way as EvalAI.
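EvalAI-style answer processing normalizes free-form answers before they are compared. The sketch below shows a few typical steps (lowercasing, punctuation removal, number words to digits, article removal). normalize_answer and its lookup tables are small, hypothetical subsets and do not reproduce EvalAIAnswerProcessor exactly; for instance, this naive punctuation removal would also strip decimal points.

```python
import re

# Small, illustrative subsets; real VQA/EvalAI normalization uses larger
# tables (e.g. contractions) and subtler punctuation rules.
NUMBER_WORDS = {"none": "0", "zero": "0", "one": "1", "two": "2", "three": "3",
                "four": "4", "five": "5", "six": "6", "seven": "7",
                "eight": "8", "nine": "9", "ten": "10"}
ARTICLES = {"a", "an", "the"}
PUNCT_RE = re.compile(r"[^\w\s]")


def normalize_answer(answer: str) -> str:
    answer = answer.lower().replace("\n", " ").replace("\t", " ").strip()
    answer = PUNCT_RE.sub("", answer)  # drop punctuation characters
    # Map number words to digits and drop articles.
    words = [NUMBER_WORDS.get(w, w) for w in answer.split() if w not in ARTICLES]
    return " ".join(words)


print(normalize_answer("  Two Dogs!"))  # 2 dogs
print(normalize_answer("The red car."))  # red car
```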
- class mmf.datasets.processors.processors.FastTextProcessor(config, *args, **kwargs)
FastText processor, similar to the GloVe processor but returns FastText vectors.
- Parameters
config (DictConfig) – Configuration values for the processor.
- class mmf.datasets.processors.processors.GloVeProcessor(config, *args, **kwargs)
Inherits VocabProcessor and returns GloVe vectors for each word: it maps the words to indices using the vocab processor, then gets the GloVe vectors corresponding to those indices.
- Parameters
config (DictConfig) – Configuration parameters for GloVe, same as VocabProcessor().
- class mmf.datasets.processors.processors.GraphVQAAnswerProcessor(config, *args, **kwargs)
Processor for generating answer scores for the answers passed, using the VQA accuracy formula. It uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of possible answers. Takes in a dict containing “answers” or “answers_tokens”; if “answers” is passed, it is preprocessed to generate “answers_tokens”.
This version also takes a graph vocab and predicts a main stream and a graph stream simultaneously.
- Parameters
config (DictConfig) – Configuration for the processor
- answer_vocab
Class representing answer vocabulary
- Type
VocabDict
- compute_answers_scores(answers_indices)
Generate VQA based answer scores for answers_indices.
- Parameters
answers_indices (torch.LongTensor) – tensor containing indices of the answers
- Returns
tensor containing scores.
- Return type
torch.FloatTensor
- get_true_vocab_size()
The true vocab size can differ from the normal vocab size in some cases, such as soft copy, where a dynamic answer space is added.
- Returns
True vocab size.
- Return type
int
- get_vocab_size()
Get vocab size of the answer vocabulary. Can also include soft copy dynamic answer space size.
- Returns
size of the answer vocabulary
- Return type
int
- class mmf.datasets.processors.processors.M4CAnswerProcessor(config, *args, **kwargs)
Processes a TextVQA answer for iterative decoding in M4C.
- class mmf.datasets.processors.processors.MaskedRegionProcessor(config, *args, **kwargs)
Masks a region with probability mask_probability.
- class mmf.datasets.processors.processors.MultiClassFromFile(config: mmf.datasets.processors.processors.MultiClassFromFileConfig, *args, **kwargs)
Label processor for multi class cases where the labels are saved in a file.
- class mmf.datasets.processors.processors.MultiHotAnswerFromVocabProcessor(config, *args, **kwargs)
- class mmf.datasets.processors.processors.PhocProcessor(config, *args, **kwargs)
Computes PHOC features from text tokens.
- class mmf.datasets.processors.processors.Processor(config: mmf.common.typings.ProcessorConfigType, *args, **kwargs)
Wrapper class used by MMF to initialize processors based on their type as passed in the configuration. It retrieves the processor class registered in the registry under the type key and initializes it with the params passed in the configuration. All functions and attributes of the initialized processor are directly available via this class.
- Parameters
config (DictConfig) – DictConfig containing the type of the processor to be initialized and the params of that processor.
- class mmf.datasets.processors.processors.SimpleSentenceProcessor(*args, **kwargs)
Tokenizes a sentence and processes it.
- tokenizer
Type of tokenizer to be used.
- Type
function
- class mmf.datasets.processors.processors.SimpleWordProcessor(*args, **kwargs)
Tokenizes a word and processes it.
- tokenizer
Type of tokenizer to be used.
- Type
function
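A hypothetical sketch of simple sentence tokenization: lowercase the text, drop a few punctuation characters, and split on non-word characters. The function simple_tokenize and its exact rules are assumptions; the tokenizer attribute of SimpleSentenceProcessor or SimpleWordProcessor may behave differently.

```python
import re
from typing import List

# Split on runs of non-word characters; the capturing group keeps the
# separators, and whitespace-only pieces are dropped below.
SPLIT_RE = re.compile(r"(\W+)")


def simple_tokenize(sentence: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, remove ',' and '?', then split."""
    sentence = sentence.lower().replace(",", "").replace("?", "")
    pieces = SPLIT_RE.split(sentence)
    return [p.strip() for p in pieces if p.strip()]


print(simple_tokenize("What color is the cat, exactly?"))
# ['what', 'color', 'is', 'the', 'cat', 'exactly']
```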
- class mmf.datasets.processors.processors.SoftCopyAnswerProcessor(config, *args, **kwargs)
Similar to the answer processor, but adds a soft copy dynamic answer space to it. Read https://arxiv.org/abs/1904.08920 for more information on soft copy and LoRRA.
- Parameters
config (DictConfig) – Configuration for soft copy processor.
- class mmf.datasets.processors.processors.TransformerBboxProcessor(config, *args, **kwargs)
Processes a bounding box and returns an array of normalized bbox positions and area.
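A sketch of normalizing a bounding box by image size and appending its relative area. normalize_bbox is hypothetical and assumes an xyxy input in pixels; the exact layout of the array that TransformerBboxProcessor returns may differ.

```python
from typing import List


def normalize_bbox(x1: float, y1: float, x2: float, y2: float,
                   image_width: float, image_height: float) -> List[float]:
    """Hypothetical sketch: [x1, y1, x2, y2] scaled to [0, 1] plus relative area."""
    nx1, nx2 = x1 / image_width, x2 / image_width
    ny1, ny2 = y1 / image_height, y2 / image_height
    area = (nx2 - nx1) * (ny2 - ny1)  # fraction of the image covered
    return [nx1, ny1, nx2, ny2, area]


print(normalize_bbox(0, 0, 50, 100, image_width=200, image_height=100))
# [0.0, 0.0, 0.25, 1.0, 0.25]
```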
- class mmf.datasets.processors.processors.VQAAnswerProcessor(config, *args, **kwargs)
Processor for generating answer scores for the answers passed, using the VQA accuracy formula. It uses the VocabDict class to represent the answer vocabulary, so the parameters must specify “vocab_file”. “num_answers” in the parameter config specifies the maximum number of possible answers. Takes in a dict containing “answers” or “answers_tokens”; if “answers” is passed, it is preprocessed to generate “answers_tokens”.
- Parameters
config (DictConfig) – Configuration for the processor
- answer_vocab
Class representing answer vocabulary
- Type
VocabDict
- compute_answers_scores(answers_indices)
Generate VQA based answer scores for answers_indices.
- Parameters
answers_indices (torch.LongTensor) – tensor containing indices of the answers
- Returns
tensor containing scores.
- Return type
torch.FloatTensor
- get_true_vocab_size()
The true vocab size can differ from the normal vocab size in some cases, such as soft copy, where a dynamic answer space is added.
- Returns
True vocab size.
- Return type
int
- get_vocab_size()
Get vocab size of the answer vocabulary. Can also include soft copy dynamic answer space size.
- Returns
size of the answer vocabulary
- Return type
int
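The VQA accuracy formula mentioned above is commonly computed as a leave-one-out average of min(1, matches / 3) over the annotators. The sketch below computes that average in closed form for each candidate answer. vqa_answer_scores and its unk_index default are hypothetical; unlike compute_answers_scores, it returns a dict of scores rather than a torch.FloatTensor.

```python
from collections import Counter
from typing import Dict, List


def vqa_answer_scores(answer_indices: List[int], unk_index: int = 0) -> Dict[int, float]:
    """Leave-one-out average of min(1, matches / 3) for each candidate answer.

    unk_index is a hypothetical default; the UNK answer receives no score.
    """
    counts = Counter(answer_indices)
    n = len(answer_indices)
    scores: Dict[int, float] = {}
    for answer, count in counts.items():
        if answer == unk_index:
            continue
        # Leaving out one of this answer's own votes leaves count - 1 matches;
        # leaving out any other vote leaves count matches.
        total = count * min(1.0, (count - 1) / 3) + (n - count) * min(1.0, count / 3)
        scores[answer] = total / n
    return scores


ten_annotators = [5, 5, 5, 7, 7, 7, 7, 9, 9, 9]
print(vqa_answer_scores(ten_annotators))  # answers 5 and 9 score 0.9, answer 7 scores 1.0
```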
- class mmf.datasets.processors.processors.VocabProcessor(config, *args, **kwargs)
Use VocabProcessor when you have a vocab file and want to convert words to indices. It expects the UNK token to be “<unk>” and pads sentences using the “<pad>” token. Config parameters can include a preprocessor property, which is used to preprocess the item passed, and a max_length property, which specifies the maximum length of the sentence/tokens that can be converted to indices. If the length is smaller, the sentence is padded. Parameters for “vocab” must be passed.
Key: vocab
Example Config:
dataset_config:
  vqa2:
    data_dir: ${env.data_dir}
    processors:
      text_processor:
        type: vocab
        params:
          max_length: 14
          vocab:
            type: intersected
            embedding_name: glove.6B.300d
            vocab_file: vqa2/defaults/extras/vocabs/vocabulary_100k.txt
- Parameters
config (DictConfig) – node containing configuration parameters of the processor
- vocab
Vocab class object which is abstraction over the vocab file passed.
- Type
Vocab
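A minimal sketch of the word-to-index mapping described above: unknown words map to “<unk>”, and sequences are truncated or padded with “<pad>” to max_length. VOCAB and words_to_indices are hypothetical; the real processor reads its vocabulary from vocab_file through the Vocab class.

```python
from typing import Dict, List

# Hypothetical vocabulary; only the "<pad>"/"<unk>" conventions come from the text above.
VOCAB: Dict[str, int] = {"<pad>": 0, "<unk>": 1, "what": 2, "color": 3,
                         "is": 4, "the": 5, "cat": 6}


def words_to_indices(tokens: List[str], max_length: int) -> List[int]:
    """Map tokens to ids (unknown words -> <unk>), truncate, then pad with <pad>."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in tokens[:max_length]]
    ids += [VOCAB["<pad>"]] * (max_length - len(ids))
    return ids


print(words_to_indices(["what", "color", "is", "the", "zebra"], max_length=7))
# [2, 3, 4, 5, 1, 0, 0]
```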
- class mmf.datasets.processors.image_processors.TorchvisionTransforms(config, *args, **kwargs)
- class mmf.datasets.processors.bert_processors.MaskedRobertaTokenizer(config, *args, **kwargs)
- class mmf.datasets.processors.bert_processors.MaskedTokenProcessor(config, *args, **kwargs)
- class mmf.datasets.processors.bert_processors.MultiSentenceBertTokenizer(config, *args, **kwargs)
Extension of BertTokenizer that supports multiple sentences. Unlike the normal use case, each sentence is passed through the BERT tokenizer separately, and the indices are reshaped into a single tensor. Segment ids also increase with each successive sentence.
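A sketch of combining separately tokenized sentences into one sequence, with the segment id increasing per sentence. combine_sentences is hypothetical and works on already-tokenized id lists; real BERT inputs also need special tokens, attention masks, and padding, which are omitted here.

```python
from typing import Dict, List


def combine_sentences(per_sentence_ids: List[List[int]]) -> Dict[str, List[int]]:
    """Hypothetical sketch: concatenate ids; segment id = sentence position."""
    input_ids: List[int] = []
    segment_ids: List[int] = []
    for segment, ids in enumerate(per_sentence_ids):
        input_ids.extend(ids)
        segment_ids.extend([segment] * len(ids))
    return {"input_ids": input_ids, "segment_ids": segment_ids}


print(combine_sentences([[1, 2, 3], [4, 5], [6]]))
# {'input_ids': [1, 2, 3, 4, 5, 6], 'segment_ids': [0, 0, 0, 1, 1, 2]}
```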