Adding a Processor
Processors are analogous to torchvision transforms: they convert a sample into a form usable by the model. Each processor takes a dictionary and returns a dictionary. Processors are initialized as member variables of the dataset and are used to preprocess samples into the proper format.
For this tutorial, we will create three different types of processors:
- a simple processor for text,
- a simple processor for images,
- a text processor that extends an existing vocabulary processor in MMF.
Create a simple Text Processor
Here we will create a simple processor that takes a sentence and returns a list of stripped word tokens.
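A minimal sketch of such a processor is shown below. It is written as a plain Python class so that it runs without MMF installed. In MMF itself, the class would typically subclass BaseProcessor and be registered under a name through the registry's register_processor decorator. The class name SimpleProcessor and the sample sentence are illustrative assumptions.

```python
class SimpleProcessor:
    """Stand-in for an MMF processor: dictionary in, dictionary out."""

    def __init__(self, config=None, *args, **kwargs):
        # This processor needs no configuration. The argument is accepted
        # only so it can be constructed the same way as other processors.
        self.config = config

    def __call__(self, item, *args, **kwargs):
        # Split the sentence on single spaces, then strip surrounding
        # whitespace (such as a trailing newline) from every token.
        tokens = [t.strip() for t in item["text"].split(" ")]
        return {"text": tokens}


processor = SimpleProcessor()
result = processor({"text": "What color is the cat?\n"})
print(result)
```

Because the processor both accepts and returns a dictionary, it can be chained with other processors that read or write the same keys.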
We can add the processor's configuration to a dataset's config, and the processor will then be available in the dataset class as the text_processor variable.
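The wiring from config to dataset attribute can be sketched roughly as follows. This is not MMF's implementation: PROCESSOR_REGISTRY, register_processor, MyDataset, and the dataset key my_dataset are hypothetical stand-ins, and the nested dictionary mirrors what would normally be written in the dataset's YAML config.

```python
# Hypothetical stand-in for a processor registry: maps a type name to a class.
PROCESSOR_REGISTRY = {}


def register_processor(name):
    def decorator(cls):
        PROCESSOR_REGISTRY[name] = cls
        return cls
    return decorator


@register_processor("simple_processor")
class SimpleProcessor:
    def __init__(self, config=None):
        self.config = config

    def __call__(self, item):
        return {"text": [t.strip() for t in item["text"].split(" ")]}


# Python mirror of a YAML dataset config (key names are assumptions).
config = {
    "dataset_config": {
        "my_dataset": {
            "processors": {
                "text_processor": {"type": "simple_processor", "params": {}},
            }
        }
    }
}


class MyDataset:
    def __init__(self, dataset_config):
        # Build every configured processor and attach it under its config
        # key, so "text_processor" becomes self.text_processor.
        for name, spec in dataset_config["processors"].items():
            processor_cls = PROCESSOR_REGISTRY[spec["type"]]
            setattr(self, name, processor_cls(spec.get("params")))
        self.samples = [{"text": "a man riding a horse"}]

    def __getitem__(self, idx):
        # Preprocess the raw sample with the configured processor.
        return self.text_processor(self.samples[idx])


dataset = MyDataset(config["dataset_config"]["my_dataset"])
first = dataset[0]
print(first)
```

The key under processors determines the attribute name on the dataset, while the type entry selects which registered class is instantiated.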
In this manner, processors can be added to any dataset.
Create an Image Processor
In this section, we will learn how to add an image processor. We will add a processor that converts any grayscale image to a 3-channel image.
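The channel-replication logic can be sketched without any dependencies, as below. Here an image is assumed to be a channel-first nested list (image[channel][row][column]). An MMF version would operate on a torch.Tensor instead, for example by concatenating the single channel three times along dimension 0, and would be registered as a processor. The class below is a simplified stand-in.

```python
class GrayScale:
    """Stand-in processor: expands a 1-channel image to 3 channels."""

    def __init__(self, *args, **kwargs):
        # No configuration is needed for this transform.
        pass

    def __call__(self, item):
        return {"image": self.transform(item["image"])}

    def transform(self, image):
        # A single-channel image is tiled three times along the channel
        # axis; images that already have several channels pass through.
        if len(image) == 1:
            return [[row[:] for row in image[0]] for _ in range(3)]
        return image


gray = [[[0.0, 0.5], [1.0, 0.25]]]  # shape (1, 2, 2)
expanded = GrayScale()({"image": gray})["image"]
print(len(expanded))
```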
We will add the processor's configuration to the Hateful Memes dataset's config.
The torchvision_transforms image processor loads the listed transform processors, such as the GrayScale one we created, and composes them together as torchvision transforms. Here we add two transforms: first ToTensor, a native torchvision transform that converts the image to a tensor, and then GrayScale, which converts a single-channel image tensor to a 3-channel one. These transforms are applied in order whenever image_processor is used on an image from the dataset class.
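The composition described above can be sketched as follows. The config dictionary mirrors what the YAML entry for image_processor might look like (the params and transforms key names are assumptions). The functions to_tensor and gray_scale are simplified stand-ins operating on nested lists, not the torchvision or MMF implementations.

```python
def to_tensor(image):
    """Stand-in for ToTensor: 8-bit HxW pixels to 1xHxW floats in [0, 1]."""
    return [[[pixel / 255 for pixel in row] for row in image]]


def gray_scale(image):
    """Tile a single channel three times; leave other images unchanged."""
    if len(image) == 1:
        return [[row[:] for row in image[0]] for _ in range(3)]
    return image


# Hypothetical lookup from transform name to callable.
TRANSFORMS = {"ToTensor": to_tensor, "GrayScale": gray_scale}

# Python mirror of the image_processor config entry (key names assumed).
image_processor_config = {
    "type": "torchvision_transforms",
    "params": {"transforms": ["ToTensor", "GrayScale"]},
}


def compose(names):
    # Resolve each name once, then apply the transforms in listed order.
    steps = [TRANSFORMS[name] for name in names]

    def apply(image):
        for step in steps:
            image = step(image)
        return image

    return apply


image_processor = compose(image_processor_config["params"]["transforms"])
pixels = [[0, 255], [255, 0]]  # a 2x2 grayscale image
out = image_processor(pixels)
print(len(out), out[0])
```

Order matters in this sketch: gray_scale expects the channel-first layout that to_tensor produces, so swapping the two entries would give a different result.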
Extending an existing processor: Create a fasttext sentence processor
A fasttext processor that returns word embeddings is available in MMF. Here we will create a fasttext sentence processor by extending the fasttext word processor.
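The extension pattern can be sketched as below. Both classes are hypothetical stand-ins: a small hand-written embedding table replaces the fasttext model, and the sentence vector is a plain average of word vectors. A real implementation would more likely subclass MMF's fasttext processor and obtain the vector from the loaded fasttext model (for instance via its get_sentence_vector method), which may normalize word vectors before averaging.

```python
class ToyWordVectorProcessor:
    """Stand-in for a word-embedding processor: one vector per token."""

    def __init__(self, config):
        # A real processor would load a model file here; a tiny embedding
        # table from the config plays that role in this sketch.
        self.vectors = config["vectors"]
        self.dim = config["dim"]

    def word_vector(self, token):
        # Unknown tokens map to a zero vector.
        return self.vectors.get(token, [0.0] * self.dim)

    def __call__(self, item):
        return {"text": [self.word_vector(t) for t in item["tokens"]]}


class ToySentenceVectorProcessor(ToyWordVectorProcessor):
    """Extends the word processor to return one vector per sentence."""

    def __call__(self, item):
        # Accept either a raw sentence or pre-split tokens.
        if "text" in item:
            tokens = item["text"].split()
        else:
            tokens = item["tokens"]
        word_vectors = [self.word_vector(t) for t in tokens]
        if not word_vectors:
            return {"text": [0.0] * self.dim}
        # Average the word vectors component-wise (a simplification).
        sentence_vector = [
            sum(column) / len(word_vectors) for column in zip(*word_vectors)
        ]
        return {"text": sentence_vector}


config = {"dim": 2, "vectors": {"red": [1.0, 0.0], "car": [0.0, 1.0]}}
processor = ToySentenceVectorProcessor(config)
result = processor({"text": "red car"})
print(result)
```

Only __call__ is overridden; model loading and word lookup are inherited from the parent class, which is the main benefit of extending an existing processor.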
For this processor, we can similarly add the configuration to a dataset's config, and it will be available in the dataset class as text_processor.
Next Steps
Learn more about processors in the processors documentation.