Processors can be thought of as torchvision transforms that convert a sample into a form usable by the model. Each processor takes in a dictionary and returns a dictionary. Processors are initialized as member variables of the dataset and can be used to preprocess samples into the proper format.
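The dictionary-in, dictionary-out contract and the "member variable of the dataset" idea can be sketched in plain Python. Both classes below are hypothetical stand-ins written for illustration; they are not MMF's own dataset or processor classes.

```python
class LowercaseProcessor:
    """Hypothetical stand-in processor: dictionary in, dictionary out."""

    def __call__(self, item):
        return {"text": item["text"].lower()}


class ToyDataset:
    """Hypothetical stand-in for a dataset class (not MMF's base dataset)."""

    def __init__(self, texts):
        self.texts = texts
        # Processors are stored as member variables of the dataset.
        self.text_processor = LowercaseProcessor()

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Wrap the raw sample in a dict, then let the processor transform it.
        processed = self.text_processor({"text": self.texts[idx]})
        return {"text": processed["text"]}


dataset = ToyDataset(["Hello World", "MMF Processors"])
sample = dataset[0]
```

The dataset never needs to know what the processor does internally; it only relies on the processor accepting and returning a dictionary.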
For this tutorial, we will create three different types of processors:
- a simple processor for text,
- a simple processor for images,
- a text processor that extends an existing vocabulary processor in MMF.
Create a Simple Text Processor
Here we will create a simple processor that takes a sentence and returns a list of stripped word tokens.
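A minimal sketch of such a processor follows. It assumes MMF exposes `registry.register_processor` and `BaseProcessor` at the import paths shown; the registry key `simple_text` and the class name are hypothetical choices for this example. The `except` branch only defines stand-ins so the sketch also runs where MMF is not installed.

```python
try:
    from mmf.common.registry import registry
    from mmf.datasets.processors import BaseProcessor

    register_processor = registry.register_processor
except ImportError:
    # Fallback stand-ins (assumption) so the sketch runs without MMF.
    class BaseProcessor:
        def __init__(self, config=None, *args, **kwargs):
            pass

    def register_processor(name):
        def wrap(cls):
            return cls

        return wrap


@register_processor("simple_text")  # "simple_text" is a hypothetical key
class SimpleTextProcessor(BaseProcessor):
    def __init__(self, config=None, *args, **kwargs):
        # This processor needs no configuration.
        pass

    def __call__(self, item, *args, **kwargs):
        # Split the sentence on spaces and strip each token.
        tokens = [t.strip() for t in item["text"].split(" ")]
        return {"text": tokens}


processor = SimpleTextProcessor()
result = processor({"text": "how many dogs are there"})
```

Here `result["text"]` holds the list of tokens. Registering the class under a name is what lets a config refer to it by that name later.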
We can add the processor's configuration to a dataset's config, and the processor will then be available as a member of the dataset class.
In this manner, processors can be added to any dataset.
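The wiring from config to dataset member can be sketched as follows. This is an illustration under stated assumptions, not MMF's loading code: the config is written as a Python dict that mirrors the nesting of a YAML dataset config, and `my_dataset`, `simple_text`, `PROCESSOR_REGISTRY`, and `ToyDataset` are all hypothetical names.

```python
# Mirrors the nesting of a YAML dataset config (names are hypothetical).
config = {
    "dataset_config": {
        "my_dataset": {
            "processors": {
                "text_processor": {"type": "simple_text", "params": {}},
            }
        }
    }
}

# Stand-in registry mapping a processor "type" to its class.
PROCESSOR_REGISTRY = {}


def register_processor(name):
    def wrap(cls):
        PROCESSOR_REGISTRY[name] = cls
        return cls

    return wrap


@register_processor("simple_text")
class SimpleTextProcessor:
    def __init__(self, params):
        self.params = params

    def __call__(self, item):
        return {"text": [t.strip() for t in item["text"].split(" ")]}


class ToyDataset:
    def __init__(self, dataset_config):
        # Each key under "processors" becomes an attribute of the dataset.
        for key, spec in dataset_config["processors"].items():
            processor_cls = PROCESSOR_REGISTRY[spec["type"]]
            setattr(self, key, processor_cls(spec.get("params", {})))


dataset = ToyDataset(config["dataset_config"]["my_dataset"])
tokens = dataset.text_processor({"text": "a simple sentence"})["text"]
```

The `type` field selects the registered class, and the config key (`text_processor` here) becomes the attribute name on the dataset.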
Create an Image Processor
In this section, we will learn how to add an image processor. We will add a processor that converts any grayscale image to a 3-channel image.
We will add the processor's configuration to the Hateful Memes dataset's config.
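A minimal sketch of the grayscale-to-3-channel conversion is shown below, assuming image tensors in channels-first layout (C x H x W). The class is kept free of MMF imports so it runs on its own; accepting either a bare tensor or a dict with an `image` key is an assumption made here so the same class works both standalone and inside a chain of transforms.

```python
import torch


class GrayScale:
    # In MMF, this class would subclass the base processor and be registered
    # under the name "GrayScale" so that configs can refer to it.
    def __init__(self, *args, **kwargs):
        pass

    def __call__(self, x):
        # Accept a dict with an "image" key or a bare tensor (assumption).
        if isinstance(x, dict):
            return {"image": self.transform(x["image"])}
        return self.transform(x)

    def transform(self, x):
        assert isinstance(x, torch.Tensor)
        # A single-channel image is tiled three times along the channel axis;
        # images that already have more channels are returned unchanged.
        if x.size(0) == 1:
            x = torch.cat([x] * 3, dim=0)
        return x


gray = torch.rand(1, 4, 4)
rgb_like = GrayScale()({"image": gray})["image"]
```

After the call, `rgb_like` has three identical channels, which is the shape most pretrained image encoders expect.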
The torchvision_transforms image processor loads the different transform processors, like the GrayScale one we created, and composes them together as torchvision transforms. Here we are adding two transforms: first ToTensor, a native torchvision transform that converts the image to a tensor, and then GrayScale, which converts a single-channel image tensor to a 3-channel one. These transforms will be applied to the images when image_processor is used on an image from the dataset class.
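The configuration described above, and the order in which the transforms run, can be sketched as follows. The dict mirrors the nesting of a YAML config, and its exact layout (`params` holding a `transforms` list) is an assumption. The two `*_stub` functions are pure-Python stand-ins operating on nested lists; they only illustrate ordered composition and are not the real transforms.

```python
# Assumed config layout, written as a Python dict instead of YAML.
image_processor_config = {
    "dataset_config": {
        "hateful_memes": {
            "processors": {
                "image_processor": {
                    "type": "torchvision_transforms",
                    "params": {"transforms": ["ToTensor", "GrayScale"]},
                }
            }
        }
    }
}


def to_tensor_stub(image_2d):
    # Stand-in for ToTensor: wrap an H x W grid into a 1 x H x W nested list.
    return [image_2d]


def gray_scale_stub(channels):
    # Stand-in for GrayScale: tile a single channel three times.
    return channels * 3 if len(channels) == 1 else channels


STUBS = {"ToTensor": to_tensor_stub, "GrayScale": gray_scale_stub}


def compose(names):
    steps = [STUBS[name] for name in names]

    def run(x):
        # Apply the transforms in the order they are listed in the config.
        for step in steps:
            x = step(x)
        return x

    return run


processors = image_processor_config["dataset_config"]["hateful_memes"]["processors"]
names = processors["image_processor"]["params"]["transforms"]
image_processor = compose(names)
out = image_processor([[0, 255], [128, 64]])
```

The list order matters: GrayScale expects a tensor-like input with a channel axis, so it has to come after ToTensor.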
Extending an existing processor: Create a fasttext sentence processor
A fasttext processor that returns word embeddings is available in MMF. Here we will create a fasttext sentence processor by extending the fasttext word processor.
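The extension pattern can be sketched without a fasttext model. Both classes below are toy stand-ins, not MMF's fasttext processor: the "embeddings" are a small hand-written table, and the sentence vector is a plain component-wise mean of the word vectors, which is an assumption made for illustration. With a real fasttext model, the Python binding's `get_sentence_vector` method would play this role.

```python
class ToyWordVectorProcessor:
    """Stand-in for a word-level embedding processor."""

    def __init__(self, vectors, dim):
        self.vectors = vectors
        self.dim = dim

    def word_vector(self, word):
        # Unknown words map to a zero vector in this toy setup.
        return self.vectors.get(word, [0.0] * self.dim)

    def __call__(self, item):
        tokens = item["text"].split()
        return {"text": [self.word_vector(t) for t in tokens]}


class ToySentenceVectorProcessor(ToyWordVectorProcessor):
    """Extends the word-level processor to return one vector per sentence."""

    def __call__(self, item):
        # Reuse the parent's word-level behaviour, then pool the result.
        word_vectors = super().__call__(item)["text"]
        if not word_vectors:
            return {"text": [0.0] * self.dim}
        # Average the word vectors component-wise.
        mean = [sum(col) / len(word_vectors) for col in zip(*word_vectors)]
        return {"text": mean}


vectors = {"red": [1.0, 0.0], "car": [0.0, 1.0]}
processor = ToySentenceVectorProcessor(vectors, dim=2)
sentence_vector = processor({"text": "red car"})["text"]
```

Only `__call__` is overridden; model loading and word lookup are inherited from the parent class.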
For this processor, we can similarly add the configuration to a dataset's config, and the processor will then be available as a member of the dataset class.
Learn more about processors in the processors documentation.