Adding a Processor
Processors can be thought of as torchvision transforms that convert a sample into a form the model can use. Each processor takes in a dictionary and returns a dictionary. Processors are initialized as member variables of the dataset and are used there to preprocess samples into the proper format.

For this tutorial, we will create three different types of processors:
- a simple processor for text,
- a simple processor for images,
- a text processor that extends an existing vocabulary processor in MMF.
Create a simple Text Processor
Here we will create a simple processor that takes a sentence and returns a list of stripped word tokens.
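A minimal sketch of such a processor, written so that it runs without MMF installed. In MMF the class would typically subclass `BaseProcessor` from `mmf.datasets.processors` and be registered under a name with the `registry.register_processor` decorator; both are left out here. The class name `SimpleTextProcessor` and the choice to drop empty tokens are assumptions made for illustration.

```python
class SimpleTextProcessor:
    """Splits a sentence on spaces and returns stripped word tokens."""

    def __init__(self, config=None, *args, **kwargs):
        # This processor needs no options; config is stored only to mirror
        # the usual "built from a config" constructor shape.
        self.config = config

    def __call__(self, item, *args, **kwargs):
        # Processors take in a dictionary and return a dictionary.
        pieces = item["text"].split(" ")
        # Assumption: empty pieces produced by repeated spaces are dropped.
        tokens = [piece.strip() for piece in pieces if piece.strip()]
        return {"text": tokens}


processor = SimpleTextProcessor()
result = processor({"text": "What  color is the cat "})
```

Here `result["text"]` holds the individual words, with the extra spaces in the input removed.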
We can add the processor's configuration to a dataset's config, and the processor will then be available in the dataset class as the text_processor variable.
In this manner, processors can be added to any dataset.
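The path from a config entry to a dataset attribute can be sketched as follows. This is a toy model of the mechanism, not MMF's implementation: `PROCESSOR_REGISTRY`, `register_processor`, and `ToyDataset` are hypothetical stand-ins, and the config is written as a Python dict with assumed keys (`processors`, `type`, `params`), whereas a dataset config would normally be a YAML file.

```python
# Hypothetical stand-in for a processor registry: maps a name to a class.
PROCESSOR_REGISTRY = {}


def register_processor(name):
    """Decorator that records a processor class under the given name."""
    def decorator(cls):
        PROCESSOR_REGISTRY[name] = cls
        return cls
    return decorator


@register_processor("simple_processor")
class SimpleTextProcessor:
    def __init__(self, params=None):
        self.params = params or {}

    def __call__(self, item):
        tokens = [t.strip() for t in item["text"].split(" ") if t.strip()]
        return {"text": tokens}


# Illustrative config; the key names are assumptions for this sketch.
dataset_config = {
    "processors": {
        "text_processor": {"type": "simple_processor", "params": {}},
    }
}


class ToyDataset:
    """Builds every configured processor and exposes it as an attribute."""

    def __init__(self, config):
        for attr_name, spec in config["processors"].items():
            processor_cls = PROCESSOR_REGISTRY[spec["type"]]
            setattr(self, attr_name, processor_cls(spec.get("params")))

    def process_question(self, question):
        # The processor is reached through the name used in the config.
        return self.text_processor({"text": question})["text"]


dataset = ToyDataset(dataset_config)
tokens = dataset.process_question("Is this a meme")
```

The name under `processors` becomes the attribute name on the dataset, and `type` selects which registered class is instantiated.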
Create an Image Processor
In this section, we will learn how to add an image processor. We will add a processor that converts any grayscale image to a 3-channel image.
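The channel-repeating step can be sketched as below. To keep the example runnable without PyTorch, plain nested lists indexed as `image[channel][row][col]` stand in for a tensor; with a `torch.Tensor` of shape `(1, H, W)` the same step could be written as `torch.cat([x] * 3, dim=0)`. The class name is chosen for illustration, and registration with MMF is omitted.

```python
class GrayScaleTo3Channels:
    """Repeats the single channel of a (C, H, W) image three times."""

    def __call__(self, image):
        return self.transform(image)

    def transform(self, image):
        # A grayscale image has exactly one channel.
        if len(image) == 1:
            channel = image[0]
            # Copy each row so the three channels do not share storage.
            return [[row[:] for row in channel] for _ in range(3)]
        # Images that already have several channels pass through unchanged.
        return image


gray = [[[0.1, 0.2], [0.3, 0.4]]]    # shape (1, 2, 2)
rgb = GrayScaleTo3Channels()(gray)   # shape (3, 2, 2)
```

Images that already have three channels are returned as they are, so the processor is safe to apply to a mixed dataset.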
We will add the processor's configuration to the Hateful Memes dataset's config.
The torchvision_transforms image processor loads the individual transform processors, such as the GrayScale one we created, and composes them together as torchvision transforms. Here we add two transforms: first ToTensor, a native torchvision transform that converts the image to a tensor, and then GrayScale, which converts a single-channel image tensor to a 3-channel one. These transforms are applied whenever image_processor is called on an image in the dataset class.
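The composition described above amounts to applying the transforms one after another, which is what `torchvision.transforms.Compose` does. The sketch below reimplements that idea in plain Python so it runs without torchvision. `to_channel_first` is a hypothetical, simplified stand-in for ToTensor (it only scales 0-255 values to the range 0 to 1 and adds a leading channel axis to an H x W image), and nested lists stand in for tensors.

```python
class Compose:
    """Applies a sequence of callables in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        for transform in self.transforms:
            x = transform(x)
        return x


def to_channel_first(pixels):
    # Simplified stand-in for ToTensor: H x W ints -> 1 x H x W floats.
    return [[[value / 255 for value in row] for row in pixels]]


def grayscale_to_3_channels(image):
    # Repeat a lone channel three times; leave other images untouched.
    if len(image) == 1:
        return [[row[:] for row in image[0]] for _ in range(3)]
    return image


# Order matters: the channel axis must exist before it can be repeated.
image_processor = Compose([to_channel_first, grayscale_to_3_channels])

pixels = [[0, 255], [255, 0]]          # a 2 x 2 grayscale image
processed = image_processor(pixels)    # shape (3, 2, 2)
```

Listing the conversion step first mirrors the order in the text: the second transform relies on the channel-first layout produced by the first.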
Extending an existing processor: Create a fasttext sentence processor
A fasttext processor that returns word embeddings is available in MMF. Here we will create a fasttext sentence processor by extending the fasttext word processor.
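The extension pattern can be sketched with toy classes. Nothing below loads a fastText model: `ToyWordVectorProcessor` is a hypothetical stand-in for a word-embedding processor backed by a small in-memory table, and the subclass overrides only `__call__`. Averaging the word vectors is an assumption made for illustration; a real fastText model provides its own sentence-vector computation.

```python
class ToyWordVectorProcessor:
    """Hypothetical base processor: returns one vector per token."""

    def __init__(self, embeddings, dim):
        self.embeddings = embeddings
        self.dim = dim

    def word_vector(self, token):
        # Unknown words map to a zero vector in this toy version.
        return self.embeddings.get(token, [0.0] * self.dim)

    def __call__(self, item):
        tokens = item["text"].split()
        return {"text": [self.word_vector(t) for t in tokens]}


class ToySentenceVectorProcessor(ToyWordVectorProcessor):
    """Reuses the base lookup but returns a single sentence vector."""

    def __call__(self, item):
        # Accept either pre-tokenized input or a raw sentence.
        tokens = item["tokens"] if "tokens" in item else item["text"].split()
        vectors = [self.word_vector(t) for t in tokens]
        if not vectors:
            return {"text": [0.0] * self.dim}
        # Average each dimension across all word vectors.
        sentence = [sum(column) / len(vectors) for column in zip(*vectors)]
        return {"text": sentence}


embeddings = {"good": [1.0, 0.0], "meme": [0.0, 1.0]}
processor = ToySentenceVectorProcessor(embeddings, dim=2)
sentence_vector = processor({"text": "good meme"})["text"]
```

Because the subclass inherits the constructor and the lookup, only the step that differs (combining word vectors into one) has to be written.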
For this processor, we can similarly add the configuration to a dataset's config, and it will be available in the dataset class as text_processor.
Next Steps
Learn more about processors in the processors documentation.