Quickstart
In this quickstart, we train the M4C model on the TextVQA dataset. TextVQA requires models to read and reason about text in images to answer questions about them. M4C is a recent state-of-the-art model on TextVQA that combines a multimodal transformer architecture with a rich representation of the text in images.
To train other models or understand more about MMF, follow Next Steps at the bottom of this tutorial.
Installation
Install MMF following the installation documentation.
Getting Data
Datasets and required files are downloaded automatically when we run training. For more details about custom datasets and other advanced dataset setups, check the dataset documentation.
Training
We can start training by running the following command:
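A minimal sketch of such a command, assembled in Python so it can be inspected before it is run. It assumes that MMF installs an `mmf_run` command-line entry point taking `key=value` arguments named `config`, `datasets`, `model` and `run_type`, and that `run_type=train_val` trains with periodic validation. Only the config path comes from this guide. The helper `build_train_command` is hypothetical, so verify the argument names against your installed MMF version.

```python
import shlex

# Experiment config path named in this guide.
CONFIG = "projects/m4c/configs/textvqa/defaults.yaml"


def build_train_command(config=CONFIG, dataset="textvqa", model="m4c"):
    """Return the argument list for a training run (hypothetical helper, not part of MMF)."""
    # Assumption: `mmf_run` is the command-line entry point installed with MMF,
    # and it accepts key=value arguments named config, datasets, model and run_type.
    return [
        "mmf_run",
        f"config={config}",
        f"datasets={dataset}",
        f"model={model}",
        "run_type=train_val",  # assumed value: train and validate along the way
    ]


# Print the command as a single shell line so it can be copied into a terminal.
print(shlex.join(build_train_command()))
```

To launch the run from Python instead of copying the printed line, the same list can be passed to `subprocess.run(cmd, check=True)`.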
The training and experiment hyperparameters are defined in the experiment config projects/m4c/configs/textvqa/defaults.yaml. We can also set config parameters using command-line arguments:
For example, training.batch_size=32 sets the batch size to 32 and training.max_updates=44000 sets the maximum number of training iterations to 44000.
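The dotted keys address nested entries in the config: `training.batch_size` refers to the `batch_size` entry under `training`. The sketch below illustrates that mapping with plain dictionaries. It is an illustration of the idea only, not MMF's actual config loader, and `apply_overrides` is a hypothetical helper. The `None` defaults are placeholders, since the real values live in the YAML file.

```python
def apply_overrides(config, overrides):
    """Apply "a.b.c=value" overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the nesting, creating intermediate sections if missing.
            node = node.setdefault(name, {})
        # Store integer-looking values as ints and everything else as strings.
        try:
            node[leaf] = int(raw_value)
        except ValueError:
            node[leaf] = raw_value
    return config


# Placeholder defaults for illustration; the real ones come from the YAML config.
config = {"training": {"batch_size": None, "max_updates": None}}
apply_overrides(config, ["training.batch_size=32", "training.max_updates=44000"])
print(config["training"])
```

Overrides given this way take precedence over the values in the YAML file for that run only; the file itself is unchanged.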
Similarly, log interval, checkpoint interval and validation interval can all be set as:
This shows training logs every 10 iterations, saves a model checkpoint every 100 iterations and runs validation every 1000 iterations. More about configurations and how we set them can be found in the configuration documentation.
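As a sketch, the three intervals can be appended to the training command as further `key=value` overrides. The key names `training.log_interval`, `training.checkpoint_interval` and `training.evaluation_interval` are assumptions here, as are the `mmf_run` entry point and the base arguments, so check them against the experiment config before relying on them. The values are the ones described above.

```python
import shlex

# Assumed base training command (entry point and argument names are not from this guide).
BASE_COMMAND = [
    "mmf_run",
    "config=projects/m4c/configs/textvqa/defaults.yaml",
    "datasets=textvqa",
    "model=m4c",
    "run_type=train_val",
]

# Assumed key names under `training`; values match the intervals in the text.
INTERVALS = {
    "training.log_interval": 10,           # print training logs every 10 iterations
    "training.checkpoint_interval": 100,   # save a checkpoint every 100 iterations
    "training.evaluation_interval": 1000,  # run validation every 1000 iterations
}


def with_overrides(base, overrides):
    """Return a new argument list with key=value overrides appended (hypothetical helper)."""
    return list(base) + [f"{key}={value}" for key, value in overrides.items()]


print(shlex.join(with_overrides(BASE_COMMAND, INTERVALS)))
```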
Inference
To run inference or generate predictions, we can specify a pretrained model using its zoo key and then run the following command:
To run inference on the val set, use run_type=val and keep the rest of the arguments the same. checkpoint.resume_zoo loads a pretrained model from the model zoo. To learn more about checkpoints and pretraining/finetuning models, check this tutorial.
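A sketch of how such an inference command could be assembled, switching between the val and test sets through `run_type`. It assumes an `mmf_predict` entry point installed with MMF, the same `config`, `datasets` and `model` arguments as for training, and `run_type=test` as the default split. `your.zoo.key` is a placeholder, not a real zoo key, and `build_predict_command` is a hypothetical helper.

```python
import shlex

CONFIG = "projects/m4c/configs/textvqa/defaults.yaml"


def build_predict_command(zoo_key, run_type="test"):
    """Return the argument list for a prediction run (hypothetical helper, not part of MMF)."""
    if run_type not in ("val", "test"):
        raise ValueError(f"run_type must be 'val' or 'test', got {run_type!r}")
    # Assumption: `mmf_predict` is the prediction entry point installed with MMF.
    return [
        "mmf_predict",
        f"config={CONFIG}",
        "datasets=textvqa",
        "model=m4c",
        f"run_type={run_type}",
        f"checkpoint.resume_zoo={zoo_key}",  # pretrained model loaded from the model zoo
    ]


ZOO_KEY = "your.zoo.key"  # placeholder: replace with the zoo key of the pretrained model

print(shlex.join(build_predict_command(ZOO_KEY)))                  # test set
print(shlex.join(build_predict_command(ZOO_KEY, run_type="val")))  # val set
```

Only the `run_type` token differs between the two printed commands, which mirrors the note above that the rest of the arguments stay the same.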
These commands should be enough to get you started with training and performing inference using MMF.
Citation
If you use MMF in your work or use any models published in MMF, please cite:
Next steps
To dive deeper into MMF, explore the following topics next: