MMF is powered by PyTorch and features:
- Model Zoo: Reference implementations for state-of-the-art vision and language models including VisualBERT, ViLBERT, M4C (SoTA on TextVQA and TextCaps), Pythia (VQA 2018 challenge winner), and many others. See the full list of projects in MMF here.
- Multi-Tasking: Support for training on multiple datasets together.
- Datasets: Includes built-in support for various datasets including VQA, VizWiz, TextVQA, Visual Dialog and COCO Captioning. Running a single command automatically downloads and sets up the dataset for you.
- Modules: Provides implementations of many commonly used layers in vision and language.
- Distributed: Support for distributed training via PyTorch's DistributedDataParallel. With built-in hyperparameter sweep support, you can scale your model to any number of nodes.
- Unopinionated: Makes no assumptions about the dataset and model implementations built on top of it.
- Customization: Custom losses, metrics, learning-rate scheduling, optimizers, and TensorBoard logging to suit your custom needs.
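
As a sketch of the workflow the bullets above describe, MMF ships an `mmf_run` command-line entry point that trains a model on a dataset, downloading and preparing the dataset on first use. The config path below is illustrative; substitute the project config for the model you want to train:

```shell
# Train M4C on TextVQA; MMF downloads and sets up the dataset on the first run.
mmf_run config=projects/m4c/configs/textvqa/defaults.yaml \
    datasets=textvqa \
    model=m4c \
    run_type=train_val
```

Any option in the YAML config can be overridden with the same `key=value` syntax on the command line.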