MMF is a modular framework for supercharging vision and language research, built on top of PyTorch. Using MMF, researchers and developers can train custom models for VQA, Image Captioning, Visual Dialog, Hate Detection, and other vision and language tasks.

See the MMF documentation for tutorials and detailed guides.
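As a quick illustration, training in MMF is typically launched through its `mmf_run` command-line tool with key=value config overrides. The sketch below is illustrative only: the config path, model, and dataset names are assumptions based on MMF's project layout, so adjust them to your installation.

```shell
# Sketch: train the MMBT model on the Hateful Memes dataset via mmf_run.
# The config path, model name, and dataset name below are assumptions
# drawn from MMF's project layout; verify them against your checkout.
mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
    model=mmbt \
    dataset=hateful_memes \
    training.batch_size=32  # any config key can be overridden from the CLI
```

The same override style applies to evaluation and prediction runs; only the command and config change.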


If you use MMF in your work or use any models published in MMF, please cite:

@misc{singh2020mmf,
  author =       {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and
                  Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
  title =        {MMF: A multimodal framework for vision and language research},
  howpublished = {\url{}},
  year =         {2020}
}