MMF Projects
MMF contains reference implementations of, or has been used to develop, the following projects (in no particular order):
- Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA [arXiv] [project]
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [arXiv] [project]
- VisualBERT: A Simple and Performant Baseline for Vision and Language [arXiv] [project]
- The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [arXiv] [project]
- Towards VQA Models That Can Read [arXiv] [project]
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [arXiv] [project]
- Pythia v0.1: the Winning Entry to the VQA Challenge 2018 [arXiv] [project]
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [arXiv] [project]
- Supervised Multimodal Bitransformers for Classifying Images and Text [arXiv] [project]
- Are we pretraining it right? Digging deeper into visio-linguistic pretraining [arXiv] [project]