MMF Projects

MMF contains reference implementations of, or has been used to develop, the following projects (in no particular order):

  • Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA [arXiv] [project]
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [arXiv] [project]
  • VisualBERT: A Simple and Performant Baseline for Vision and Language [arXiv] [project]
  • The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes [arXiv] [project]
  • Towards VQA Models That Can Read [arXiv] [project]
  • TextCaps: a Dataset for Image Captioning with Reading Comprehension [arXiv] [project]
  • Pythia v0.1: The Winning Entry to the VQA Challenge 2018 [arXiv] [project]
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [arXiv] [project]
  • Supervised Multimodal Bitransformers for Classifying Images and Text [arXiv] [project]
  • Are We Pretraining It Right? Digging Deeper into Visio-Linguistic Pretraining [arXiv] [project]
Last updated by Amanpreet Singh