# MoViE+MCAN (VQA 2020 Challenge Winner)
This is a tutorial for running the MoViE+MCAN model, which won the VQA Challenge at CVPR 2020. The winning team comprised Nguyen, D. K., Jiang, H., Goswami, V., Yu, L., and Chen, X. The MoViE+MCAN model is derived from the following papers and is released as part of MMF. Please cite both papers if you use the model, or the grid features used to train it, in your work:
- Nguyen, D. K., Goswami, V., & Chen, X. (2020). Revisiting Modulated Convolutions for Visual Counting and Beyond. arXiv preprint arXiv:2004.11883.
- Jiang, H., Misra, I., Rohrbach, M., Learned-Miller, E., & Chen, X. (2020). In Defense of Grid Features for Visual Question Answering. arXiv preprint arXiv:2001.03615.
## Installation

Install MMF following the installation guide.
## Data Setup

Annotations and features for VQA2.0 and Visual Genome will be downloaded automatically. The grid image features were extracted using the models trained in the [grid-feats-vqa](https://github.com/facebookresearch/grid-feats-vqa) repository. Other feature variants available in that repository can also be used.
## Training and Evaluation

To train the MoViE+MCAN model on the VQA2.0 + Visual Genome dataset, run:
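A sketch of the command, assuming MMF's standard `mmf_run` entry point with dotlist-style config overrides; the `run_type=train_val` value, and the assumption that the project config wires in the Visual Genome extra data, are ours:

```bash
# Train MoViE+MCAN on VQA2.0 (+ Visual Genome, as configured in defaults.yaml);
# outputs go to the experiment save directory (./save by default)
mmf_run config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=train_val
```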
This will save the trained model as `movie_mcan_final.pth` in the `./save` directory for the experiment.
To evaluate the trained model on the VQA2.0 val set, run:
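A sketch, assuming the standard `checkpoint.resume_file` override is used to load the saved weights:

```bash
# Evaluate the trained checkpoint on the VQA2.0 val set
mmf_run config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=val \
    checkpoint.resume_file=./save/movie_mcan_final.pth
```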
## Inference Prediction for EvalAI Submission

To generate the VQA prediction file for an EvalAI submission on the `test-dev` set, run:
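A sketch using MMF's `mmf_predict` entry point, which writes a JSON prediction file suitable for EvalAI upload; treating `run_type=test` as the right setting for `test-dev` is an assumption:

```bash
# Generate an EvalAI-compatible prediction file on test-dev
mmf_predict config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=test \
    checkpoint.resume_file=./save/movie_mcan_final.pth
```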
## Pretrained model

| Datasets | Config File | Pretrained Model Key | Metrics | Notes |
| --- | --- | --- | --- | --- |
| VQA2.0 (`vqa2`) | `projects/movie_mcan/configs/vqa2/defaults.yaml` | `movie_mcan.grid.vqa2_vg` | `test-dev` accuracy: 73.92% | Uses Visual Genome as extra data |
To generate predictions with the pretrained MoViE+MCAN model on the VQA2.0 `test-dev` set (assuming the pretrained model you are evaluating is `movie_mcan.grid.vqa2_vg`), run:
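A sketch, assuming the model-zoo key from the table above can be passed via MMF's `checkpoint.resume_zoo` override:

```bash
# Run test-dev inference with the pretrained zoo model
mmf_predict config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=test \
    checkpoint.resume_zoo=movie_mcan.grid.vqa2_vg
```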
**Tip:** Follow the checkpointing tutorial to understand the more fine-grained details of checkpointing, loading, and resuming in MMF.