This is a tutorial for running the MoViE+MCAN model, which won the VQA Challenge at CVPR 2020. The winning team comprised Nguyen, D. K., Jiang, H., Goswami, V., Yu, L., & Chen, X. The MoViE+MCAN model is derived from the following papers and is released as part of MMF. Please cite both papers if you use the model, or the grid features used to train it, in your work:
- Nguyen, D. K., Goswami, V., & Chen, X. (2020). Revisiting Modulated Convolutions for Visual Counting and Beyond. arXiv preprint arXiv:2004.11883. ([arXiv](https://arxiv.org/abs/2004.11883))
- Jiang, H., Misra, I., Rohrbach, M., Learned-Miller, E., & Chen, X. (2020). In Defense of Grid Features for Visual Question Answering. arXiv preprint arXiv:2001.03615. ([arXiv](https://arxiv.org/abs/2001.03615))
Install MMF following the installation guide.
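(A minimal install sketch from source, assuming the standard pip-based setup from the MMF repository; see the installation guide for the authoritative steps.)

```bash
# Clone and install MMF from source (standard setup; see the installation guide)
git clone https://github.com/facebookresearch/mmf.git
cd mmf
pip install --editable .
```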
Annotations and features for VQA2.0 and Visual Genome will be downloaded automatically. The grid image features were extracted using the models trained in this repo; other feature variants available in that repo can also be used.
## Training and Evaluation
To train the MoViE+MCAN model on the VQA2.0 + Visual Genome dataset, run:
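(A sketch, assuming MMF's standard `mmf_run` CLI; the config path `projects/movie_mcan/configs/vqa2/defaults.yaml` is an assumption, not confirmed by this document.)

```bash
# Train MoViE+MCAN on VQA2.0 + Visual Genome; outputs go to ./save by default
mmf_run config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=train_val
```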
This will save the trained model `movie_mcan_final.pth` in the `./save` directory for the experiment.
To evaluate the trained model on the VQA2.0 val set, run:
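(Same assumptions as the training sketch above; `checkpoint.resume_file` points at the model saved by the training run.)

```bash
# Evaluate the trained model on the VQA2.0 val set (config path assumed)
mmf_run config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=val \
    checkpoint.resume_file=./save/movie_mcan_final.pth
```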
## Inference Prediction for EvalAI Submission
To generate the vqa prediction file for EvalAI submission on the `test-dev` set, run:
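(A sketch, assuming MMF's `mmf_predict` entry point and the same assumed config path as above; it writes a predictions file that can be uploaded to EvalAI.)

```bash
# Generate the test-dev prediction file for EvalAI submission (config path assumed)
mmf_predict config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=test \
    checkpoint.resume_file=./save/movie_mcan_final.pth
```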
| Datasets | Config File | Pretrained Model Key | Metrics | Notes |
| --- | --- | --- | --- | --- |
| VQA2.0 | | | test-dev accuracy - 73.92% | Uses Visual Genome as extra data |
To generate predictions with the pretrained MoViE+MCAN model on the VQA2.0 test-dev set (assuming that the pretrained model you are evaluating is the one listed in the table above), run:
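(A sketch; `<pretrained_model_key>` is a placeholder for the pretrained model key from the table above, which is elided here. `checkpoint.resume_zoo` is MMF's override for loading a model from its zoo.)

```bash
# Run inference with the pretrained zoo model (key is a placeholder, config path assumed)
mmf_predict config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=test \
    checkpoint.resume_zoo=<pretrained_model_key>
```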
Follow the checkpointing tutorial for more fine-grained details of checkpointing, loading, and resuming in MMF.
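For example, resuming an interrupted run from the latest checkpoint in the experiment's save directory can be done with MMF's `checkpoint.resume` override (a sketch, under the same config-path assumption as the commands above):

```bash
# Resume training from the last checkpoint in ./save (config path assumed)
mmf_run config=projects/movie_mcan/configs/vqa2/defaults.yaml \
    model=movie_mcan \
    dataset=vqa2 \
    run_type=train_val \
    checkpoint.resume=True
```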