This is a tutorial for running the BUTD model available in MMF. The model was originally released in its own repo. Please cite the following paper if you use the BUTD model from MMF:
- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6077-6086). (arXiv)
Install MMF following the installation guide.
For training the BUTD model on COCO captions, we use the Karpathy splits. The annotations and features for COCO will be downloaded automatically.
# Training and Evaluation
To train the BUTD model on the COCO Karpathy train split, run:
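A typical invocation looks like the following sketch, using MMF's `mmf_run` CLI. The config path `projects/butd/configs/coco/defaults.yaml` is an assumption; verify it against your MMF checkout.

```shell
# Train BUTD on the COCO Karpathy train split.
# The config path is an assumption; check your MMF checkout for the exact file.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=train
```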
This will save the trained model `butd_final.pth` in your `./save` directory for the experiment.
To evaluate the trained model on the COCO val set, run:
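Evaluation reuses the same CLI with `run_type=val` and a pointer to the saved checkpoint. The config path and checkpoint location below are assumptions based on the training step above.

```shell
# Evaluate the trained checkpoint on the COCO val set.
# Config path and checkpoint location are assumptions; adjust to your setup.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```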
BUTD evaluation can also be done with two other decoding variants using the same trained model: Beam Search and Nucleus Sampling. The following configs can be used:
- Beam Search Decoding
- Nucleus Sampling Decoding
To generate the COCO captions prediction file for the Karpathy test split, run:
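Prediction files are typically produced with MMF's `mmf_predict` CLI rather than `mmf_run`; the command below is a sketch under that assumption, with the same hypothetical config path as above.

```shell
# Generate a COCO captions prediction file on the Karpathy test split.
# mmf_predict usage and config path are assumptions; adjust to your setup.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=test \
    checkpoint.resume_file=./save/butd_final.pth
```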
Evaluation predictions can only be generated using one of the two decoding variants, Beam Search or Nucleus Sampling.
| Datasets | Config File | Pretrained Model Key | Metrics |
| -------- | ----------- | -------------------- | ------- |
| COCO     |             |                      | val accuracy - 0.36 BLEU4 |
To generate predictions with the pretrained BUTD model on the COCO Karpathy val set (assuming that the pretrained model you are evaluating is the one listed under the Pretrained Model Key column in the table above), run:
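A sketch of running the pretrained model directly from MMF's model zoo. The `checkpoint.resume_zoo` override and the key `butd` are both assumptions standing in for the pretrained model key from the table above.

```shell
# Generate predictions on the COCO Karpathy val set with the pretrained model.
# "butd" is a hypothetical stand-in for the pretrained model key from the table.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_zoo=butd
```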
Follow the checkpointing tutorial to understand the fine-grained details of checkpointing, loading, and resuming in MMF.