This is a tutorial for running the BUTD model available in MMF. The model was originally released in the bottom-up-attention repo by Anderson et al. Please cite the following paper if you are using the BUTD model from MMF:
- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6077-6086). (arXiv)
Install MMF following the installation guide.
For training the BUTD model on COCO captions, we use the Karpathy splits. Annotations and features for COCO will be downloaded automatically.
## Training and Evaluation
To train the BUTD model on the COCO Karpathy train split, run:
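A sketch of the training command, assuming the default BUTD config ships at `projects/butd/configs/coco/defaults.yaml` in the MMF repo:

```shell
# Train BUTD on the COCO Karpathy train split.
# The config path is assumed from the MMF repo layout; adjust if yours differs.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=train
```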
This will save the trained model `butd_final.pth` in the experiment's `./save` directory.
To evaluate the trained model on the COCO val set, run:
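One way to run evaluation, assuming the weights were saved to `./save/butd_final.pth` as described above (`checkpoint.resume_file` points MMF at an existing checkpoint):

```shell
# Evaluate the trained BUTD model on the COCO val set.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```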
BUTD evaluation can also be done with two other decoding variants using the same trained model: Beam Search and Nucleus Sampling. The following configs can be used:
- Beam Search Decoding
- Nucleus Sampling Decoding
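For example, a beam-search evaluation run might look like the following, assuming the variant config lives at `projects/butd/configs/coco/beam_search.yaml` (swap in the nucleus-sampling config to use that variant instead):

```shell
# Same trained checkpoint, different decoding strategy selected via the config.
mmf_run config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```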
To generate the COCO captions prediction file for the Karpathy test split, run:
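A sketch of the prediction command using MMF's `mmf_predict` entry point, under the same assumed config and checkpoint paths as above:

```shell
# Generate a COCO captions prediction file on the Karpathy test split.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=test \
    checkpoint.resume_file=./save/butd_final.pth
```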
Evaluation predictions can only be generated using either the Beam Search or the Nucleus Sampling decoding variant.
| Datasets | Config File | Pretrained Model Key | Metrics |
| --- | --- | --- | --- |
| COCO (Karpathy val) | projects/butd/configs/coco/defaults.yaml | butd | val accuracy - 0.36 BLEU4 |
To generate predictions with the pretrained BUTD model on the COCO Karpathy val set (assuming the pretrained model key you are evaluating is `butd`), run:
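A sketch, assuming the model-zoo key is `butd` and the beam-search config path from above; `checkpoint.resume_zoo` tells MMF to download and load the pretrained weights from its model zoo:

```shell
# Predictions from the pretrained zoo model; no local checkpoint needed.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_zoo=butd
```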
Follow the checkpointing tutorial to understand the finer-grained details of checkpointing, loading, and resuming in MMF.
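For instance, an interrupted training run can be resumed from the experiment's latest saved checkpoint with MMF's `checkpoint.resume=True` override (a sketch with the same assumed config path):

```shell
# Resume training from the latest checkpoint in ./save, if one exists.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=train \
    checkpoint.resume=True
```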