# BUTD
This is a tutorial for running the BUTD model available in MMF. This model was originally released in this repo. Please cite the following paper if you are using the BUTD model from MMF:
- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6077-6086). (arXiv)
## Installation
Install MMF following the installation guide.
## Data Setup
For training the BUTD model on COCO captions, we use the Karpathy splits. Annotations and features for COCO will be downloaded automatically.
## Training and Evaluation
To train the BUTD model on the COCO Karpathy train split, run:
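A command along these lines should work; this is a sketch based on MMF's standard `mmf_run` CLI, and the `defaults.yaml` config path is an assumption:

```bash
# Train BUTD on the COCO Karpathy train split.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=train
```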
This will save the trained model `butd_final.pth` in your `./save` directory for the experiment.
To evaluate the trained model on the COCO val set, run:
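As a sketch, under the same assumptions as above (`checkpoint.resume_file` is MMF's standard option for loading a saved model):

```bash
# Evaluate the trained model on the COCO val set.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```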
BUTD evaluation can also be done with two other decoding variants using the same trained model: Beam Search and Nucleus Sampling. The following configs can be used (see the sketch after this list):

- Beam Search Decoding (`projects/butd/configs/coco/beam_search.yaml`)
- Nucleus Sampling Decoding (`projects/butd/configs/coco/nucleus_sampling.yaml`)
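For example, evaluating with Beam Search decoding should only require swapping in the corresponding config (a sketch under the same assumptions as the commands above):

```bash
# Evaluate with Beam Search decoding using the same trained model.
mmf_run config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```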
## Inference Prediction
To generate the COCO captions prediction file for the Karpathy val or test splits, run:
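A sketch using MMF's `mmf_predict` CLI (switch `run_type` between `val` and `test` for the two splits; the beam search config is used here as an example):

```bash
# Generate a COCO captions prediction file on the Karpathy val split.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```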
**Note:** Evaluation predictions can only be generated using either the `beam_search` or `nucleus_sampling` decoding methods.
## Pretrained model
| Datasets | Config File | Pretrained Model Key | Metrics |
|---|---|---|---|
| COCO (coco) | projects/butd/configs/coco/beam_search.yaml | butd | val BLEU4 - 0.36 |
To generate predictions with the pretrained BUTD model on the COCO Karpathy val set (assuming the pretrained model key you are evaluating is `butd`), run:
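A sketch; `checkpoint.resume_zoo` is MMF's standard option for downloading and loading a pretrained model from the model zoo by key, rather than from a local file:

```bash
# Generate predictions with the pretrained model from the MMF model zoo.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_zoo=butd
```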
**Tip:** Follow the checkpointing tutorial to understand the more fine-grained details of checkpointing, loading, and resuming in MMF.