# BUTD
This is a tutorial for running the BUTD model available in MMF. This model was originally released in this [repo](https://github.com/peteanderson80/bottom-up-attention). Please cite the following paper if you are using the BUTD model from MMF:

- Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition* (pp. 6077-6086). ([arXiv](https://arxiv.org/abs/1707.07998))
## Installation

Install MMF by following the installation guide.
## Data Setup

For training the BUTD model on COCO captions, we use the Karpathy splits. Annotations and features for COCO will be downloaded automatically.
## Training and Evaluation

To train the BUTD model on the COCO Karpathy train split, run:
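The training command itself is missing from this page; the sketch below uses MMF's standard `mmf_run` CLI, assuming the BUTD default config lives at `projects/butd/configs/coco/defaults.yaml` (a sibling of the decoding configs listed further down):

```bash
# Train BUTD on COCO captions (Karpathy train split).
# NOTE: the config path is an assumption inferred from the configs referenced below.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=train
```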
This will save the trained model `butd_final.pth` in your `./save` directory for the experiment.
To evaluate the trained model on the COCO val set, run:
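The evaluation command is likewise elided; as a sketch under the same config-path assumption, point `checkpoint.resume_file` at the model saved during training:

```bash
# Evaluate the trained model on the COCO val set.
mmf_run config=projects/butd/configs/coco/defaults.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```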
BUTD evaluation can also be done with two other decoding variants, Beam Search and Nucleus Sampling, using the same trained model. The following configs can be used:

- Beam Search decoding: `projects/butd/configs/coco/beam_search.yaml`
- Nucleus Sampling decoding: `projects/butd/configs/coco/nucleus_sampling.yaml`
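For example, evaluating the same trained checkpoint with Beam Search decoding only requires swapping in the corresponding config (still a sketch, not the page's original command):

```bash
# Evaluate with beam-search decoding using the same trained checkpoint.
mmf_run config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```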
## Inference Prediction

To generate the COCO captions prediction file for the Karpathy `val` or `test` splits, run:
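The original prediction command is missing here; a minimal sketch using MMF's `mmf_predict` CLI with the beam-search config (switch `run_type` to `test` for the test split):

```bash
# Generate a COCO captions prediction file on the Karpathy val split.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_file=./save/butd_final.pth
```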
> **Note:** Evaluation predictions can only be generated using either the `beam_search` or `nucleus_sampling` decoding methods.
## Pretrained Model

| Datasets | Config File | Pretrained Model Key | Metrics |
| --- | --- | --- | --- |
| COCO (`coco`) | `projects/butd/configs/coco/beam_search.yaml` | `butd` | val BLEU4: 0.36 |
To generate predictions with the pretrained BUTD model on the COCO Karpathy `val` set (assuming the pretrained model key you are evaluating is `butd`), run:
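A sketch of the elided command, loading the pretrained model from MMF's model zoo via `checkpoint.resume_zoo` with the key from the table above:

```bash
# Generate predictions with the pretrained BUTD model from the model zoo.
mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
    model=butd \
    dataset=coco \
    run_type=val \
    checkpoint.resume_zoo=butd
```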
> **Tip:** Follow the checkpointing tutorial to understand the finer-grained details of checkpointing, loading, and resuming in MMF.