BUTD

This tutorial shows how to run the BUTD model available in MMF. The model was originally released in this repo. If you use the BUTD model from MMF, please cite the following paper:

  • Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6077-6086). (arXiv)

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle = {CVPR},
  year = {2018}
}

Installation

Install MMF following the installation guide.

Data Setup

To train the BUTD model on COCO captions, we use the Karpathy splits. Annotations and features for COCO are downloaded automatically.

Training and Evaluation

To train the BUTD model on the COCO Karpathy train split, run:

mmf_run config=projects/butd/configs/coco/defaults.yaml \
model=butd \
dataset=coco \
run_type=train

This saves the trained model, butd_final.pth, in the ./save directory for the experiment.
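
Before moving on to evaluation, it can help to confirm that the checkpoint was actually written. The following is a minimal sketch that assumes the file sits directly at ./save/butd_final.pth; the SAVE_DIR and CKPT variable names are illustrative, and the exact location may differ if you changed the save directory for your experiment.

```shell
# Assumed default location of the trained checkpoint; adjust if your
# experiment writes to a different save directory.
SAVE_DIR="./save"
CKPT="${SAVE_DIR}/butd_final.pth"

# Report whether the checkpoint file exists.
if [ -f "${CKPT}" ]; then
  echo "found checkpoint: ${CKPT}"
else
  echo "no checkpoint at ${CKPT}; check the save directory of your run"
fi
```

The resulting path is what you would pass as checkpoint.resume_file in the evaluation commands.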

To evaluate the trained model on the COCO val set, run:

mmf_run config=projects/butd/configs/coco/defaults.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>

The same trained model can also be evaluated with two other decoding variants, beam search and nucleus sampling, using the following configs:

  • Beam Search Decoding (projects/butd/configs/coco/beam_search.yaml)
  • Nucleus Sampling Decoding (projects/butd/configs/coco/nucleus_sampling.yaml)

For example, to evaluate with beam search decoding, run:

mmf_run config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>
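
Since the two decoding variants differ only in the config file, the evaluation commands can be generated in a small loop. This is a sketch rather than a prescribed workflow: the CKPT path is an assumed location for the trained checkpoint, and the commands are only printed (a dry run) so you can inspect them before running them yourself.

```shell
# Assumed checkpoint location; replace with the path to your trained .pth file.
CKPT="./save/butd_final.pth"

# Build one evaluation command per decoding variant listed above.
for DECODING in beam_search nucleus_sampling; do
  CONFIG="projects/butd/configs/coco/${DECODING}.yaml"
  # Dry run: print the command instead of executing it.
  echo "mmf_run config=${CONFIG}" \
       "model=butd dataset=coco run_type=val" \
       "checkpoint.resume_file=${CKPT}"
done
```

Copy a printed line into your shell to run the evaluation for that variant.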

Inference Prediction

To generate the COCO captions prediction file for the Karpathy val or test split, run:

mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>

Note: Evaluation predictions can only be generated using either the beam_search or the nucleus_sampling method.
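
The command above targets the val split. For the test split, the sketch below assumes that run_type=test selects the Karpathy test split in the same way that run_type=val selects the val split; the SPLIT and CKPT variables are illustrative, and the command is printed rather than executed.

```shell
# Assumed checkpoint location; replace with the path to your trained .pth file.
CKPT="./save/butd_final.pth"
# Split to predict on: "val" as shown above, or "test" (assumed to map to
# the Karpathy test split).
SPLIT="test"

# Dry run: print the prediction command instead of executing it.
echo "mmf_predict config=projects/butd/configs/coco/beam_search.yaml" \
     "model=butd dataset=coco run_type=${SPLIT}" \
     "checkpoint.resume_file=${CKPT}"
```

To use nucleus sampling instead, swap in projects/butd/configs/coco/nucleus_sampling.yaml as the config.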

Pretrained model

| Datasets | Config File | Pretrained Model Key | Metrics |
| COCO (coco) | projects/butd/configs/coco/beam_search.yaml | butd | val accuracy - 0.36 BLEU4 |

To generate predictions on the COCO Karpathy val set with the pretrained BUTD model (assuming the pretrained model key you are evaluating is butd), run:

mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_zoo=butd

Tip: Follow the checkpointing tutorial to learn the finer details of checkpointing, loading, and resuming in MMF.

Last updated by Vedanuj Goswami