BUTD

This is a tutorial for running the BUTD model available in MMF. The model was originally released in this repo. Please cite the following paper if you use the BUTD model from MMF:

  • Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., & Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6077-6086). (arXiv)
@inproceedings{Anderson2017up-down,
author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
booktitle = {CVPR},
year = {2018}
}

Installation

Install MMF following the installation guide.

Data Setup

For training the BUTD model on COCO captions we use the Karpathy splits. Annotations and features for COCO will be automatically downloaded.

Training and Evaluation

To train the BUTD model on the COCO Karpathy train split, run:

mmf_run config=projects/butd/configs/coco/defaults.yaml \
model=butd \
dataset=coco \
run_type=train

This will save the trained model butd_final.pth in the ./save directory of the experiment.

To evaluate the trained model on the COCO val set, run:

mmf_run config=projects/butd/configs/coco/defaults.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>

The same trained model can also be evaluated with two other decoding variants, Beam Search and Nucleus Sampling, using the following configs:

  • Beam Search Decoding (projects/butd/configs/coco/beam_search.yaml)
  • Nucleus Sampling Decoding (projects/butd/configs/coco/nucleus_sampling.yaml)
For example, to evaluate with beam search decoding, run:

mmf_run config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>
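
Likewise, nucleus sampling decoding uses the same trained checkpoint with its own config (the nucleus_sampling.yaml config listed above); the other flags are unchanged:

```shell
# Evaluate the trained BUTD checkpoint with nucleus sampling decoding
mmf_run config=projects/butd/configs/coco/nucleus_sampling.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>
```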

Inference Prediction

To generate the COCO captions prediction file for the Karpathy val or test splits, run:

mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_file=<path_to_trained_pth_file>
Note: Evaluation predictions can only be generated using the beam_search or nucleus_sampling decoding methods.

Pretrained model

| Datasets | Config File | Pretrained Model Key | Metrics |
| --- | --- | --- | --- |
| COCO (coco) | projects/butd/configs/coco/beam_search.yaml | butd | val - 0.36 BLEU4 |

To generate predictions with the pretrained BUTD model on the COCO Karpathy val set (assuming the pretrained model you are evaluating is butd), run:

mmf_predict config=projects/butd/configs/coco/beam_search.yaml \
model=butd \
dataset=coco \
run_type=val \
checkpoint.resume_zoo=butd
Tip: Follow the checkpointing tutorial to understand more fine-grained details of checkpointing, loading, and resuming in MMF.

Last updated by Vedanuj Goswami