TextCaps: a Dataset for Image Captioning with Reading Comprehension
This project page shows how to use the M4C-Captioner model from the following paper, released under MMF:
- O. Sidorov, R. Hu, M. Rohrbach, A. Singh. TextCaps: a Dataset for Image Captioning with Reading Comprehension. In ECCV, 2020.
Project Page: https://textvqa.org/textcaps
Installation
Install MMF following the installation guide. This will install all M4C dependencies, such as transformers and editdistance, and will also compile the Python interface for PHOC features.
In addition, you also need to install pycocoevalcap. Note that pycocoevalcap requires Java.
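A quick way to confirm that the Python dependencies above are importable, and that Java is available for pycocoevalcap, is a check like the following. This is a minimal sketch: the import names transformers, editdistance, and pycocoevalcap are assumed, it only checks that the modules can be located and that a java executable is on PATH, and it does not verify versions or the compiled PHOC interface.

```python
import importlib.util
import shutil

# Import names assumed for the packages mentioned in the installation notes.
REQUIRED_MODULES = ["transformers", "editdistance", "pycocoevalcap"]


def missing_modules(names):
    """Return the module names that cannot be located (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def java_available():
    """pycocoevalcap's PTB tokenizer (and its METEOR/SPICE scorers) invoke java."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages found.")
    print("java on PATH:", java_available())
```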
Pretrained M4C-Captioner Models
We release two variants of the M4C-Captioner model trained on the TextCaps dataset: one trained with newer features extracted with maskrcnn-benchmark (defaults), and the other trained with older features extracted with Caffe2 (with_caffe2_feat). The with_caffe2_feat variant is the one used in the experiments in our paper and has a slightly higher CIDEr score. Please use the with_caffe2_feat config and model zoo file if you would like to exactly reproduce the results from our paper.
Config files are located under projects/m4c_captioner/configs/m4c_captioner/textcaps.

Config File | Pretrained Model Key | Metrics | Notes |
---|---|---|---|
defaults.yaml | m4c_captioner.textcaps.defaults | val CIDEr: 89.1 (BLEU-4: 23.4) | newer features extracted with maskrcnn-benchmark |
with_caffe2_feat.yaml | m4c_captioner.textcaps.with_caffe2_feat | val CIDEr: 89.6 (BLEU-4: 23.3) | older features extracted with Caffe2; used in the experiments in the paper |
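Each config file in the table pairs with one pretrained model key, and the two should be used together (for example, with_caffe2_feat.yaml with m4c_captioner.textcaps.with_caffe2_feat). The hypothetical helper below, which is not part of MMF, simply records that pairing from the table so a script can look up both values from a single variant name.

```python
from pathlib import PurePosixPath

# Directory holding the TextCaps configs, as stated above the table.
CONFIG_DIR = PurePosixPath("projects/m4c_captioner/configs/m4c_captioner/textcaps")

# Variant name -> (config file, pretrained model zoo key), copied from the table.
VARIANTS = {
    "defaults": ("defaults.yaml", "m4c_captioner.textcaps.defaults"),
    "with_caffe2_feat": ("with_caffe2_feat.yaml", "m4c_captioner.textcaps.with_caffe2_feat"),
}


def variant_settings(variant):
    """Return (config path, zoo key) for one released M4C-Captioner variant."""
    try:
        config_file, zoo_key = VARIANTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; choose from {sorted(VARIANTS)}"
        ) from None
    return str(CONFIG_DIR / config_file), zoo_key
```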
Training and Evaluating M4C-Captioner
Please follow the MMF documentation to train and evaluate the M4C-Captioner models. For example:
1) To train the M4C-Captioner model on the TextCaps training set, use the config projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml. To train with a different configuration, replace it with another config file from the table above. You can also set env.save_dir to a different path to save to a location you prefer.
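The training step can be sketched as a Python function that assembles an MMF command line. This is an illustration under assumptions: it uses MMF's mmf_run entry point with key=value overrides, and the datasets=textcaps, model=m4c_captioner, and run_type=train_val overrides are assumed from MMF's general conventions rather than taken from this page. The function train_command is hypothetical.

```python
import shlex

DEFAULT_CONFIG = "projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml"


def train_command(config=DEFAULT_CONFIG, save_dir=None):
    """Build an mmf_run argument list for training M4C-Captioner on TextCaps.

    MMF reads overrides as key=value pairs; the datasets/model/run_type keys
    used here are assumptions based on MMF's general command-line conventions.
    """
    cmd = [
        "mmf_run",
        f"config={config}",
        "datasets=textcaps",
        "model=m4c_captioner",
        "run_type=train_val",  # assumed: train, then evaluate on val
    ]
    if save_dir is not None:
        # env.save_dir sets where MMF writes checkpoints and logs.
        cmd.append(f"env.save_dir={save_dir}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(train_command(save_dir="./save/m4c_captioner")))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch training; building a list rather than one string avoids shell-quoting problems with unusual paths.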
2) To generate prediction json files on the TextCaps validation and test sets with the pretrained model m4c_captioner.textcaps.defaults, load it with checkpoint.resume_zoo=m4c_captioner.textcaps.defaults and generate one prediction file for each set. As with training, you can replace config and checkpoint.resume_zoo according to the setting you want to evaluate.
Note: To evaluate your own trained snapshots, use checkpoint.resume=True and checkpoint.resume_best=True instead of checkpoint.resume_zoo=m4c_captioner.textcaps.defaults.
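In the same spirit, a hypothetical predict_command helper can sketch the prediction step together with both checkpoint options from the note. It assumes MMF's mmf_predict entry point and the same datasets/model overrides as training, with run_type set to val or test; these names are assumptions, so check them against the MMF documentation before relying on them. Passing your training run's env.save_dir when evaluating your own snapshot is also an assumption, on the premise that MMF looks for your checkpoints under that directory.

```python
import shlex

# Defaults from the text; swap both for another row of the model table.
CONFIG = "projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml"
ZOO_KEY = "m4c_captioner.textcaps.defaults"


def predict_command(split, config=CONFIG, zoo_key=ZOO_KEY, save_dir=None):
    """Build an mmf_predict argument list that writes a TextCaps prediction json.

    split:    "val" or "test" (passed as run_type; assumed MMF convention).
    zoo_key:  a pretrained model key, or None to load your own best snapshot.
    save_dir: the env.save_dir of your training run, if you customized it.
    """
    if split not in ("val", "test"):
        raise ValueError("split must be 'val' or 'test'")
    cmd = [
        "mmf_predict",
        f"config={config}",
        "datasets=textcaps",
        "model=m4c_captioner",
        f"run_type={split}",
    ]
    if zoo_key is not None:
        # Load a released model from the MMF model zoo.
        cmd.append(f"checkpoint.resume_zoo={zoo_key}")
    else:
        # Load the best checkpoint of your own training run instead.
        cmd += ["checkpoint.resume=True", "checkpoint.resume_best=True"]
    if save_dir is not None:
        cmd.append(f"env.save_dir={save_dir}")
    return cmd


if __name__ == "__main__":
    for split in ("val", "test"):
        print(shlex.join(predict_command(split)))
```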
Tip: Follow the checkpointing tutorial to understand the finer details of checkpointing, loading, and resuming in MMF.
Afterwards, use projects/m4c_captioner/scripts/textcaps_eval.py to evaluate the prediction json file (for example, the one generated on the validation set).
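Before running the evaluation script or submitting to EvalAI, it can help to sanity-check the prediction file. The sketch below assumes the file is a JSON list of objects, each with an image_id and a string caption, one entry per image; this page does not specify the format, so treat the field names as assumptions. check_predictions and load_predictions are hypothetical helpers.

```python
import json


def check_predictions(preds):
    """Validate a parsed prediction list and return {image_id: caption}.

    Assumed format: a JSON list of objects, each with an "image_id" and a
    string "caption", with exactly one entry per image.
    """
    if not isinstance(preds, list):
        raise ValueError("expected a JSON list of predictions")
    captions = {}
    for i, entry in enumerate(preds):
        if not isinstance(entry, dict) or "image_id" not in entry:
            raise ValueError(f"entry {i} has no 'image_id'")
        if not isinstance(entry.get("caption"), str):
            raise ValueError(f"entry {i} has no string 'caption'")
        if entry["image_id"] in captions:
            raise ValueError(f"duplicate image_id {entry['image_id']!r}")
        captions[entry["image_id"]] = entry["caption"]
    return captions


def load_predictions(path):
    """Read a prediction json file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return check_predictions(json.load(f))
```

For example, load_predictions("my_val_predictions.json") returns a dict from image id to caption; the file name here is a placeholder.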
For test set evaluation, please submit your test set prediction file to the TextCaps EvalAI server. See https://textvqa.org/textcaps for details.