This project page shows how to use the M4C-Captioner model from the following paper, released under MMF:
- O. Sidorov, R. Hu, M. Rohrbach, A. Singh, TextCaps: a Dataset for Image Captioning with Reading Comprehension. in ECCV, 2020 (PDF)
Project Page: https://textvqa.org/textcaps
Install MMF following the installation guide.
This will install all M4C dependencies, such as editdistance, and will also compile the Python interface for PHOC features.
In addition, it is also necessary to install `pycocoevalcap` for caption evaluation. Note that Java is required for `pycocoevalcap`.
Pretrained M4C-Captioner Models
We release two variants of the M4C-Captioner model trained on the TextCaps dataset: one trained with newer features extracted with maskrcnn-benchmark (`defaults`), and the other trained with older features extracted with Caffe2 (`with_caffe2_feat`), which was used in the experiments in the paper and has a higher CIDEr score. Please use the `with_caffe2_feat` config and model zoo file if you would like to exactly reproduce the results from the paper.
|Config Files (under `projects/m4c_captioner/configs`)|Pretrained Model Key|Metrics|Notes|
|---|---|---|---|
|`m4c_captioner/textcaps/defaults.yaml`|`m4c_captioner.textcaps.defaults`|val CIDEr -- 89.1 (BLEU-4 -- 23.4)|newer features extracted with maskrcnn-benchmark|
|`m4c_captioner/textcaps/with_caffe2_feat.yaml`|`m4c_captioner.textcaps.with_caffe2_feat`|val CIDEr -- 89.6 (BLEU-4 -- 23.3)|older features extracted with Caffe2; used in experiments in the paper|
Training and Evaluating M4C-Captioner
Please follow the MMF documentation for the training and evaluation of the M4C-Captioner models.
1) To train the M4C-Captioner model on the TextCaps training set, use the config file `projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml`. (Replace it with the other config files from the table above to train with other configurations. You can also specify a different path in `env.save_dir` to save to a location you prefer.)
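The training step above can be sketched as the following command. This is a sketch assuming MMF's `mmf_run` CLI; the exact override names (e.g. `dataset=` vs. `datasets=`) and the save directory are assumptions that may differ across MMF versions, so adjust them to your setup:

```shell
# Sketch: train M4C-Captioner on TextCaps with the default config.
# Override names and save path are assumptions; adjust to your MMF version.
mmf_run dataset=textcaps \
    model=m4c_captioner \
    config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    env.save_dir=./save/m4c_captioner_defaults \
    run_type=train_val
```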
2) To generate prediction JSON files for the TextCaps dataset (assuming you are evaluating the pretrained model):
Generate the prediction file on the validation set:
Generate the prediction file on the test set:
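The two prediction runs can be sketched as follows. This assumes MMF's `mmf_predict` CLI and the same override conventions as above; the flag names are assumptions and may differ across MMF versions:

```shell
# Sketch: generate a prediction JSON file on the validation set.
mmf_predict dataset=textcaps \
    model=m4c_captioner \
    config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    run_type=val \
    checkpoint.resume_zoo=m4c_captioner.textcaps.defaults

# Sketch: generate a prediction JSON file on the test set.
mmf_predict dataset=textcaps \
    model=m4c_captioner \
    config=projects/m4c_captioner/configs/m4c_captioner/textcaps/defaults.yaml \
    run_type=test \
    checkpoint.resume_zoo=m4c_captioner.textcaps.defaults
```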
As with training, you can replace the config file and `checkpoint.resume_zoo` according to the setting you want to evaluate. To evaluate your own trained snapshots, use `checkpoint.resume_best=True` instead of `checkpoint.resume_zoo=m4c_captioner.textcaps.defaults`.
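The generated prediction file is a JSON list of per-image caption entries. The sketch below writes and sanity-checks a file in the assumed `[{"image_id": ..., "caption": ...}]` format; the field names and the sample IDs are illustrative assumptions, not taken from the MMF documentation:

```python
import json
import tempfile

# Assumed prediction format: a list of {"image_id", "caption"} records.
# The IDs and captions below are made-up placeholders for illustration.
predictions = [
    {"image_id": "0054c910a70b0287", "caption": "a sign that says stop"},
    {"image_id": "00a5e3d1f3f64b2c", "caption": "a book titled textcaps"},
]

# Write the predictions to a temporary JSON file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(predictions, f)
    pred_file = f.name

# Sanity-check the file before handing it to the evaluation script.
with open(pred_file) as f:
    loaded = json.load(f)
assert isinstance(loaded, list)
assert all({"image_id", "caption"} <= set(entry) for entry in loaded)
print(f"{len(loaded)} predictions OK")
```

A quick check like this catches malformed files before a slow evaluation or an EvalAI submission fails on them.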
Follow the checkpointing tutorial to understand more fine-grained details of checkpointing, loading, and resuming in MMF.
3) Use `projects/m4c_captioner/scripts/textcaps_eval.py` to evaluate the prediction JSON file. For example:
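An invocation of the evaluation script might look like the following. This is a sketch: the `--set` and `--pred_file` argument names and the prediction-file path are assumptions about the script's interface, so check `textcaps_eval.py --help` for the exact flags:

```shell
# Sketch: evaluate a validation-set prediction file.
# Flag names and the file path are assumptions; see the script's --help.
python projects/m4c_captioner/scripts/textcaps_eval.py \
    --set val \
    --pred_file ./save/m4c_captioner_defaults/textcaps_val_predictions.json
```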
For test-set evaluation, please submit your prediction file to the TextCaps EvalAI server. See https://textvqa.org/textcaps for details.