modules.metrics¶

The metrics module contains implementations of various metrics commonly used to understand how well our models are performing, for example accuracy, vqa_accuracy, and r@1.

To implement your own metric, follow these steps:

Create your own metric class that inherits from the BaseMetric class.
In the __init__ function of your class, make sure to call super().__init__('name'), where ‘name’ is the name of your metric. If your __init__ function requires any parameters, declare them as keyword arguments; the metric constructor will pass them to your class from the config.
Implement a calculate function that takes a SampleList and model_output as input and returns a float tensor or number.
Register your metric under the key ‘name’ by using the decorator @registry.register_metric('name').

Example:

import torch

from mmf.common.registry import registry
from mmf.modules.metrics import BaseMetric

@registry.register_metric("some")
class SomeMetric(BaseMetric):
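    def __init__(self, some_param=None):
        # Pass the metric's name to the base class.
        super().__init__("some")
        ...  # remaining initialization elided in the original example

    def calculate(self, sample_list, model_output):
        # Placeholder value; a real metric would compare model_output with sample_list.
        metric = torch.tensor(2, dtype=torch.float)
        return metric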

Example config for above metric:

model_config:
    pythia:
        metrics:
        - type: some
          params:
            some_param: a
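
The link between the registered key, the `type` field, and `params` can be pictured with a small stand-alone sketch. It assumes nothing about MMF internals: `METRIC_REGISTRY`, `register_metric`, and `build_metric` are hypothetical names invented for illustration, and the real registry and metric constructor may work differently.

```python
# Hypothetical, framework-free sketch of a key -> class metric registry.
METRIC_REGISTRY = {}


def register_metric(name):
    """Return a class decorator that stores the class under `name`."""
    def wrap(cls):
        METRIC_REGISTRY[name] = cls
        return cls
    return wrap


@register_metric("some")
class SomeMetric:
    def __init__(self, some_param=None):
        self.name = "some"
        self.some_param = some_param

    def calculate(self, sample_list, model_output):
        return 2.0


def build_metric(config):
    """Instantiate a metric from a dict shaped like one entry of the metrics list."""
    cls = METRIC_REGISTRY[config["type"]]
    # Entries under "params" become keyword arguments of __init__.
    return cls(**config.get("params", {}))


metric = build_metric({"type": "some", "params": {"some_param": "a"}})
```

Declaring parameters as keyword arguments with defaults means a config entry without `params` can still construct the metric.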

class mmf.modules.metrics.Accuracy(score_key='scores', target_key='targets', topk=1)[source]¶

Metric for calculating accuracy.

Key: accuracy

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate accuracy and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

accuracy.

Return type

torch.FloatTensor
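
As a rough illustration of what a top-k accuracy measures, the sketch below works on plain Python lists instead of tensors. The helper `topk_accuracy` is hypothetical and is not MMF's implementation; it assumes one integer class label per sample.

```python
def topk_accuracy(scores, targets, k=1):
    """Fraction of samples whose target class is among the k highest scores."""
    correct = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += target in top
    return correct / len(targets)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [1, 2, 1]
top1 = topk_accuracy(scores, targets, k=1)
top2 = topk_accuracy(scores, targets, k=2)
```

With `k=1` only the first sample counts as correct; raising `k` to 2 also accepts the third sample, whose target has the second-highest score.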

class mmf.modules.metrics.AveragePrecision(*args, **kwargs)[source]¶

Metric for calculating Average Precision. See more details at sklearn.metrics.average_precision_score. If you are looking for the binary case, please take a look at binary_ap.

Key: ap

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate AP and return it. The function applies softmax to the provided logits and then calculates the AP.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration.
model_output (Dict) – Dict returned by model. This should contain “scores” field pointing to logits returned from the model.

Returns

AP.

Return type

torch.FloatTensor

class mmf.modules.metrics.BaseMetric(name, *args, **kwargs)[source]¶

Base class to be inherited by all metrics registered to MMF. See the description at the top of the file for more information. Child classes must implement the calculate function.

Parameters: name (str) – Name of the metric.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.BinaryAP(*args, **kwargs)[source]¶

Metric for calculating Binary Average Precision. See more details at sklearn.metrics.average_precision_score.

Key: binary_ap

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Binary AP and return it. The function applies softmax to the provided logits and then calculates the binary AP.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration.
model_output (Dict) – Dict returned by model. This should contain “scores” field pointing to logits returned from the model.

Returns

AP.

Return type

torch.FloatTensor
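
The "softmax, then AP" description can be sketched without any framework. The helpers `softmax` and `binary_average_precision` below are hypothetical stand-ins and not the MMF or scikit-learn code. They assume two-class logits with the positive class at index 1, no tied scores, and they return 0.0 when no positive label is present, which is a choice made for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_average_precision(logits, labels):
    """AP of the positive class (index 1), assuming no tied scores."""
    probs = [softmax(row)[1] for row in logits]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision measured at each positive sample
    return ap / hits if hits else 0.0


logits = [[0.0, 2.0], [1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
labels = [1, 1, 0, 0]
ap = binary_average_precision(logits, labels)
```

Here the positives land at ranks 1 and 3, so the result is the mean of the precisions 1/1 and 2/3.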

class mmf.modules.metrics.BinaryF1(*args, **kwargs)[source]¶

Metric for calculating Binary F1.

Key: binary_f1

class mmf.modules.metrics.BinaryF1PrecisionRecall(*args, **kwargs)[source]¶

Metric for calculating Binary F1 Precision and Recall.

Key: binary_f1_precision_recall

class mmf.modules.metrics.CaptionBleu4Metric[source]¶

Metric for calculating caption accuracy using BLEU4 Score.

Key: caption_bleu4

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate the BLEU4 score and return it.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

bleu4 score.

Return type

torch.FloatTensor

class mmf.modules.metrics.DetectionMeanAP(dataset_json_files, *args, **kwargs)[source]¶

Metric for calculating the detection mean average precision (mAP) using the COCO evaluation toolkit, returning the default COCO-style mAP@IoU=0.50:0.95

Key: detection_mean_ap

calculate(sample_list, model_output, execute_on_master_only=True, *args, **kwargs)[source]¶

Calculate detection mean AP (mAP) from the prediction list and the dataset annotations. The function returns COCO-style mAP@IoU=0.50:0.95.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration.
model_output (Dict) – Dict returned by model. This should contain “prediction_report” field, which is a list of detection predictions from the model.
execute_on_master_only (bool) – Whether to run mAP evaluation only on the master node over the gathered detection predictions (to avoid wasting computation and CPU OOM). Default: True (only run mAP evaluation on master).

Returns

COCO-style mAP@IoU=0.50:0.95.

Return type

torch.FloatTensor
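
The notation mAP@IoU=0.50:0.95 refers to averaging AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU part of that idea; `box_iou` and `IOU_THRESHOLDS` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2)` tuples with non-zero area, and none of this is the COCO toolkit's code.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# The ten IoU thresholds behind the "0.50:0.95" notation.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

iou = box_iou((0, 0, 100, 100), (0, 0, 100, 72))
# Thresholds at which this prediction would still count as a match.
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
```

A prediction with IoU 0.72 counts as a match at the five loosest thresholds and as a miss at the five strictest, which is why this metric rewards tight localization.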

class mmf.modules.metrics.F1(*args, **kwargs)[source]¶

Metric for calculating F1. Can be used with the type and params arguments for customization. params will be passed directly to the sklearn f1 function.

Key: f1

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate f1 and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

f1.

Return type

torch.FloatTensor

class mmf.modules.metrics.F1PrecisionRecall(*args, **kwargs)[source]¶

Metric for calculating F1 precision and recall. params will be passed directly to the sklearn precision_recall_fscore_support function.

Key: f1_precision_recall

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate f1_precision_recall and return it back as a dict.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

Dict(‘f1’: torch.FloatTensor, ‘precision’: torch.FloatTensor, ‘recall’: torch.FloatTensor)
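
For orientation, the three values in the returned dict relate to each other as in the following framework-free sketch. `precision_recall_f1` is a hypothetical helper for a single positive class on plain lists; it is not the MMF or scikit-learn implementation, and returning 0.0 for empty denominators is a choice made here.

```python
def precision_recall_f1(predictions, targets, positive=1):
    """Precision, recall and F1 for one class, returned as a dict."""
    pairs = list(zip(predictions, targets))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"f1": f1, "precision": precision, "recall": recall}


result = precision_recall_f1([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])
```

In this example there are two true positives, one false positive, and one false negative, so all three values come out to 2/3.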

class mmf.modules.metrics.MacroAP(*args, **kwargs)[source]¶

Metric for calculating Macro Average Precision.

Key: macro_ap

class mmf.modules.metrics.MacroF1(*args, **kwargs)[source]¶

Metric for calculating Macro F1.

Key: macro_f1

class mmf.modules.metrics.MacroF1PrecisionRecall(*args, **kwargs)[source]¶

Metric for calculating Macro F1 Precision and Recall.

Key: macro_f1_precision_recall

class mmf.modules.metrics.MacroROC_AUC(*args, **kwargs)[source]¶

Metric for calculating Macro ROC_AUC.

Key: macro_roc_auc

class mmf.modules.metrics.MeanRank[source]¶

Calculate MeanRank, which is the average rank of the chosen candidate.

Key: mean_r.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Mean Rank and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

mean rank

Return type

torch.FloatTensor

class mmf.modules.metrics.MeanReciprocalRank[source]¶

Calculate the mean reciprocal rank, i.e. the mean of the reciprocal ranks of the chosen candidates.

Key: mean_rr.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Mean Reciprocal Rank and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

Mean Reciprocal Rank

Return type

torch.FloatTensor
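
Under the usual definitions, mean rank and mean reciprocal rank are computed from the same list of ranks but are not reciprocals of each other. The sketch below assumes 1-based ranks of the chosen candidate are already available; `mean_rank` and `mean_reciprocal_rank` are hypothetical helpers, not the MMF code.

```python
def mean_rank(ranks):
    """Average 1-based rank of the chosen candidate."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all samples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


ranks = [1, 2, 4]
mr = mean_rank(ranks)              # (1 + 2 + 4) / 3
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4) / 3
```

Lower is better for mean rank, while higher is better for mean reciprocal rank, which is also less sensitive to a few very poorly ranked samples.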

class mmf.modules.metrics.Metrics(metric_list)[source]¶

Used internally by MMF, Metrics acts as a wrapper that handles the calculation of all metrics specified by the model in the config. It initializes each of the metrics and, when called, runs calculate on each of them one by one and returns a dict with properly named keys. For example, a dict returned by the Metrics class might look like: {'val/vqa_accuracy': 0.3, 'val/r@1': 0.8}

Parameters: metric_list (ListConfig) – List of DictConfigs where each DictConfig specifies name and parameters of the metrics used.
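
The wrapper behaviour described above can be approximated in a few lines. Everything in this sketch is hypothetical: `ConstantMetric`, `MetricsWrapper`, and the `dataset_type` argument are invented for illustration and only mimic the key naming shown in the example dict, not the actual Metrics class.

```python
class ConstantMetric:
    """Stand-in metric that always reports the same value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def calculate(self, sample_list, model_output):
        return self.value


class MetricsWrapper:
    """Calls every metric and prefixes each result with the dataset split."""

    def __init__(self, metrics):
        self.metrics = metrics

    def __call__(self, sample_list, model_output, dataset_type="val"):
        return {
            f"{dataset_type}/{m.name}": m.calculate(sample_list, model_output)
            for m in self.metrics
        }


wrapper = MetricsWrapper(
    [ConstantMetric("vqa_accuracy", 0.3), ConstantMetric("r@1", 0.8)]
)
values = wrapper(sample_list={}, model_output={})
```

Prefixing each key with the split keeps training and validation values apart when they end up in the same log.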

class mmf.modules.metrics.MicroAP(*args, **kwargs)[source]¶

Metric for calculating Micro Average Precision.

Key: micro_ap

class mmf.modules.metrics.MicroF1(*args, **kwargs)[source]¶

Metric for calculating Micro F1.

Key: micro_f1

class mmf.modules.metrics.MicroF1PrecisionRecall(*args, **kwargs)[source]¶

Metric for calculating Micro F1 Precision and Recall.

Key: micro_f1_precision_recall
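
The macro and micro variants differ only in how per-class counts are combined. The following sketch shows that difference for single-label predictions on plain lists; all helper names are hypothetical and this is not how MMF or scikit-learn implement the metrics.

```python
def f1_from_counts(tp, fp, fn):
    """F1 expressed directly in terms of the confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_class_counts(predictions, targets, num_classes):
    """Return a (tp, fp, fn) tuple for every class."""
    pairs = list(zip(predictions, targets))
    counts = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in pairs)
        fp = sum(p == c and t != c for p, t in pairs)
        fn = sum(p != c and t == c for p, t in pairs)
        counts.append((tp, fp, fn))
    return counts


def macro_f1(predictions, targets, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    counts = per_class_counts(predictions, targets, num_classes)
    return sum(f1_from_counts(*c) for c in counts) / num_classes


def micro_f1(predictions, targets, num_classes):
    """F1 computed from counts pooled over all classes."""
    counts = per_class_counts(predictions, targets, num_classes)
    tp, fp, fn = (sum(col) for col in zip(*counts))
    return f1_from_counts(tp, fp, fn)


predictions = [0, 0, 0, 0, 1, 2]
targets = [0, 0, 0, 1, 1, 2]
macro = macro_f1(predictions, targets, num_classes=3)
micro = micro_f1(predictions, targets, num_classes=3)
```

Macro averaging gives every class the same weight regardless of its frequency, whereas micro averaging is dominated by the most frequent classes.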

class mmf.modules.metrics.MicroROC_AUC(*args, **kwargs)[source]¶

Metric for calculating Micro ROC_AUC.

Key: micro_roc_auc

class mmf.modules.metrics.MultiLabelF1(*args, **kwargs)[source]¶

Metric for calculating Multilabel F1.

Key: multilabel_f1

class mmf.modules.metrics.MultiLabelMacroF1(*args, **kwargs)[source]¶

Metric for calculating Multilabel Macro F1.

Key: multilabel_macro_f1

class mmf.modules.metrics.MultiLabelMicroF1(*args, **kwargs)[source]¶

Metric for calculating Multilabel Micro F1.

Key: multilabel_micro_f1

class mmf.modules.metrics.OCRVQAAccuracy[source]¶

class mmf.modules.metrics.ROC_AUC(*args, **kwargs)[source]¶

Metric for calculating ROC_AUC. See more details at sklearn.metrics.roc_auc_score.

Note: ROC_AUC is not defined when the expected tensor contains only one label. Make sure both labels are always present, or use this metric on the full validation set only.

Key: roc_auc

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate ROC_AUC and return it. The function applies softmax to the provided logits and then calculates the ROC_AUC.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration.
model_output (Dict) – Dict returned by model. This should contain “scores” field pointing to logits returned from the model.

Returns

ROC_AUC.

Return type

torch.FloatTensor
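
Why ROC AUC needs both labels becomes clear from its pairwise interpretation: it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The helper `roc_auc` below is a hypothetical, quadratic-time sketch of that definition on plain lists, not the scikit-learn routine.

```python
def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        # With a single class there are no positive/negative pairs to compare.
        raise ValueError("ROC AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

Three of the four positive/negative pairs are ordered correctly in this example, so the value is 0.75.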

class mmf.modules.metrics.RecallAt1[source]¶

Calculate Recall@1, which specifies how many times the chosen candidate was ranked first.

Key: r@1.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Recall@1 and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

Recall@1

Return type

torch.FloatTensor

class mmf.modules.metrics.RecallAt10[source]¶

Calculate Recall@10, which specifies how many times the chosen candidate was among the first 10 ranks.

Key: r@10.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Recall@10 and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

Recall@10

Return type

torch.FloatTensor

class mmf.modules.metrics.RecallAt10_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAt10_rev_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAt1_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAt1_rev_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAt5[source]¶

Calculate Recall@5, which specifies how many times the chosen candidate was among the first 5 ranks.

Key: r@5.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate Recall@5 and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

Recall@5

Return type

torch.FloatTensor

class mmf.modules.metrics.RecallAt5_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAt5_rev_ret[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAtK(name='recall@k')[source]¶

calculate(sample_list, model_output, k, *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float
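
Since the k argument is not described here, a generic illustration of recall at k may help. The sketch assumes each sample has a list of candidate scores and the index of the chosen candidate; `rank_of_target` and `recall_at_k` are hypothetical helpers, and MMF's own ranking logic may differ.

```python
def rank_of_target(scores, target):
    """1-based rank of `target` when candidates are sorted by descending score."""
    return 1 + sum(s > scores[target] for s in scores)


def recall_at_k(all_scores, targets, k):
    """Fraction of samples whose target candidate is ranked within the top k."""
    ranks = [rank_of_target(s, t) for s, t in zip(all_scores, targets)]
    return sum(r <= k for r in ranks) / len(ranks)


all_scores = [
    [0.9, 0.05, 0.05],  # target 0 is ranked 1st
    [0.2, 0.5, 0.3],    # target 2 is ranked 2nd
    [0.6, 0.3, 0.1],    # target 2 is ranked 3rd
]
targets = [0, 2, 2]
r1 = recall_at_k(all_scores, targets, k=1)
r2 = recall_at_k(all_scores, targets, k=2)
```

Recall at k can only grow as k increases, reaching 1.0 once k equals the number of candidates.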

class mmf.modules.metrics.RecallAtK_ret(name='recall@k')[source]¶

calculate(sample_list: Dict[str, torch.Tensor], model_output: Dict[str, torch.Tensor], k: int, flip=False, *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.RecallAtPrecisionK(p_threshold, *args, **kwargs)[source]¶

Metric for calculating recall when precision is above a particular threshold. Use the p_threshold param to specify the precision threshold, i.e. k. The threshold can be given either on a 0-1 scale or on a 1-100 scale.

Key: r@pk

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate recall at precision k and return it. The function applies softmax to the provided logits and then calculates the metric.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration.
model_output (Dict) – Dict returned by model. This should contain “scores” field pointing to logits returned from the model.

Returns

Recall @ precision k.

Return type

torch.FloatTensor
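
One way to read "recall when precision is above a threshold" is sketched below on plain lists of positive-class scores and binary labels. `recall_at_precision` is a hypothetical helper, not the MMF code. It assumes no tied scores and at least one positive label, and it treats any threshold greater than 1 as a percentage, which is only a guess at how the two accepted formats are told apart.

```python
def recall_at_precision(scores, labels, p_threshold):
    """Highest recall reachable while precision stays >= p_threshold."""
    # Accept thresholds written as percentages (e.g. 90) as well as fractions (0.9).
    if p_threshold > 1:
        p_threshold = p_threshold / 100.0
    total_pos = sum(labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, tp = 0.0, 0
    for n, i in enumerate(order, start=1):
        tp += labels[i]
        # Treat the n highest-scored samples as predicted positives.
        if tp / n >= p_threshold:
            best = max(best, tp / total_pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
strict = recall_at_precision(scores, labels, 0.9)
loose = recall_at_precision(scores, labels, 75)
```

With a precision requirement of 0.9 only the top two samples can be accepted, giving recall 2/3; relaxing it to 75 percent allows the top four and recovers every positive.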

class mmf.modules.metrics.STVQAANLS[source]¶

class mmf.modules.metrics.STVQAAccuracy[source]¶

class mmf.modules.metrics.TextCapsBleu4[source]¶

class mmf.modules.metrics.TextVQAAccuracy[source]¶

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Abstract method to be implemented by the child class. Takes a SampleList and the dict returned by the model as input and returns a float tensor or number giving the value of this metric.

Parameters

sample_list (SampleList) – SampleList provided by the dataloader for the current iteration.
model_output (Dict) – Output dict from the model for the current SampleList

Returns

Value of the metric.

Return type

torch.Tensor|float

class mmf.modules.metrics.TopKAccuracy(score_key: str, k: int)[source]¶

class mmf.modules.metrics.VQAAccuracy[source]¶

Calculate VQAAccuracy. Find more information here

Key: vqa_accuracy.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate vqa accuracy and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

VQA Accuracy

Return type

torch.FloatTensor
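
The formula is not spelled out here, so the following is only a sketch of the widely used VQA scoring rule, in which an answer gets full credit once at least three human annotators gave it. The helper `vqa_accuracy` is hypothetical and works on strings; how MMF derives its targets and scores may differ.

```python
def vqa_accuracy(predicted, human_answers):
    """Score one answer as min(matches / 3, 1) against the human answers."""
    matches = sum(answer == predicted for answer in human_answers)
    return min(matches / 3.0, 1.0)


# Ten human answers for one question, as in the usual VQA annotation setup.
human_answers = ["red"] * 2 + ["dark red"] * 7 + ["maroon"]
partial = vqa_accuracy("red", human_answers)
full = vqa_accuracy("dark red", human_answers)
```

An answer given by two annotators earns partial credit of 2/3, while one that no annotator gave scores 0.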

class mmf.modules.metrics.VQAEvalAIAccuracy[source]¶

Calculate Eval AI VQAAccuracy. Find more information here. This is more accurate and closer to the Eval AI evaluation, but it is slower than vqa_accuracy.

Key: vqa_evalai_accuracy.

calculate(sample_list, model_output, *args, **kwargs)[source]¶

Calculate vqa accuracy and return it back.

Parameters

sample_list (SampleList) – SampleList provided by DataLoader for current iteration
model_output (Dict) – Dict returned by model.

Returns

VQA Accuracy

Return type

torch.FloatTensor