datasets.base_dataset

class mmf.datasets.base_dataset.BaseDataset(dataset_name, config, dataset_type='train', *args, **kwargs)

Base class for implementing a dataset. It inherits from PyTorch’s Dataset class and adds custom functionality on top; for example, processors listed in the configuration are initialized automatically for the end user.

Parameters

dataset_name (str) – Name of your dataset, used to represent it in text strings
config (DictConfig) – Configuration for the current dataset
dataset_type (str) – Type (split) of your dataset; normally one of train, val, or test. Defaults to 'train'
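A minimal sketch of how a child dataset might use these constructor parameters. Because MMF and PyTorch may not be installed, it subclasses a hypothetical stand-in, `StandInBaseDataset`, that only mirrors the documented signature; in real code you would subclass `mmf.datasets.base_dataset.BaseDataset` instead. A plain dict stands in for `DictConfig`, and `ToyCaptionDataset` and the `config["splits"]` layout are illustrative assumptions, not MMF conventions.

```python
from typing import Any, Dict, List


class StandInBaseDataset:
    """Hypothetical stand-in mirroring only the documented constructor of
    mmf.datasets.base_dataset.BaseDataset; it is NOT the MMF implementation."""

    def __init__(self, dataset_name: str, config: Dict[str, Any],
                 dataset_type: str = "train", *args, **kwargs):
        self.dataset_name = dataset_name
        self.dataset_type = dataset_type
        self.config = config if config is not None else {}


class ToyCaptionDataset(StandInBaseDataset):
    """Hypothetical child dataset serving caption samples for one split."""

    def __init__(self, config: Dict[str, Any], dataset_type: str = "train"):
        super().__init__("toy_caption", config, dataset_type)
        # Assumption: samples for each split live under config["splits"][split].
        self.samples: List[Dict[str, Any]] = self.config["splits"][dataset_type]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # Copy the raw sample and tag it with the dataset's name and split.
        sample = dict(self.samples[idx])
        sample["dataset_name"] = self.dataset_name
        sample["dataset_type"] = self.dataset_type
        return sample


config = {
    "splits": {
        "train": [{"text": "a cat", "label": 0}],
        "val": [{"text": "a dog", "label": 1}, {"text": "a bird", "label": 2}],
    }
}
val_set = ToyCaptionDataset(config, dataset_type="val")
print(len(val_set))        # 2
print(val_set[1]["text"])  # a bird
```

Passing the split as `dataset_type` lets one class serve train, val, and test data from a single configuration.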

load_item(idx)

Implement this method in your child class if you need to load an item separately and cache it.

Parameters: idx (int) – Index of the sample to be loaded.
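One way to implement `load_item` with a cache, sketched under assumptions: `CachingDatasetSketch` and `slow_loader` are hypothetical names, the loader stands in for something expensive such as a disk read, and `__getitem__` is assumed to route through `load_item` (the reference above does not say whether BaseDataset calls `load_item` itself).

```python
from typing import Any, Callable, Dict, List


class CachingDatasetSketch:
    """Hypothetical child-dataset sketch: load_item(idx) loads each item once
    and serves later requests for the same index from an in-memory cache."""

    def __init__(self, loader: Callable[[int], Any], size: int):
        self._loader = loader  # hypothetical expensive per-item loader
        self._size = size
        self._cache: Dict[int, Any] = {}

    def load_item(self, idx: int) -> Any:
        # Call the real loader only on the first access to this index.
        if idx not in self._cache:
            self._cache[idx] = self._loader(idx)
        return self._cache[idx]

    def __len__(self) -> int:
        return self._size

    def __getitem__(self, idx: int) -> Any:
        # Assumption: item access goes through load_item.
        return self.load_item(idx)


calls: List[int] = []


def slow_loader(idx: int) -> str:
    calls.append(idx)  # record each time the real loader actually runs
    return f"feature_{idx}"


ds = CachingDatasetSketch(slow_loader, size=3)
for i in (2, 2, 0):
    ds[i]
print(calls)  # [2, 0]
```

With a multi-worker PyTorch DataLoader, each worker process holds its own copy of the dataset object, so each worker builds its own cache.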

prepare_batch(batch)

Can be overridden in your child class. Not supported with the Lightning trainer.

Prepares the batch for passing to the model. Whatever is returned from here is passed directly to the model’s forward function. Currently, this moves the batch to the proper device.

Parameters

batch (SampleList) – Sample list containing the currently loaded batch

Returns

A sample list representing the currently loaded batch

Return type

sample_list (SampleList)
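A minimal sketch of overriding `prepare_batch` in a child class and delegating to the parent. To stay runnable without PyTorch or MMF, it uses a hypothetical `StandInBaseDataset` whose `prepare_batch` only records the target device as a string; the real method works on a SampleList and moves it to the device. `LowercasingDataset` and the dict-shaped batch are illustrative assumptions.

```python
from typing import Any, Dict


class StandInBaseDataset:
    """Hypothetical stand-in for BaseDataset.prepare_batch. The real method
    moves the SampleList to the proper device; here the "move" is only
    recorded as a string so the sketch runs without PyTorch."""

    def __init__(self, device: str = "cpu"):
        self._device = device

    def prepare_batch(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        prepared = dict(batch)
        prepared["device"] = self._device  # placeholder for moving to device
        return prepared


class LowercasingDataset(StandInBaseDataset):
    """Hypothetical child class: normalizes text, then defers to the parent,
    so the model's forward function receives the modified batch."""

    def prepare_batch(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        batch = dict(batch, text=[t.lower() for t in batch["text"]])
        return super().prepare_batch(batch)


ds = LowercasingDataset(device="cuda:0")
out = ds.prepare_batch({"text": ["A Cat", "A DOG"], "label": [0, 1]})
print(out["text"])    # ['a cat', 'a dog']
print(out["device"])  # cuda:0
```

Calling `super().prepare_batch` last keeps the parent's device handling intact. Since this hook is not supported with the Lightning trainer, logic placed in an override should not be relied on there.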