datasets.base_dataset

class mmf.datasets.base_dataset.BaseDataset(dataset_name, config, dataset_type='train', *args, **kwargs)

Base class for implementing a dataset. It inherits from PyTorch’s Dataset class and adds custom functionality on top. In particular, processors specified in the configuration are initialized automatically for the end user.

Parameters
  • dataset_name (str) – Name of your dataset, used to represent it in text strings

  • config (DictConfig) – Configuration for the current dataset

  • dataset_type (str) – Type of your dataset, normally train, val or test. Defaults to 'train'.
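A minimal, framework-free sketch of the pattern described above, under stated assumptions: MiniBaseDataset and ToyTextDataset are hypothetical stand-ins (not MMF classes), and the config is assumed to be a plain dict whose "processors" entry maps an attribute name to a zero-argument factory. MMF itself builds processors from a DictConfig through its own machinery, which is not reproduced here. The sketch only shows the idea of initializing every configured processor at construction time so that a child class can use it directly.

```python
from typing import Any, Callable, Dict


class MiniBaseDataset:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(self, dataset_name: str, config: Dict[str, Any],
                 dataset_type: str = "train") -> None:
        self.dataset_name = dataset_name
        self.config = config
        self.dataset_type = dataset_type
        # Assumption: config["processors"] maps attribute name -> factory callable.
        processors: Dict[str, Callable[[], Callable]] = config.get("processors", {})
        for name, factory in processors.items():
            # Each processor becomes an attribute, e.g. self.text_processor.
            setattr(self, name, factory())

    def __len__(self) -> int:
        raise NotImplementedError

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        raise NotImplementedError


class ToyTextDataset(MiniBaseDataset):
    """Hypothetical child dataset over an in-memory list of strings."""

    def __init__(self, config: Dict[str, Any], dataset_type: str = "train",
                 texts=()) -> None:
        super().__init__("toy_text", config, dataset_type)
        self.texts = list(texts)

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # The processor was created by the base class; the child just calls it.
        return {"text": self.text_processor(self.texts[idx])}


config = {"processors": {"text_processor": lambda: str.lower}}
ds = ToyTextDataset(config, dataset_type="val", texts=["Hello", "World"])
print(ds.dataset_type, len(ds), ds[0])  # val 2 {'text': 'hello'}
```

The child class never constructs its processor itself; it relies on the base class having done so from the configuration, which is the convenience the description refers to.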

load_item(idx)

Implement this method if you need to load an item separately and cache it.

Parameters

idx (int) – Index of the sample to be loaded.
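One way a child class might use load_item is to memoize expensive per-sample loading. The sketch below is self-contained and hypothetical: CachedDataset, _read_from_disk, and the choice to call load_item from __getitem__ are illustrative assumptions, not MMF behavior. A plain dict serves as the cache.

```python
from typing import Any, Dict


class CachedDataset:
    """Hypothetical dataset that loads each item once and then caches it."""

    def __init__(self, num_items: int) -> None:
        self.num_items = num_items
        self._cache: Dict[int, Dict[str, Any]] = {}
        self.load_calls = 0  # counts real (uncached) loads, for illustration

    def _read_from_disk(self, idx: int) -> Dict[str, Any]:
        # Placeholder for an expensive read (file, database, network, ...).
        self.load_calls += 1
        return {"id": idx, "feature": [idx * 0.5] * 3}

    def load_item(self, idx: int) -> Dict[str, Any]:
        # Load on first access, then serve from the in-memory cache.
        if idx not in self._cache:
            self._cache[idx] = self._read_from_disk(idx)
        return self._cache[idx]

    def __len__(self) -> int:
        return self.num_items

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        if not 0 <= idx < self.num_items:
            raise IndexError(idx)
        return self.load_item(idx)


ds = CachedDataset(num_items=4)
ds[2]
ds[2]
ds[3]
print(ds.load_calls)  # 2
```

Note that when a PyTorch DataLoader uses multiple worker processes, each worker holds its own copy of the dataset, so a per-instance cache like this one is not shared between workers.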

prepare_batch(batch)

Can be overridden in your child class. Not supported with the Lightning trainer.

Prepares the batch for passing to the model. Whatever is returned from here is passed directly to the model’s forward function. Currently, it moves the batch to the proper device.

Parameters

batch (SampleList) – sample list containing the currently loaded batch

Returns

Returns a sample list representing the currently loaded batch

Return type

sample_list (SampleList)
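The sketch below illustrates the kind of override described for prepare_batch: the child class adds a derived field to the batch, then moves every tensor-like value to the target device and returns the result for the model’s forward pass. It avoids PyTorch and MMF so it runs anywhere. StubTensor imitates only the .to(device) method of a tensor, a plain dict stands in for SampleList, and MyDataset and the question_len field are hypothetical. This is not MMF’s implementation.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List


@dataclass(frozen=True)
class StubTensor:
    """Imitates just enough of a tensor: some data and a device tag."""
    data: List[int]
    device: str = "cpu"

    def to(self, device: str) -> "StubTensor":
        # Like torch.Tensor.to, return a tensor on the requested device
        # (this stub always returns a new frozen copy).
        return replace(self, device=device)


class MyDataset:
    """Hypothetical dataset overriding prepare_batch."""

    def __init__(self, device: str) -> None:
        self._device = device

    def prepare_batch(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # 1. Add a derived field (hypothetical) without mutating the input.
        batch = dict(batch)
        batch["question_len"] = StubTensor([len(q) for q in batch["questions"]])
        # 2. Move every tensor-like value to the dataset's device.
        return {
            key: value.to(self._device) if hasattr(value, "to") else value
            for key, value in batch.items()
        }


batch = {"questions": ["what color?", "how many"], "image_id": StubTensor([7, 9])}
prepared = MyDataset(device="cuda:0").prepare_batch(batch)
print(prepared["image_id"].device, prepared["question_len"].data)  # cuda:0 [11, 8]
```

Whatever prepare_batch returns is what the model receives, so any preprocessing that must happen on the whole batch (rather than per sample) fits naturally here, after collation and before the forward call.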